26 Commits

Author SHA1 Message Date
Joseph Doherty f23e368a74 fix(server, admin): wire sp_RegisterNodeGenerationApplied + overlay heartbeat onto ClusterNode
dbo.sp_RegisterNodeGenerationApplied was defined by the initial
StoredProcedures migration but had zero callers in src/. The server
polled sp_GetCurrentGenerationForCluster every 5s but never reported
back, so dbo.ClusterNodeGenerationState stayed empty for every node
and both the Admin UI Fleet status page ("No node state recorded")
and the cluster-detail Redundancy LastSeenAt indicator ("never
STALE") showed broken liveness forever.

Server side (GenerationRefreshHostedService):
* New testable seam: Func<long, NodeApplyStatus, string?, CT, Task>?
  registerAppliedAsync constructor parameter, defaulting to a real
  sp_RegisterNodeGenerationApplied call against the central DB.
* TickAsync now calls the proc at two points: after every successful
  apply with NodeApplyStatus.Applied, and on every no-change tick as
  a heartbeat (also Applied) so LastSeenAt stays fresh.
* Apply failures now wrap the lease + coordinator.RefreshAsync in a
  try/catch, report NodeApplyStatus.Failed with the exception message,
  and advance LastAppliedGenerationId regardless of outcome so we
  don't loop on the same broken apply every 5s.
* Register-call failures are best-effort (LogDebug heartbeat, LogWarning
  apply-report) — a transient DB outage during reporting must not
  crash the publisher or block the next apply.
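
The tick flow described above can be sketched as follows. This is a minimal sketch, not the actual GenerationRefreshHostedService. The delegate parameters, the method shape, and the "Applied"/"Failed" strings standing in for NodeApplyStatus are all assumptions. Only the ordering follows the commit: apply, report, advance the cursor; heartbeat on a no-change tick; swallow report failures.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of one refresh tick. The delegates stand in for the real
// DB calls; "Applied" / "Failed" stand in for NodeApplyStatus values.
async Task<long?> TickAsync(
    long? lastAppliedId,
    Func<CancellationToken, Task<long>> fetchCurrentAsync,             // current-generation poll
    Func<long, CancellationToken, Task> applyAsync,                     // lease + coordinator refresh
    Func<long, string, string?, CancellationToken, Task> registerAsync, // report-back proc
    CancellationToken ct)
{
    long current = await fetchCurrentAsync(ct);

    if (current == lastAppliedId)
    {
        // No change: still report Applied so LastSeenAt doubles as a heartbeat.
        await TryRegisterAsync(current, "Applied", null);
        return lastAppliedId;
    }

    string status = "Applied";
    string? error = null;
    try
    {
        await applyAsync(current, ct);
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        status = "Failed";
        error = ex.Message;
    }

    await TryRegisterAsync(current, status, error);

    // Advance the cursor whatever the outcome so a broken apply is not retried every tick.
    return current;

    // Reporting is best-effort: a DB hiccup here must not fail the tick.
    async Task TryRegisterAsync(long id, string applyStatus, string? message)
    {
        try
        {
            await registerAsync(id, applyStatus, message, ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // The real service logs here (Debug for heartbeats, Warning for apply reports).
        }
    }
}
```

Injecting the report call as a delegate is what makes the three behaviours testable without a database.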

Admin side (ClusterNodeService.ListByClusterAsync): the Redundancy tab
reads ClusterNode.LastSeenAt, but no current writer maintains that
column — the heartbeat goes to ClusterNodeGenerationState.LastSeenAt.
Overlay the GenerationState heartbeat onto the returned ClusterNode
rows when more recent, so IsStale + the Redundancy table column
reflect actual liveness without a schema change or new write path.
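
A minimal sketch of the overlay rule, assuming both columns hold nullable UTC timestamps. OverlayLastSeen and the IsStale threshold parameter are hypothetical names, not the real ClusterNodeService code.

```csharp
using System;

// Prefer the GenerationState heartbeat when it is newer than the ClusterNode
// column. Either side may be null (node never seen through that path).
DateTime? OverlayLastSeen(DateTime? clusterNodeLastSeen, DateTime? generationStateLastSeen) =>
    (clusterNodeLastSeen, generationStateLastSeen) switch
    {
        (null, var g) => g,
        (var c, null) => c,
        var (c, g) => g > c ? g : c,
    };

// Hypothetical staleness check: never seen, or last seen longer ago than the threshold.
bool IsStale(DateTime? lastSeenUtc, DateTime nowUtc, TimeSpan threshold) =>
    lastSeenUtc is null || nowUtc - lastSeenUtc.Value > threshold;
```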

Tests: 3 new cases on GenerationRefreshHostedServiceTests verify
first-apply reports Applied, no-change ticks heartbeat with Applied,
and register-call failure does not roll back the cursor or block
subsequent ticks. All 8 GenerationRefresh tests pass.

Verified live on node-dev-a / cluster-dev: dbo.ClusterNodeGenerationState
now populated with CurrentGenerationId=1, LastAppliedStatus=Applied,
fresh LastSeenAt. Fleet status page shows the node (KPIs NODES 1 /
APPLIED 1 / STALE 0 / FAILED 0). Redundancy tab KPI STALE went 1→0 and
the row shows a real LAST SEEN timestamp. Bonus: FleetStatusHub
SignalR push now fires the cluster-page Live update banner on every
heartbeat because there are finally state changes to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:59 -04:00
Joseph Doherty c8de58d6d3 fix(admin-ui): render published gen read-only on the 6 cluster-detail content tabs
The Equipment, UNS Structure, Namespaces, Drivers, Tags, and ACLs tabs
all rendered only an "Open a draft to edit" placeholder when no draft
was open — even when the cluster had a fully populated published
generation. docs/v2/admin-ui.md §Cluster Detail describes these as
"read-only views of the published generation" with an "Edit in draft"
affordance; that view was never wired. The earlier code path also
correctly rendered nothing when the cluster had no published gen yet,
which was indistinguishable from the broken state.

Collapse the six per-tab conditions into one shared branch that threads
the published gen ID into the existing tab components when no draft
exists, wrapped in <fieldset disabled> so any Add/Edit button click in
the read-only state cannot silently mutate published rows even though
the tab components themselves don't yet honor an IsReadOnly flag.
Banner above the content explains the state. Surgical: zero changes to
the ~1500 LoC across the six tab components.

Verified live on cluster-dev gen 1: Drivers tab now shows the
cluster-dev-galaxy-main GalaxyMxGateway row read-only; Namespaces tab
shows cluster-dev-galaxy-ns SystemPlatform row; both with the read-only
banner and visibly disabled affordances.

Follow-up worth doing later: refactor each tab component to accept an
IsReadOnly parameter so the disabled-affordance UX is per-tab rather
than a blanket fieldset opacity wash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:32 -04:00
Joseph Doherty 8fe7c8bea6 refactor(driver-galaxy): switch to sibling-repo MxGateway client + drop vendored libs
The sibling mxaccessgw repo (clients/dotnet/) restored a proper client
library + contracts under the new ZB.MOM.WW.MxGateway namespace, so the
binary-vendoring stopgap from PR Driver.Galaxy-016 can unwind via plan #1
of libs/README.md.

- csproj: replace <Reference HintPath="libs\MxGateway.*.dll"> with a
  ProjectReference into
  ..\..\..\..\mxaccessgw\clients\dotnet\ZB.MOM.WW.MxGateway.Client\.
  The five backfill PackageReference shims
  (Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client, Polly.Core,
  Microsoft.Extensions.Logging.Abstractions) are now transitive again.
- Source: 'using MxGateway.X' -> 'using ZB.MOM.WW.MxGateway.X' across
  19 driver files + 14 test files. No fully-qualified MxGateway.* usages
  in code, so no behavioural changes — purely a using-prefix flip.
- libs/: deleted MxGateway.Client.dll, MxGateway.Contracts.dll, README.md
  (orphan after the unwind).

Verified: dotnet build clean (Release), all 245 Driver.Galaxy unit tests
pass, OtOpcUa service running with the new client DLL loaded
(opc.tcp://localhost:4840/OtOpcUa, no FileNotFound/TypeLoad/
MissingMethod in startup logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:55:15 -04:00
Joseph Doherty c6082aa0b9 fix(admin-e2e): register missing DI services so ClusterDetail interactive circuit boots
UnsTabDragDropE2ETests were timing out at the 'UNS Structure' nav-link
locator because AdminWebAppFactory never registered AdminHubConnectionFactory
/ HubTokenService / DataProtection — ClusterDetail.razor's @inject threw at
circuit boot, so the page never advanced past the Loading placeholder. Passing
tests go from 2 to 3 after the registrations land. Also documents the Modbus standard-vs-
exception_injection coverage matrix in the fixture README + cross-references
docs/drivers/AbServer-Test-Fixture.md from each Emulate test so a developer
landing on a skipped test has a direct doc pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:07:17 -04:00
Joseph Doherty b1f3e09661 test(modbus, abcip): align failing integration tests with fixtures
Two real bugs uncovered by re-running with the new fixture defaults
pointing at the shared docker host. Both are test-side, not driver-side.

AbCip — Driver_reads_seeded_DInt_from_ab_server (4 parametrized rows):
  Hardcoded 'ab://127.0.0.1:{port}/1,0' in the deviceUri instead of
  the resolved fixture.Host. The new 10.100.0.35 default (and any
  AB_SERVER_ENDPOINT override) silently couldn't reach this test —
  the driver tried to connect to a non-existent localhost:44818 and
  returned BadCommunicationError on all 4 profile rows. The sibling
  Emulate tests already use the fixture's resolved endpoint; this
  smoke test was missed in the original migration.

  Fix: deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0".

Modbus — Float32_With_CDAB_Roundtrips_Through_Wire:
  Test wrote a Float32 to HR 100 (2 consecutive registers: 100+101).
  standard.json's writable HR range declares [100,100] only — a
  single-cell auto-incrementing register, not a 2-register pair. The
  write to register 101 was rejected with Illegal Data Address
  (BadOutOfRange).

  Fix: moved the tag from HR 100 to HR 200 (in standard.json's
  '[200, 209]' scratch range — 10 consecutive writable HRs). The
  Float32+CDAB semantic the test exercises is unchanged.
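
To see why a Float32 tag needs two consecutive writable registers (HR 200 and HR 201 after the move), here is a sketch of CDAB encoding. Assumption: the common convention where CDAB means word-swapped, with each 16-bit register still big-endian on the wire. The helper names are hypothetical.

```csharp
using System;

// Split a float into two 16-bit registers using CDAB word order: the low word
// (bytes C D) goes to the first register, the high word (bytes A B) to the next.
(ushort First, ushort Second) EncodeFloat32Cdab(float value)
{
    uint bits = unchecked((uint)BitConverter.SingleToInt32Bits(value));
    ushort high = (ushort)(bits >> 16);   // bytes A B
    ushort low = (ushort)(bits & 0xFFFF); // bytes C D
    return (low, high);
}

// Reverse the word swap and reinterpret the 32 bits as a float.
float DecodeFloat32Cdab(ushort first, ushort second) =>
    BitConverter.Int32BitsToSingle(unchecked((int)(((uint)second << 16) | first)));
```

For 1.0f (bits 0x3F800000) the first register holds 0x0000 and the second 0x3F80, so a range that exposes only one writable cell rejects the second half of the write.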

Modbus — Block_Read_Coalescing_Reduces_PDU_Count_End_To_End:
  Test read HR 300, 302, 304 — outside both the writable ranges and
  the uint16 seed list. pymodbus rejects reads to unseeded HRs even
  though 'hr size' is 2048. BadOutOfRange on every read.

  Fix: moved the tags from 300/302/304 to 200/202/204 (within the
  scratch range). The non-contiguous coalescing semantic (3 tags
  inside a 5-register window with MaxReadGap=5) is preserved.
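
The coalescing rule can be sketched like this. Assumption: MaxReadGap counts the unrequested registers allowed between the end of one block and the next tag; the driver's actual definition may differ, and this sketch ignores the 125-register limit on a single holding-register read. Coalesce is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Merge tag ranges into read blocks; each resulting block is one read PDU.
List<(int Start, int Count)> Coalesce(IEnumerable<(int Address, int Length)> tags, int maxReadGap)
{
    var blocks = new List<(int Start, int Count)>();
    foreach (var (address, length) in tags.OrderBy(t => t.Address))
    {
        if (blocks.Count > 0)
        {
            var last = blocks[blocks.Count - 1];
            int lastEnd = last.Start + last.Count; // first register after the block
            if (address - lastEnd <= maxReadGap)
            {
                // Close enough: stretch the current block instead of starting a new PDU.
                int newEnd = Math.Max(lastEnd, address + length);
                blocks[blocks.Count - 1] = (last.Start, newEnd - last.Start);
                continue;
            }
        }
        blocks.Add((address, length));
    }
    return blocks;
}
```

With single-register tags at 200, 202 and 204 and a gap of 5, this yields one block (200, 5); with a gap of 0 it yields three.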

After this commit:
  - Modbus.IntegrationTests: 6/38 pass / 32 skip / 0 fail
    (was 4 pass / 32 skip / 2 fail; 32 skips are profile-gated
    ExceptionInjectionTests — they need MODBUS_SIM_PROFILE=
    exception_injection and a different container, intentional gating)
  - AbCip.IntegrationTests: 10/12 pass / 2 skip / 0 fail
    (was 6 pass / 2 skip / 4 fail; 2 skips are Emulate tests that
    need the fixture for separate scenarios)

No driver code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:45:57 -04:00
Joseph Doherty 49644fc7fd test(fixtures): migrate integration-test fixture defaults to 10.100.0.35
CLAUDE.md "Docker Workflow" claims (per the 2026-04-28 migration note)
that all fixture-class default endpoints were rewritten to target the
shared Docker host at 10.100.0.35. Audit during today's e2e run showed
the claim was incomplete — five fixture classes still defaulted to
localhost / 127.0.0.1, causing every fixture-touching integration test
to skip with "endpoint unreachable" on a fresh box that hadn't set
the override env vars.

Files corrected:

  - tests/.../Modbus.IntegrationTests/ModbusSimulatorFixture.cs
      DefaultEndpoint: localhost:5020 → 10.100.0.35:5020
  - tests/.../S7.IntegrationTests/Snap7ServerFixture.cs
      DefaultEndpoint: localhost:1102 → 10.100.0.35:1102
  - tests/.../OpcUaClient.IntegrationTests/OpcPlcFixture.cs
      DefaultEndpoint: opc.tcp://localhost:50000 → opc.tcp://10.100.0.35:50000
  - tests/.../AbCip.IntegrationTests/AbServerFixture.cs
      Host default + ResolveHost fallback: 127.0.0.1 → 10.100.0.35
  - tests/.../AbLegacy.IntegrationTests/AbLegacyServerFixture.cs
      Host default + ResolveEndpoint fallback: 127.0.0.1 → 10.100.0.35

XML doc comments referencing the old localhost defaults were updated in
the same pass so the class-summary documentation matches the actual
default. The override-via-env-var mechanism (MODBUS_SIM_ENDPOINT,
AB_SERVER_ENDPOINT, AB_LEGACY_ENDPOINT, S7_SIM_ENDPOINT,
OPCUA_SIM_ENDPOINT) is unchanged — pointing at a real PLC or a
locally-running container still works exactly as before.
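
A sketch of the override-then-default pattern those fixtures follow, assuming a plain host:port string. The helper name and parsing are assumptions; the OPC UA fixture, for one, takes an opc.tcp:// URL rather than host:port.

```csharp
#nullable enable
using System;

// Env var wins when set; otherwise fall back to the shared Docker host default.
(string Host, int Port) ResolveEndpoint(string envVar, string defaultEndpoint)
{
    string? raw = Environment.GetEnvironmentVariable(envVar);
    string endpoint = string.IsNullOrWhiteSpace(raw) ? defaultEndpoint : raw;
    int colon = endpoint.LastIndexOf(':'); // simplified: no IPv6 literal support
    return (endpoint[..colon], int.Parse(endpoint[(colon + 1)..]));
}
```

Usage: `ResolveEndpoint("MODBUS_SIM_ENDPOINT", "10.100.0.35:5020")` returns the shared host unless the variable is set.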

Verification:
  - Solution-wide dotnet build: 0 errors.
  - S7.IntegrationTests: 3/3 pass without env-var override.
  - OpcUaClient.IntegrationTests: 3/3 pass without env-var override.
  - Modbus.IntegrationTests: 4/38 pass (same as the env-var-override run —
    the 2 failures + 32 skips are pre-existing fixture-profile
    mismatches unrelated to this fix).
  - AbCip.IntegrationTests / AbLegacy.IntegrationTests: same results
    as the env-var-override run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:32:23 -04:00
Joseph Doherty 3d982d9a65 docs: sync against recent code changes
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.

1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
   Refreshed the deployment description from "three cooperating
   processes (Server, Admin, Galaxy.Host)" to "two cooperating
   Windows services (Server, Admin)". The legacy x86 TopShelf
   Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
   access now flows through the in-process Tier-A GalaxyDriver
   talking gRPC to the sibling mxaccessgw gateway. Also called out
   decision #30 (AddWindowsService replacing TopShelf) inline.

2. docs/VirtualTags.md:
   - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
     replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
     regular compiler — Core.Scripting-008 / -016 retired the
     CSharpScript/ScriptRunner path).
   - Line 39: orphan-thread leak description rewritten. The
     CSharp.Scripting-era "underlying ScriptRunner keeps running on
     its thread-pool thread until the Roslyn runtime returns" is no
     longer accurate — the new pipeline binds the script as a
     regular C# Func<> delegate, so the leak is now "synchronous
     CPU-bound work on a pool thread" (same operator-visible
     effect, different mechanism).

3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
   service"):
   Annotated both the decision body and the decision-log table row
   with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
   replacement architecture. The original reasoning is preserved as
   audit trail per the decision-log convention.

4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
   Added an Implementation note describing the
   Core.Scripting-008 / -016 supersession of the original
   CSharpScript pipeline. The historical record stays; the note
   points future readers at docs/VirtualTags.md "Compile cache"
   for the current contract.

5. docs/plans/alarms-over-gateway.md "Files" section under client
   regeneration:
   Updated the .NET regeneration instructions to point at the new
   ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
   clients/dotnet/MxGateway.Client.csproj no longer exists in the
   sibling repo (restructure after this plan was written) and the
   vendored-binaries situation in
   src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
   so a reader following the plan won't chase a deleted path.

Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:57:04 -04:00
Joseph Doherty 23d59d73f2 fix(scripting+alarms): close remaining re-review findings
Single commit covering the four small/medium fixes from the updated
code review.

Core.Scripting-014 (Medium, Concurrency):
  CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
  lazy) overload — same race shape Core.Scripting-006 closed in
  GetOrCompile's catch block. A concurrent re-add between snapshot and
  TryRemove was evicted + disposed while the new caller still held it.
  Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
  Regression test
  Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
  added.
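
A minimal sketch of value-scoped removal, assuming the cache is a ConcurrentDictionary whose values are reference types (Lazy<T> in the real cache). ClearSnapshot is hypothetical; the snapshot is passed in only so the race can be reproduced deterministically. The KeyValuePair overload of TryRemove is available from .NET 5.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Remove only entries that still hold the exact value captured in the snapshot.
// A value re-added under the same key after the snapshot is left in place.
int ClearSnapshot<TKey, TValue>(
    ConcurrentDictionary<TKey, TValue> cache,
    KeyValuePair<TKey, TValue>[] snapshot,
    Action<TValue> dispose) where TKey : notnull
{
    int removed = 0;
    foreach (KeyValuePair<TKey, TValue> entry in snapshot)
    {
        // Value-scoped overload: succeeds only if key AND value still match.
        if (cache.TryRemove(entry))
        {
            dispose(entry.Value);
            removed++;
        }
    }
    return removed;
}
```

The key-only `TryRemove(key, out _)` would instead evict whatever value currently sits under the key, including one a concurrent caller just inserted.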

Core.Scripting-013 (Medium, Security):
  Hand-rolled BuildWrapperSource pastes user source between literal
  braces; brace-balanced source could inject sibling methods/classes
  alongside CompiledScript.Run. Analyzer still walked the injected
  members so it wasn't a direct escape, but it relaxed the documented
  'method body' authoring contract. Added EnforceSingleRunMember:
  after ParseText, the compilation unit must hold exactly one type
  (CompiledScript) and that type must hold exactly one member (the Run
  method). Any deviation throws CompilationErrorException with LMX001/
  LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
  message. Two regression tests added covering the sibling-method and
  sibling-class injection vectors.

Core.Scripting-015 (Low, Correctness, latent):
  ToCSharpTypeName's generic branch truncated at the first backtick via
  IndexOf, silently dropping closed args of nested-generic shapes
  (Outer<T>.Inner<U>). No production caller exercises this shape today
  (all TContext/TResult are top-level non-nested), so the bug was
  latent. Rewrote the generic branch to walk the FullName segment-by-
  segment, consuming generic args per segment so nested shapes emit
  valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
  Outer<T,U>).
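
A sketch of segment-by-segment walking, assuming closed, non-array types (open generic parameters, arrays and pointers are not handled). The function name mirrors the commit, but the body is an illustration, not the actual implementation. Since the BCL has few Outer<T>.Inner<U> shapes, the check uses Dictionary<TKey,TValue>.KeyCollection, which exercises the same FullName layout (Outer`2+Inner).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

string ToCSharpTypeName(Type t)
{
    if (!t.IsGenericType)
        return "global::" + t.FullName!.Replace('+', '.'); // nested types use '+' in FullName

    // A nested generic type carries ALL generic args (outer + inner) in one flat list.
    Type[] args = t.GetGenericArguments();
    string definition = t.GetGenericTypeDefinition().FullName!; // e.g. "Ns.Outer`1+Inner`1"
    var sb = new StringBuilder("global::");
    int argIndex = 0;

    string[] segments = definition.Split('+');
    for (int i = 0; i < segments.Length; i++)
    {
        if (i > 0) sb.Append('.');
        string segment = segments[i];
        int tick = segment.IndexOf('`');
        if (tick < 0) { sb.Append(segment); continue; } // segment declares no new type params

        // Each segment consumes only its own arity from the flat argument list.
        sb.Append(segment, 0, tick);
        int arity = int.Parse(segment.Substring(tick + 1));
        sb.Append('<');
        for (int j = 0; j < arity; j++)
        {
            if (j > 0) sb.Append(", ");
            sb.Append(ToCSharpTypeName(args[argIndex++]));
        }
        sb.Append('>');
    }
    return sb.ToString();
}
```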

Core.ScriptedAlarms-013 (Low, Documentation):
  The internal test accessors TryGetScratchReadCacheForTest /
  TryGetScratchContextForTest return live mutable scratch refilled in
  place under _evalGate. XML docs didn't warn future test authors about
  the synchronization contract. Added a <remarks> block to each
  documenting the only-safe-on-quiesced-engine + identity-or-single-key
  contract.

Verification (suites green):
  Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
  Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
  Core.VirtualTags.Tests: 57/57 (unchanged)

After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:00:59 -04:00
Joseph Doherty c2abbf45bd fix(driver-galaxy): align package versions + record vendored-DLL provenance
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).

Driver.Galaxy-016 (Medium, Perf/Resource):
  Reconciled the csproj PackageReferences with what the vendored
  MxGateway.Client.dll was actually built against, verified by
  reflecting Assembly.GetReferencedAssemblies() on the DLL:
    - Polly 8.5.2  →  Polly.Core 8.6.6
      (most consequential — Polly v7 fluent API vs Polly.Core v8
       resilience-pipeline API are DIFFERENT packages; the DLL was
       built against Polly.Core so the prior Polly reference would
       have failed at runtime with MissingMethodException the first
       time the gateway client's retry pipeline ran)
    - Grpc.Net.Client 2.71.0  →  2.76.0  (matches sibling Server/Worker)
    - Microsoft.Extensions.Logging.Abstractions 10.0.0  →  10.0.7
  Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
  left unchanged.
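
The reflection step used to verify the build-time versions can be sketched with Assembly.GetReferencedAssemblies. DescribeReferences is a hypothetical helper. For a DLL on disk you would pass something like Assembly.LoadFrom(path), although the System.Reflection.MetadataLoadContext package is the safer choice when you only want to read metadata without loading the assembly for execution.

```csharp
using System;
using System.Linq;
using System.Reflection;

// List the name and version of every assembly this one was compiled against.
string[] DescribeReferences(Assembly assembly) =>
    assembly.GetReferencedAssemblies()
            .Select(r => $"{r.Name} {r.Version}")
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToArray();
```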

Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
  Original framing was a security concern about unknown-provenance
  binaries. User clarified the DLLs are their own code, built from
  their own mxaccessgw project, not third-party. Re-triaged to a
  documentation / audit-trail concern. Fix:
    - Added a Provenance section to libs/README.md recording the
      source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
      extracted from the AssemblyInformationalVersion baked into
      both DLLs by the original build) and SHA-256 checksums.
    - Documented the re-verification recipe (sha256sum + ilspycmd
      | grep AssemblyInformationalVersion).
  Recommendations about .gitattributes and CI hash-check deferred —
  the DLLs are frozen until an unwinding path is taken, so adding
  LFS or CI infrastructure now would need removal at unwinding.
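
The checksum half of the re-verification recipe can be sketched in C#, assuming the recorded checksum is a hex string. The helper name is hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Recompute a file's SHA-256 and compare it with a recorded hex checksum.
bool MatchesRecordedSha256(string path, string expectedHex)
{
    using FileStream stream = File.OpenRead(path);
    using SHA256 sha = SHA256.Create();
    string actualHex = Convert.ToHexString(sha.ComputeHash(stream)); // upper-case hex
    return string.Equals(actualHex, expectedHex, StringComparison.OrdinalIgnoreCase);
}
```

The comparison is case-insensitive because `sha256sum` prints lower-case hex while Convert.ToHexString returns upper case.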

Driver.Galaxy-018 (Low, Documentation):
  Most of the recommendation folded into the libs/README.md rewrite
  (pointed at sibling Server/Worker csproj as the live version source
  rather than the deleted MxGateway.Client.csproj; recorded source
  commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
  <Reference> items intentionally not added — MSBuild's default for
  HintPath references with bare-name Include attributes is already
  SpecificVersion=false, so explicitly setting it would be cosmetic
  without changing behaviour.

Driver.Galaxy-017 (Low, Design) — Deferred:
  Recommendation part (b) (record mxaccessgw source-commit SHA in
  libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
  Parts (a) and (c) — a GetVersion RPC at session-open and a parity
  test against the live gateway's proto descriptor — are substantial
  new RPC + plumbing work not in scope for this code-review sweep.
  The risk surface is bounded because either of the libs/README.md
  unwinding paths closes the vendoring + this concern naturally.
  Re-open if neither path is taken within the next quarter and the
  live gateway evolves its proto under the driver.

Verification:
  - Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
  - Driver.Galaxy.Tests: 245/245 pass against the corrected
    package set.
  - Solution-wide build remains clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:45:24 -04:00
Joseph Doherty 3a53d03d23 fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox
Core.Scripting-012 (High, Security) resolution.

The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:

  - System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
    background-fanout threat Core.Scripting-003 closed against
    System.Threading.Tasks.
  - System.Threading.Timer — schedules unbounded callback work that
    outlives the per-evaluation timeout.
  - System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
    Defense-in-depth gap; invocation needs reflection (already denied)
    but the load itself was reachable.

Fix:
  - Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
    (preferred over type-granular per the recommendation so future BCL
    additions to that namespace are denied by default).
  - Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
    to ForbiddenFullTypeNames — both live in System.Threading shared
    with allowed primitives so they must be type-granular.
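
A minimal sketch of how a namespace-prefix list and an exact-type list combine. The list contents are an illustrative subset (only entries named in these commits), and matching on dotted full names is an assumption about how ForbiddenTypeAnalyzer compares symbols.

```csharp
using System;
using System.Linq;

// Illustrative subset only; the real analyzer lists are longer.
string[] forbiddenNamespacePrefixes = { "System.Runtime.Loader", "System.Threading.Tasks", "System.Reflection" };
string[] forbiddenFullTypeNames = { "System.Threading.ThreadPool", "System.Threading.Timer" };

bool IsForbidden(string fullTypeName) =>
    forbiddenFullTypeNames.Contains(fullTypeName, StringComparer.Ordinal) ||
    // Appending "." stops "System.Runtime.Loader" from matching e.g. "System.Runtime.LoaderX".
    forbiddenNamespacePrefixes.Any(p => fullTypeName.StartsWith(p + ".", StringComparison.Ordinal));
```

The prefix list denies future additions to a namespace by default; the exact list is needed for System.Threading, where allowed primitives such as Interlocked share the namespace.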

Regression tests added to ScriptSandboxTests:
  Rejects_ThreadPool_QueueUserWorkItem_at_compile
  Rejects_Timer_new_at_compile
  Rejects_AssemblyLoadContext_at_compile

Docs:
  docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
  and the Sandbox-escape compliance-check row both updated to enumerate
  the new entries per the Core.Scripting-009 doc-sync convention.

Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.

Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:39:20 -04:00
Joseph Doherty fb7c6c7046 fix(scripting): route engines through CompiledScriptCache (Core.Scripting-016)
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.

Resolution:
  - VirtualTagEngine + ScriptedAlarmEngine each gained a private
    CompiledScriptCache<TContext, TResult> instance.
  - Both Load methods now call _compileCache.GetOrCompile(source).
  - Publish-replace path: _compileCache.Clear() runs alongside the existing
    _tags / _alarms clears so the prior generation's ALCs are disposed
    before recompile.
  - Engine Dispose now calls _compileCache.Dispose() so shutdown actually
    releases the emitted assemblies.

Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).

Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.

Regression tests:
  - VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
  - ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
  Both use WeakReference + bounded GC.Collect() to prove the emitted
  assembly is reclaimable after engine.Dispose(). The alarms test had to
  be synchronous (not 'async Task<WeakReference>') because async state
  machines capture locals as state-struct fields, keeping them alive past
  the method's apparent end and defeating GC.
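
The WeakReference technique can be sketched as below, following the pattern Microsoft documents for testing assembly unloadability. Assumption: this sketch unloads an empty collectible AssemblyLoadContext rather than one holding an emitted script assembly, and the helper names are hypothetical. Keeping CreateAndUnload synchronous and non-inlined is what stops a caller-side local (or an async state-machine field) from rooting the context.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

// Create and unload a collectible ALC in a separate, non-inlined method so no
// local in the caller keeps it alive; only a WeakReference escapes.
[MethodImpl(MethodImplOptions.NoInlining)]
WeakReference CreateAndUnload()
{
    var alc = new AssemblyLoadContext("sketch", isCollectible: true);
    // A real test would load or emit an assembly into `alc` and run it here.
    alc.Unload();
    return new WeakReference(alc, trackResurrection: true);
}

// Bounded GC loop: unloading completes over several collections.
bool IsReclaimed(WeakReference weak, int maxAttempts = 10)
{
    for (int i = 0; weak.IsAlive && i < maxAttempts; i++)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
    return !weak.IsAlive;
}
```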

Verification:
  - Core.Scripting.Tests: 104/104 (unchanged).
  - VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
  - ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
  - All other consumer suites still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:33:34 -04:00
Joseph Doherty a6ae4e22d1 fix(status-codes): correct BadDeviceFailure from 0x80550000 to 0x808B0000
Driver.Cli.Common-007 + Driver.Cli.Common-008 resolution.

Driver.Cli.Common-007 (High, Correctness):
  0x80550000 is the canonical OPC UA spec value for BadSecurityPolicyRejected,
  not BadDeviceFailure. The correct spec value for BadDeviceFailure is
  0x808B0000 (verified against OPC Foundation Opc.Ua.StatusCodes;
  corroborated locally by Driver.Galaxy.Runtime.StatusCodeMap and both
  Wonderware historian quality mappers which all hand-pin the correct
  value).

  The bug was duplicated across six driver modules:
    - FocasStatusMapper.BadDeviceFailure
    - AbCipStatusMapper.BadDeviceFailure
    - AbLegacyStatusMapper.BadDeviceFailure
    - TwinCATStatusMapper.BadDeviceFailure
    - ModbusDriver.StatusBadDeviceFailure
    - S7Driver.StatusBadDeviceFailure
  Plus the SnapshotFormatter shortlist that named 0x80550000 as
  BadDeviceFailure, and three downstream Modbus tests that asserted
  against the wrong value (so CI was blind).

  This commit fixes all six native-mapper constants, the formatter
  shortlist, and the three Modbus tests in one pass. Added a regression
  guard to FormatStatus_does_not_apply_pre_fix_wrong_names that pins
  0x80550000 never renders as BadDeviceFailure (mirroring the existing
  -001 wrong-name guards).

  Behavior change: OPC UA clients consuming the native drivers now see
  the canonical BadDeviceFailure (0x808B0000) on device-fault paths
  instead of the misnamed BadSecurityPolicyRejected (0x80550000). Wire-
  level status semantics now match operator-facing CLI labels.
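
A small sketch of severity-aware formatting, assuming a hand-pinned lookup like the CLI shortlist. The two values are the OPC UA spec codes discussed here; FormatStatus and knownStatusNames are hypothetical names. The top two bits of a StatusCode carry severity (00 Good, 01 Uncertain, 10 Bad), which is why both codes decode as Bad even though they mean different things, and why a wrong constant is invisible to severity-only checks.

```csharp
using System;
using System.Collections.Generic;

var knownStatusNames = new Dictionary<uint, string>
{
    [0x80550000] = "BadSecurityPolicyRejected",
    [0x808B0000] = "BadDeviceFailure",
};

string FormatStatus(uint code)
{
    // Bits 31..30 carry severity.
    string severity = (code >> 30) switch { 0 => "Good", 1 => "Uncertain", 2 => "Bad", _ => "Reserved" };
    return knownStatusNames.TryGetValue(code, out var name)
        ? $"{name} (0x{code:X8})"
        : $"{severity} (0x{code:X8})";
}
```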

Driver.Cli.Common-008 (Low, Testing):
  Deleted the redundant FormatStatus_names_native_driver_emitted_codes
  Theory — its five InlineData rows were already covered by the
  well-known Theory in the same commit (5a9c459), and used a weaker
  ShouldContain vs the well-known Theory's ShouldBe (exact match).

Verification:
  - Driver.Cli.Common.Tests: 43/43 pass (was 48 before the -008 deletion removed five InlineData rows).
  - Driver.Modbus.Tests: 263/263 pass.
  - Driver.AbCip.Tests: 262/262.
  - Driver.AbLegacy.Tests: 157/157.
  - Driver.FOCAS.Tests: 178/178.
  - Driver.S7.Tests: 112/112.
  - Driver.TwinCAT.Tests: 131/131.
  Total: 1146 tests across the affected modules, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:14:28 -04:00
Joseph Doherty 41e62b2663 docs(code-reviews): updated re-review at commit a9be809 — 12 new findings
Re-reviewed the four modules with source changes since the previous review
commit 76d35d1, per REVIEW-PROCESS.md section 6. Updated each findings.md
header (date 2026-05-23, commit a9be809) and appended new findings under
continued numbering. Regenerated README.md.

## New findings — 12 total across 4 modules

### Core.Scripting (5 new, IDs -012 to -016)
- **-012 High Security** — broadened BCL references (System.* + netstandard)
  re-expose System.Threading.ThreadPool / Timer / AssemblyLoadContext, which
  the analyzer's deny-list doesn't cover. Re-introduces the background-work
  threat Core.Scripting-003 closed via System.Threading.Tasks deny.
- **-013 Medium Security** — hand-rolled wrapper-source generation lets
  brace-balanced user source inject sibling methods/classes alongside
  CompiledScript.Run. Analyzer still gates forbidden types, but the
  documented 'method body' authoring contract is silently relaxed.
- **-014 Medium Concurrency** — CompiledScriptCache.Clear() uses key-only
  TryRemove(key, out _) — the same race the -006 resolution fixed in
  GetOrCompile's catch is latent here on publish-replace.
- **-015 Low Correctness** — ToCSharpTypeName truncates at first backtick;
  silently drops closed type arguments of nested-generic shapes (Outer<>.Inner<>).
  Latent — no production caller uses this shape today.
- **-016 Medium Performance** — VirtualTagEngine + ScriptedAlarmEngine call
  ScriptEvaluator.Compile directly without going through CompiledScriptCache,
  so the headline -008 collectible-ALC fix doesn't run on the actual
  production path — the per-publish leak is still in effect.

### Core.ScriptedAlarms (1 new, ID -013)
- **-013 Low Documentation** — new internal test accessors return the live
  mutable scratch dictionary; XML docs don't warn future test authors about
  the synchronisation contract.

### Driver.Cli.Common (2 new, IDs -007, -008)
- **-007 High Correctness** — 0x80550000 was added as BadDeviceFailure but
  the real OPC UA spec value for BadDeviceFailure is 0x808B0000 (verified
  against Driver.Galaxy.Runtime.StatusCodeMap and HistorianQualityMapper,
  both of which use the correct 0x808B0000). 0x80550000 is actually
  BadSecurityPolicyRejected. The native mappers (FOCAS / AbCip / AbLegacy)
  all use the wrong 0x80550000; this session's SnapshotFormatter extension
  propagated the wrong name and the test asserts against the same wrong
  value so CI is blind — same shape of bug as Driver.Cli.Common-001.
- **-008 Low Testing** — new FormatStatus_names_native_driver_emitted_codes
  Theory is redundant with the existing well-known Theory (same five
  InlineData rows added to both) and uses weaker ShouldContain assertion
  than the well-known Theory's ShouldBe.

### Driver.Galaxy (4 new, IDs -015 to -018)
- **-015 Medium Security** — vendored DLLs (libs/) have no recorded
  provenance: no source-commit SHA from the mxaccessgw repo, no SHA-256
  checksum in libs/README.md. Tampering / accidental swap undetectable.
- **-016 Medium Performance** — version skew between declared
  PackageReferences (Polly 8.5.2 / Grpc.Net.Client 2.71.0 /
  Microsoft.Extensions.Logging.Abstractions 10.0.0) and what the vendored
  DLL was actually built against (Polly.Core 8.6.6 / Grpc.Net.Client
  2.76.0 / Microsoft.Extensions.Logging.Abstractions 10.0.7). Latent now
  (assembly-version refs are loose), but this is precisely the shape that
  produces a runtime MissingMethodException.
- **-017 Low Design** — no contract-version handshake between the driver
  and the gateway; proto could evolve under the gateway without the
  driver noticing.
- **-018 Low Documentation** — libs/README.md points at the wrong sibling
  csproj as the version source-of-truth; missing SpecificVersion=false
  on the Reference items; missing mxaccessgw source-commit SHA.

## Particularly notable

Two findings undercut commits from this session:

- Driver.Cli.Common-007 invalidates commit 5a9c459 (which named 0x80550000
  as BadDeviceFailure across the cross-CLI shortlist).
- Core.Scripting-016 invalidates the production effect of commit 7b6ab2e
  (the collectible-ALC fix wired Dispose only via CompiledScriptCache,
  which the engines don't use).

The wider native-mapper miscoding behind -007 also affects three driver
modules outside this session's edit scope (FocasStatusMapper,
AbCipStatusMapper, AbLegacyStatusMapper all carry the wrong code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:02:47 -04:00
Joseph Doherty a9be80923c docs(v2-release-readiness): drop stale 'this branch task-galaxy-e2e' ref
The task-galaxy-e2e branch was merged + deleted; the durable reference
is PR #205 alone. Tidies a dangling pointer that future readers might
chase looking for a branch that no longer exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:50:06 -04:00
Joseph Doherty 994997ba7b fix(driver-galaxy): vendor MxGateway.Client + MxGateway.Contracts as binary refs
The sibling mxaccessgw repo restructured: clients/dotnet/MxGateway.Client
no longer exists, and the proto contracts moved to a new namespace
(ZB.MOM.WW.MxGateway.Contracts.Proto, was MxGateway.Contracts.Proto). The
driver's source still expects the pre-restructure namespace, so the
broken ProjectReference produced 86 build errors in src/ + 1 in tests/
on master.

Resolution: vendor the last known-good build of MxGateway.Client.dll
(99 KB, May 22) and MxGateway.Contracts.dll (490 KB, May 23) under
src/Drivers/.../Driver.Galaxy/libs/, reference them via <Reference
HintPath=...> in both the driver and its test csproj, and declare the
NuGet packages the dropped ProjectReference was supplying transitively
(Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client,
Microsoft.Extensions.Logging.Abstractions, Polly) at versions matching
the sibling repo's ZB.MOM.WW.MxGateway.Contracts.csproj so binary
compatibility is preserved.

Why this over a source migration:
  Source migration would require namespace renames across ~19 driver
  files PLUS reimplementing MxGatewayClient / MxGatewaySession /
  GalaxyRepositoryClient (~2,200 LoC) — the sibling repo dropped the
  client library entirely, keeping only the proto contracts. Vendoring
  the last known-good binaries unblocks the build in minutes, freezes
  the gateway contract surface at a known-good version, and preserves
  the option to migrate properly once the sibling repo decides whether
  to restore a client library or hand the work back to us.

libs/README.md documents the unwinding plan. Either path closes the debt:
the sibling repo restores a client library, or the driver migrates to the
new contracts namespace and reimplements the client wrapper.
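
Vendored binaries can also be pinned by hash, which would close the tamper/accidental-swap gap noted for libs/README.md. The sketch below is minimal and assumes a hypothetical manifest that maps each DLL file name to its expected SHA-256 hex digest. Neither the manifest nor this check exists in the repo.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_vendored(libs_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are missing or whose hash does not match."""
    mismatches = []
    for name, expected in sorted(manifest.items()):
        candidate = libs_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty result means every pinned binary is present and unchanged. A build step could fail on any non-empty result.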

Verification:
  - dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors (was 87).
  - Driver.Galaxy unit tests: 245/245 pass.
  - Integration tests not run here (require a live mxaccessgw gateway).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:32:56 -04:00
Joseph Doherty 0001cdd579 fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.

Why this matters:
  On a busy line where many tags feeding many alarms change frequently,
  the old BuildReadCache allocated a fresh dictionary + context on every
  predicate evaluation — a steady stream of short-lived allocations the
  GC eventually has to reclaim. With the reuse, the dictionary and
  context are allocated once per alarm (on first evaluation) and refilled
  in place across every subsequent re-eval.

Implementation:
  - New private AlarmScratch class holds the reusable
    Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
    alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
    reference. The context observes refilled values without being
    re-created.
  - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
    engine, cleared in LoadAsync alongside _alarms so a config-publish
    drops the prior generation's scratch (Inputs / Logger may change).
  - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
    the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
    repopulate the dictionary in place, then runs the predicate against
    the reused context.
  - BuildReadCache removed.
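
The reuse contract can be illustrated with a small Python analogue. It is a sketch, not the engine code. AlarmScratch and PredicateContext mirror the commit's names. ScratchEngine, its method signatures, and the threading.Lock standing in for _evalGate are assumptions. Because the context holds the read cache by reference, refilling the dict in place is visible to the predicate without creating a new context.

```python
import threading


class PredicateContext:
    """Wraps a read cache by reference; it never copies the dict."""

    def __init__(self, read_cache: dict) -> None:
        self._read_cache = read_cache

    def get_tag(self, path: str):
        return self._read_cache.get(path)


class AlarmScratch:
    """Allocated once per alarm, then refilled in place on every evaluation."""

    def __init__(self) -> None:
        self.read_cache: dict = {}
        self.context = PredicateContext(self.read_cache)


class ScratchEngine:
    def __init__(self) -> None:
        self._eval_gate = threading.Lock()  # stands in for _evalGate
        self._scratch_by_alarm_id: dict = {}

    def load(self) -> None:
        """A config publish drops the prior generation's scratch."""
        with self._eval_gate:
            self._scratch_by_alarm_id.clear()

    def evaluate(self, alarm_id, inputs, snapshot, predicate):
        with self._eval_gate:
            scratch = self._scratch_by_alarm_id.get(alarm_id)
            if scratch is None:  # first evaluation of this alarm
                scratch = AlarmScratch()
                self._scratch_by_alarm_id[alarm_id] = scratch
            # Clear and repopulate in place; the shared context sees the new values.
            scratch.read_cache.clear()
            for path in inputs:
                scratch.read_cache[path] = snapshot.get(path)
            return predicate(scratch.context)
```

The explicit get-then-insert matters. In Python, `dict.setdefault(key, AlarmScratch())` would construct a throwaway scratch on every call, which defeats the point of the change.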

Safety:
  Reuse is serialised under _evalGate, which guarantees no two threads
  ever observe the same scratch in a half-refilled state. The
  AlarmPredicateContext is bound to the scratch dictionary by reference,
  so the predicate's ctx.GetTag(path) sees the freshly-refilled values
  rather than a stale snapshot.

Verification:
  - All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
    locking the reuse contract).
  - All 56 VirtualTags tests still pass (unchanged).
  - All 104 Core.Scripting tests still pass (unchanged).

New tests in ScriptedAlarmEngineTests:
  - Reevaluation_reuses_the_same_read_cache_dictionary — asserts
    ReferenceEquals(scratch_before, scratch_after) across two
    evaluations of the same alarm.
  - Reevaluation_reuses_the_same_predicate_context — same, for the
    context.
  - LoadAsync_drops_the_prior_generations_scratch — asserts a config
    publish wipes the prior scratch (so a stale Logger / Inputs can't
    leak into the new generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest are exposed to the tests project through the
existing InternalsVisibleTo. They stay internal and are not part of the
public engine surface.

Docs:
  - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
    rewritten as "hot-path allocation reuse" documenting the new
    contract + reuse safety reasoning + the three regression tests.
  - code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
    Won't Fix → Resolved.
  - code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:10:09 -04:00
Joseph Doherty 7b6ab2ec6f fix(scripting): unload compiled-script assemblies via collectible ALC
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.

Why this was needed:
  Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
  the default AssemblyLoadContext, which is non-collectible. Across config-
  publish generations each Clear() drops dictionary entries but the emitted
  assemblies stay loaded for process lifetime, so memory grows steadily on
  long-running servers with frequent publishes. The accepted-limitation note
  in docs/VirtualTags.md recommended scheduled restarts as the workaround;
  operator feedback was that restarts are difficult, so the underlying
  limitation was the right thing to fix.

Implementation:
  - New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
    script assembly per evaluator.
  - ScriptEvaluator.Compile synthesises a wrapper class around the user source
    (CompiledScript.Run(globals) — explicit return required per ordinary C#
    semantics, which every existing script already uses), builds a
    CSharpCompilation against the sandbox references, runs the
    ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
    in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
    and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
    via reflection.
  - ScriptEvaluator now implements IDisposable — Dispose calls
    AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
    for GC once nothing else references it (unloading is cooperative).
  - CompiledScriptCache.Clear() disposes every materialised evaluator before
    dropping its dictionary entry; CompiledScriptCache itself is now
    IDisposable for graceful server shutdown.
  - ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
    instead of a Roslyn ScriptOptions; references now span BCL via the
    TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
    Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
    ForbiddenTypeAnalyzer is the sole security gate (consistent with the
    Core.Scripting-001 / -002 model — references-list-only restriction is
    porous against type forwarding, so the analyzer must be the real gate).

Verification:
  - All 104 Core.Scripting tests pass (was 101 — three new regression tests
    locking the unload contract).
  - All 56 VirtualTags tests pass (unchanged).
  - All 63 ScriptedAlarms tests pass (unchanged).
  - New CompiledScriptCacheTests:
    - Dispose_unloads_compiled_script_assembly_load_context — proves single-
      evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
    - Clear_disposes_every_materialised_evaluator — proves publish-replace
      releases every prior generation's ALC.
    - GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
      post-dispose contract.
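
The weak-reference test technique carries over to other runtimes. The Python sketch below applies the same idea using weakref and gc.collect(). The helper name, the attempt limit, and LoadContext are assumptions; LoadContext is a stand-in that references itself, so only the cycle collector can free it. Bounding the loop means a leaked reference fails the test after N collections instead of hanging it.

```python
import gc
import weakref


def collected_within(ref: weakref.ref, attempts: int = 10) -> bool:
    """Run up to `attempts` collections; True once the weak referent is gone."""
    for _ in range(attempts):
        if ref() is None:
            return True
        gc.collect()
    return ref() is None


class LoadContext:
    """Hypothetical stand-in for a collectible load context."""

    def __init__(self) -> None:
        self.self_ref = self  # a reference cycle, so refcounting alone cannot free it
```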

Docs:
  - docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
    limitation note replaced with the unload contract + the new authoring
    convention (explicit return).
  - docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
    restart guidance.
  - code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
    Won't Fix → Resolved with the implementation summary.
  - code-reviews/README.md regenerated.

Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:55:04 -04:00
Joseph Doherty 5a9c4591b9 fix(cli-common): name native-driver-emitted status codes in SnapshotFormatter
Driver.FOCAS.Cli-005 follow-up: extend the SnapshotFormatter.FormatStatus
shortlist with the five Bad* codes the native-protocol mappers (FOCAS,
AbCip, AbLegacy) emit but which the shortlist previously left unnamed,
so they rendered only as severity-class 'Bad' instead of the documented
'BadDeviceFailure' / 'BadNotWritable' / ... names operators are told to
read off probe/write output.

Added entries:
  0x80020000 BadInternalError
  0x803B0000 BadNotWritable
  0x803C0000 BadOutOfRange
  0x803D0000 BadNotSupported
  0x80550000 BadDeviceFailure

(BadTimeout 0x800A0000 was already added under Driver.Cli.Common-001.)
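
A shortlist lookup with a severity-class fallback can be sketched as below. The sketch makes three assumptions:

- format_status and KNOWN_CODES are hypothetical names.
- The fallback string follows the '0x... (Name)' shape rather than the formatter's exact unnamed output.
- The severity class comes from the top two bits of the 32-bit OPC UA StatusCode.

The 0x80550000 entry is left out because Driver.Cli.Common-007 flags that mapping as wrong.

```python
KNOWN_CODES = {
    0x00000000: "Good",
    0x80020000: "BadInternalError",
    0x800A0000: "BadTimeout",
    0x803B0000: "BadNotWritable",
    0x803C0000: "BadOutOfRange",
    0x803D0000: "BadNotSupported",
}


def severity_class(code: int) -> str:
    # OPC UA severity lives in the top two bits: 1x = Bad, 01 = Uncertain, 00 = Good.
    if code & 0x80000000:
        return "Bad"
    if code & 0x40000000:
        return "Uncertain"
    return "Good"


def format_status(code: int) -> str:
    """Name the code when it is on the shortlist, else fall back to its severity class."""
    name = KNOWN_CODES.get(code, severity_class(code))
    return f"0x{code:08X} ({name})"
```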

Tests: SnapshotFormatterTests gains a new [Theory]
FormatStatus_names_native_driver_emitted_codes covering the five names,
and the existing well-known [Theory] is extended with the same entries
to enforce exact '0x... (Name)' rendering. Suite now 47 green (was 42).

Flips Driver.FOCAS.Cli-005 from Deferred to Resolved; README regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:14:08 -04:00
Joseph Doherty 0f8ce1cb80 docs(code-reviews): regenerate index — final batch — 6 Low findings resolved
Batch 7 closed the last Open findings in Client.UI. The review backlog
is now empty: 0 Open findings across all 31 modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:25:28 -04:00
Joseph Doherty 1b10194634 fix(client-ui): resolve Low code-review findings (Client.UI-003,004,006,009,010,011)
- Client.UI-003: wire Serilog properly per CLAUDE.md — console sink +
  rolling daily file sink in Program.Main, Log.CloseAndFlush in finally,
  per-VM Log.ForContext<> loggers.
- Client.UI-004: migrate the cert-store folder picker from the obsolete
  OpenFolderDialog to StorageProvider.OpenFolderPickerAsync (with
  TryGetFolderFromPathAsync seed + TryGetLocalPath extraction).
- Client.UI-006: surface formerly silent catch blocks via an observable
  StatusMessage on the Subscriptions / Alarms VMs that bubbles up into
  the shell's status bar; soft fallbacks log at Information level so
  hard failures stay distinguishable.
- Client.UI-009: docs/Client.UI.md now lists Standard Deviation in the
  Aggregate row of the Query Options table.
- Client.UI-010: removed the unused MinDateTimeProperty /
  MaxDateTimeProperty styled properties from DateTimeRangePicker.
- Client.UI-011: updated the cert-store TextBox watermark from the
  legacy AppData/LmxOpcUaClient/pki to the canonical
  AppData/OtOpcUaClient/pki.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:25:20 -04:00
Joseph Doherty 59ecd18169 docs(code-reviews): regenerate index — 25 Low findings resolved
Batch 6 cleared Open findings in Driver.FOCAS.Cli (1 deferred to
Driver.Cli.Common), Driver.Cli.Common, Driver.Historian.Wonderware.Client,
Client.CLI, and Client.Shared.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:13:29 -04:00
Joseph Doherty 2a6ac07111 fix(client-shared): resolve Low code-review findings (Client.Shared-003,004,009,010,011)
- Client.Shared-003: DefaultSessionAdapter.WriteValueAsync / CallMethodAsync
  guard against null/empty Results and throw ServiceResultException with
  the response's ServiceResult code instead of indexing into a missing
  list.
- Client.Shared-004: DefaultSessionAdapter.CloseAsync / HistoryReadRawAsync
  / HistoryReadAggregateAsync use the Session.*Async overloads and honour
  the caller's CancellationToken.
- Client.Shared-009: AcknowledgeAlarmAsync returns the underlying
  ServiceResultException.StatusCode on failure instead of always Good;
  IOpcUaClientService doc updated to describe the new contract.
- Client.Shared-010: ConnectionSettings.CertificateStorePath defaults to
  empty; DefaultApplicationConfigurationFactory resolves the canonical
  PKI path lazily, so per-failover ConnectionSettings copies don't hit
  the filesystem.
- Client.Shared-011: added the alarm-fallback regression test, extracted
  EndpointSelector as a pure static, and added EndpointSelectorTests
  covering security-mode match, Basic256Sha256 preference, fallback,
  diagnostics, hostname rewrite, and null/empty guards.
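
The Client.Shared-003 guard and the Client.Shared-009 ack contract fit together:

- An empty Results list becomes an exception that carries the service-level code.
- The ack wrapper turns that exception back into a returned status code.

The Python sketch below uses hypothetical types (Response, ServiceResultError). Only the two named constants are real OPC UA codes: Good = 0x00000000 and BadUnexpectedError = 0x80010000.

```python
from dataclasses import dataclass, field
from typing import Optional

GOOD = 0x00000000
BAD_UNEXPECTED_ERROR = 0x80010000


class ServiceResultError(Exception):
    """Stands in for ServiceResultException; carries an OPC UA status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"0x{status_code:08X}")
        self.status_code = status_code


@dataclass
class Response:
    results: list[int] = field(default_factory=list)
    service_result: Optional[int] = None  # None models a missing ResponseHeader


def first_result(response: Response) -> int:
    """Guarded Results[0]: raise with the service result instead of an IndexError."""
    if not response.results:
        code = response.service_result
        raise ServiceResultError(BAD_UNEXPECTED_ERROR if code is None else code)
    return response.results[0]


def acknowledge_alarm(call_method) -> int:
    """Return Good on success, or the failure's status code instead of raising."""
    try:
        call_method()
    except ServiceResultError as error:
        return error.status_code
    return GOOD
```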

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:13:21 -04:00
Joseph Doherty 7fe9f16cf8 fix(client-cli): resolve Low code-review findings (Client.CLI-002,003,004,006,007,008,009,010)
- Client.CLI-002: SubscribeCommand's neverWentBad list now requires the
  node to be present in lastStatus (i.e. received at least one update)
  so the 'suspect' bucket only contains observed nodes.
- Client.CLI-003: every long-running command validates numeric option
  ranges (Interval / Depth / MaxDepth / Duration / Max) and throws
  CliFx CommandException on out-of-range values.
- Client.CLI-004: SubscribeCommand carries XML summary docs on the
  type, ctor, every [CommandOption] property, and ExecuteAsync —
  matching the sibling commands' style.
- Client.CLI-006: HistoryReadCommand parses --start / --end with
  InvariantCulture+UTC and surfaces FormatException as CommandException;
  every NodeIdParser.ParseRequired call wraps FormatException /
  ArgumentException as CommandException.
- Client.CLI-007: CommandBase.ConfigureLogging calls Log.CloseAndFlush()
  before assigning a new Log.Logger so prior sinks are disposed.
- Client.CLI-008: rewrote the subscribe and historyread sections of
  docs/Client.CLI.md (every flag documented, summary-bucket vocabulary,
  StandardDeviation aggregate, UTC --start/--end convention).
- Client.CLI-009: SubscribeCommand / AlarmsCommand use named local
  handlers and detach them via -= after UnsubscribeAsync so no
  notification reaches the console after the command's output phase
  ends.
- Client.CLI-010: added CommandRangeValidationTests,
  EventHandlerLifecycleTests, InputValidationErrorsTests,
  LoggerLifecycleTests, and SubscribeCommandSummaryTests pinning every
  Low fix; FakeOpcUaClientService gained AddDiscoveredVariable +
  RaiseDataChanged + BrowseResultsByParent helpers.
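
The Client.CLI-002 rule is easiest to see with the buckets side by side. A node counts as never having gone bad only if it was observed at least once and never reported bad quality. Nodes with no update at all get their own bucket. In the Python sketch below, the bucket names and the two inputs are simplifications of the command's lastStatus / everBad state, not its actual types: last_status is a dict keyed by node id and ever_bad is a set.

```python
def bucket_nodes(targets, last_status, ever_bad):
    """Split targets into never_updated / went_bad / never_went_bad."""
    buckets = {"never_updated": [], "went_bad": [], "never_went_bad": []}
    for node in targets:
        if node in ever_bad:
            buckets["went_bad"].append(node)
        elif node not in last_status:
            # Before the fix, these nodes also landed in never_went_bad.
            buckets["never_updated"].append(node)
        else:
            buckets["never_went_bad"].append(node)
    return buckets
```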

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:13:08 -04:00
Joseph Doherty 879925180b fix(driver-historian-wonderware-client): resolve Low code-review findings (Driver.Historian.Wonderware.Client-003,004,006,008,010)
- Driver.Historian.Wonderware.Client-003: replaced the mixed Interlocked
  + healthLock counters with a RecordOutcome helper that increments
  _totalQueries and exactly one of _totalSuccesses / _totalFailures under
  a single lock acquisition.
- Driver.Historian.Wonderware.Client-004: InvokeAndClassifyAsync routes
  transport + sidecar classification through a single RecordOutcome
  call; the legacy ReclassifySuccessAsFailure two-step is gone.
- Driver.Historian.Wonderware.Client-006: removed the dead
  ReconnectInitialBackoff / ReconnectMaxBackoff options and added a
  doc <remarks> stating the channel performs a single in-place
  reconnect; retry/backoff stays with the caller.
- Driver.Historian.Wonderware.Client-008: the audit-suppression comment
  block now records advisory titles, why neither applies, and the
  revisit trigger.
- Driver.Historian.Wonderware.Client-010: reworded Dispose() to claim
  deadlock-safety and added a GetHealthSnapshot summary documenting the
  single-channel collapse + counter invariant.
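
The invariant behind -003/-004 is that any reader holding the lock sees total queries equal to successes plus failures. Below is a Python sketch of a single-acquisition RecordOutcome. The class and snapshot shape are hypothetical, and threading.Lock stands in for the C# healthLock.

```python
import threading


class HealthCounters:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total_queries = 0
        self._total_successes = 0
        self._total_failures = 0

    def record_outcome(self, success: bool) -> None:
        # One acquisition updates the total and exactly one outcome counter,
        # so no reader can observe a query that is neither success nor failure.
        with self._lock:
            self._total_queries += 1
            if success:
                self._total_successes += 1
            else:
                self._total_failures += 1

    def snapshot(self) -> tuple[int, int, int]:
        with self._lock:
            return self._total_queries, self._total_successes, self._total_failures
```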

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:12:16 -04:00
Joseph Doherty 3ca569f621 fix(driver-cli-common): resolve Low code-review findings (Driver.Cli.Common-004,006)
- Driver.Cli.Common-004: confirm the FormatTable empty-input guard
  landed earlier (commit 1433a1c); flip status to Resolved with a
  cross-reference.
- Driver.Cli.Common-006: reword the SnapshotFormatter source-time
  column comment to describe the actual behaviour (right-most column,
  unmeasured, '-' for null timestamps) and confirm the
  DriverCommandBase summary now enumerates FOCAS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:12:04 -04:00
Joseph Doherty 6923be3aa2 fix(driver-focas-cli): resolve Low code-review findings (Driver.FOCAS.Cli-001,002,003,004; -005 deferred)
- Driver.FOCAS.Cli-001: WriteCommand.ParseValue now wraps numeric
  FormatException / OverflowException as CliFx CommandException with
  the offending value.
- Driver.FOCAS.Cli-002: SubscribeCommand's OnDataChange handler and the
  banner both take a writeLock so notification-callback and main-thread
  writes can't interleave; handler exceptions are warn-and-swallow.
- Driver.FOCAS.Cli-003: FocasCommandBase.ValidateOptions rejects
  --cnc-port outside 1..65535, non-positive --timeout-ms, and
  non-positive --interval-ms; ExecuteAsync calls it first.
- Driver.FOCAS.Cli-004: 'await using var driver' is the sole driver
  disposal path; dropped the redundant explicit await ShutdownAsync.
- Driver.FOCAS.Cli-005 (Deferred): the fix lives in
  Driver.Cli.Common.SnapshotFormatter — explicitly naming the
  status-code shortlist there benefits every driver CLI. Left as a
  Driver.Cli.Common follow-up.
- Registered the new tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests
  project in ZB.MOM.WW.OtOpcUa.slnx.
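
The -001 and -003 fixes are plain input guards that run before any connection is attempted. In this Python sketch, CommandError stands in for CliFx's CommandException. The function names and signatures are hypothetical; the real methods are WriteCommand.ParseValue and FocasCommandBase.ValidateOptions.

```python
class CommandError(Exception):
    """Stands in for CliFx.Exceptions.CommandException."""


def validate_options(cnc_port: int, timeout_ms: int, interval_ms: int) -> None:
    """Reject out-of-range options with a message naming the flag and value."""
    if not 1 <= cnc_port <= 65535:
        raise CommandError(f"--cnc-port must be in 1..65535 (got {cnc_port}).")
    if timeout_ms <= 0:
        raise CommandError(f"--timeout-ms must be positive (got {timeout_ms}).")
    if interval_ms <= 0:
        raise CommandError(f"--interval-ms must be positive (got {interval_ms}).")


def parse_int_value(raw: str) -> int:
    """Wrap numeric parse failures as a clean CLI error that names the value."""
    try:
        return int(raw)
    except ValueError as error:
        raise CommandError(f"Cannot parse '{raw}' as an integer.") from error
```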

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:11:55 -04:00
142 changed files with 4941 additions and 579 deletions
+1
@@ -85,6 +85,7 @@
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests.csproj" />
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests.csproj" />
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests.csproj" />
+<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests.csproj" />
</Folder>
<Folder Name="/tests/Client/">
<Project Path="tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests.csproj" />
+17 -17
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
-| Open findings | 8 |
+| Open findings | 0 |
## Checklist coverage
@@ -62,7 +62,7 @@ assumption precisely.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/SubscribeCommand.cs:129-137` |
-| Status | Open |
+| Status | Resolved |
**Description:** The summary computes `neverWentBad` as every target whose node-id key is
absent from the `everBad` dictionary. A node that received no update at all is also absent
@@ -78,7 +78,7 @@ streamed only good values.
"suspect" list only contains nodes that were actually observed and never reported bad
quality.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — `neverWentBad` now requires the node to be present in `lastStatus` (i.e. it received at least one update) before being counted, so the "suspect" bucket only contains nodes that were actually observed and never reported bad quality.
### Client.CLI-003
@@ -87,7 +87,7 @@ quality.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` |
-| Status | Open |
+| Status | Resolved |
**Description:** Numeric command options accept any value with no range validation.
`--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be
@@ -100,7 +100,7 @@ is forwarded to `HistoryReadRawAsync`. None of these produce a clear operator-fa
`CliFx.Exceptions.CommandException` with an actionable message when a value is out of
range.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — every command's `ExecuteAsync` now validates the numeric option ranges (`--interval`, `--depth`, `--max-depth`, `--max`, `--duration`) and throws `CliFx.Exceptions.CommandException` with the offending value when a non-positive (or otherwise out-of-range) value is supplied. Pinned by `CommandRangeValidationTests`.
### Client.CLI-004
@@ -109,7 +109,7 @@ range.
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Commands/SubscribeCommand.cs:13-37` |
-| Status | Open |
+| Status | Resolved |
**Description:** `SubscribeCommand` is the only command in the module whose constructor
and all `[CommandOption]` properties have no XML doc comments. Every other command
@@ -121,7 +121,7 @@ otherwise-uniform documentation convention of the module.
**Recommendation:** Add `<summary>` XML docs to the `SubscribeCommand` constructor and to
each of its option properties, matching the style used by the sibling commands.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — `SubscribeCommand` now carries `<summary>` XML docs on the type, the constructor, every `[CommandOption]` property, and `ExecuteAsync`, matching the style used by the sibling commands.
### Client.CLI-005
@@ -156,7 +156,7 @@ callback.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` |
-| Status | Open |
+| Status | Resolved |
**Description:** Operator input-format errors surface as raw .NET exceptions rather than
clean CLI errors. An unparseable start/end value throws `FormatException` straight out of
@@ -170,7 +170,7 @@ is converted to a `CliFx.Exceptions.CommandException` with a clean exit code.
`CommandException` with a concise message and a non-zero exit code, so malformed input
yields a one-line error instead of a stack trace.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — `HistoryReadCommand` parses `--start`/`--end` with `CultureInfo.InvariantCulture` + `AssumeUniversal`/`AdjustToUniversal`, catches `FormatException`, and rethrows as `CommandException` with the offending value. Every command's call to `NodeIdParser.ParseRequired` is wrapped in a `catch (FormatException or ArgumentException)` block that surfaces the underlying message as a clean CLI error. Pinned by `InputValidationErrorsTests`.
### Client.CLI-007
@@ -179,7 +179,7 @@ yields a one-line error instead of a stack trace.
| Severity | Low |
| Category | Performance & resource management |
| Location | `CommandBase.cs:112-123` |
-| Status | Open |
+| Status | Resolved |
**Description:** `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a
logger, and assigns it to the static `Log.Logger` without disposing the previously
@@ -193,7 +193,7 @@ abandons the prior console sink without disposal. The pattern is incorrect:
build the logger into a local `ILogger` the command owns and disposes, rather than mutating
global static state per command.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — `CommandBase.ConfigureLogging` now calls `Log.CloseAndFlush()` before assigning a new `Log.Logger`, so a prior logger's console sink is disposed before the next one is installed. Pinned by `LoggerLifecycleTests`.
### Client.CLI-008
@@ -202,7 +202,7 @@ global static state per command.
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/Client.CLI.md:158-217` |
-| Status | Open |
+| Status | Resolved |
**Description:** `docs/Client.CLI.md` is stale relative to the code at this commit.
(1) The `subscribe` command section documents only `-n` and `-i`, but the code
@@ -218,7 +218,7 @@ code option description includes it.
`docs/Client.CLI.md` from the current option set, including the five new subscribe flags
and the `StandardDeviation` aggregate row.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — rewrote the `subscribe` section of `docs/Client.CLI.md` to document every flag (`-r/--recursive`, `--max-depth`, `-q/--quiet`, `--duration`, `--summary-file`) plus the summary-bucket vocabulary, and added the `StandardDeviation` row plus the UTC `--start`/`--end` convention note to the `historyread` section.
### Client.CLI-009
@@ -227,7 +227,7 @@ and the `StandardDeviation` aggregate row.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` |
-| Status | Open |
+| Status | Resolved |
**Description:** Both long-running commands attach an event handler
(`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach
@@ -243,7 +243,7 @@ but never the .NET event.
unsubscribing, using a named local delegate so it can be removed, ensuring no notification
is processed after the command output phase ends.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — `SubscribeCommand` and `AlarmsCommand` declare named local handlers (`DataChangedHandler` / `AlarmEventHandler`) and detach them via `service.DataChanged -= ...` / `service.AlarmEvent -= ...` right after `UnsubscribeAsync` so no notification reaches the console once the command's output phase ends. Pinned by `EventHandlerLifecycleTests`.
### Client.CLI-010
@@ -252,7 +252,7 @@ is processed after the command output phase ends.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` |
-| Status | Open |
+| Status | Resolved |
**Description:** The new `SubscribeCommand` capabilities are largely untested. The four
`SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel,
@@ -268,4 +268,4 @@ exit, summary bucketing across good/bad/no-update nodes, and the `--summary-file
The `FakeOpcUaClientService` already exposes `RaiseDataChanged`, so feeding good/bad values
and asserting the summary text is straightforward.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — added `SubscribeCommandSummaryTests` (covering recursive collection via `FakeOpcUaClientService.AddDiscoveredVariable`, `--duration` auto-exit, summary bucketing for good/bad/never/never-went-bad, and the `--summary-file` write), `CommandRangeValidationTests`, `EventHandlerLifecycleTests`, `InputValidationErrorsTests`, and `LoggerLifecycleTests` to pin the other Low findings; `FakeOpcUaClientService` was extended with `AddDiscoveredVariable` / `RaiseDataChanged` helpers.
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
-| Open findings | 5 |
+| Open findings | 0 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` |
-| Status | Open |
+| Status | Resolved |
**Description:** `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a service fault) produces an `IndexOutOfRangeException` rather than a meaningful OPC UA `StatusCode` or `ServiceResultException`.
**Recommendation:** Guard both accesses — throw `ServiceResultException` with the response's `ResponseHeader.ServiceResult` (or `BadUnexpectedError`) when `Results` is empty.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — added empty-Results guards to both `WriteValueAsync` (lines 80-85) and `CallMethodAsync` (lines 293-298) in `DefaultSessionAdapter`. Each now throws `ServiceResultException` carrying `response.ResponseHeader.ServiceResult.Code` (or `StatusCodes.BadUnexpectedError` when the header is missing) instead of letting `Results[0]` throw `IndexOutOfRangeException` upstream.
### Client.Shared-004
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` |
-| Status | Open |
+| Status | Resolved |
**Description:** `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the `CancellationToken` parameter is ignored on these paths.
**Recommendation:** Use the stack's async overloads (`Session.HistoryReadAsync`, `Session.CloseAsync`) where available, or wrap the synchronous calls in `Task.Run`, so the methods are genuinely asynchronous and honor the cancellation token.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — replaced the three blocking calls with their async counterparts: `CloseAsync` now awaits `Session.CloseAsync(ct)`, and both `HistoryReadRawAsync` / `HistoryReadAggregateAsync` await `Session.HistoryReadAsync(...)` with `.ConfigureAwait(false)`. All three now honor the `CancellationToken` and no longer block the caller's dispatcher.
### Client.Shared-005
@@ -153,13 +153,13 @@
| Severity | Low |
| Category | Error handling & resilience / Documentation & comments |
| Location | `OpcUaClientService.cs:302-322` |
-| Status | Open |
+| Status | Resolved |
**Description:** `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally `return StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAsync`, which throws `ServiceResultException` on a bad call result. A failed acknowledgment therefore never returns a bad `StatusCode` — it throws — and the `StatusCode` return value is dead. Callers writing `if (StatusCode.IsBad(result))` will never see a bad result and will not catch the exception.
**Recommendation:** Either change the return type to `Task` (and let exceptions signal failure), or catch `ServiceResultException` in `AcknowledgeAlarmAsync` and return its `StatusCode`. Update the XML doc to match whichever is chosen.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — `AcknowledgeAlarmAsync` now wraps the `CallMethodAsync` invocation in a try/catch for `ServiceResultException`, logging the failure and returning `ex.StatusCode` so callers using `if (StatusCode.IsBad(result))` see the bad status. The `IOpcUaClientService.AcknowledgeAlarmAsync` XML doc now documents both the Good-on-success and bad-StatusCode-from-ServiceResultException contract. Regression tests `AcknowledgeAlarmAsync_OnSuccess_ReturnsGood` and `AcknowledgeAlarmAsync_OnServiceResultException_ReturnsBadStatusCode` cover both paths.
### Client.Shared-010
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` |
| Status | Open |
| Status | Resolved |
**Description:** `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call per process, the legacy-folder migration with `Directory.Exists`/`Directory.Move` filesystem IO. `ConnectToEndpointAsync` constructs a fresh `ConnectionSettings` per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The `_migrationChecked` fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk.
**Recommendation:** Make `CertificateStorePath` default to `string.Empty` and resolve `ClientStoragePaths.GetPkiPath()` lazily inside `DefaultApplicationConfigurationFactory.CreateAsync` only when the path is unset.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ConnectionSettings.CertificateStorePath` now defaults to `string.Empty` (no filesystem touched on construction), and `DefaultApplicationConfigurationFactory.CreateAsync` resolves the canonical PKI path via `ClientStoragePaths.GetPkiPath()` only when the supplied path is null/whitespace. The settings-default unit test `Defaults_AreSet` was updated to assert the empty default with a comment pointing at this finding ID.
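The lazy-default contract in this resolution can be sketched as below. This is an illustrative sketch under stated assumptions, not the project's code. `ResolveCertificateStorePath` is a hypothetical helper standing in for the check inside `DefaultApplicationConfigurationFactory.CreateAsync`. `FakeGetPkiPath` is a hypothetical stand-in for `ClientStoragePaths.GetPkiPath()` that counts its calls, which makes the laziness observable: the resolver runs only when no path was supplied.

```csharp
using System;

// Hypothetical stand-in for ClientStoragePaths.GetPkiPath(). The real helper does
// filesystem work on first use; this one only counts how often it is called.
int resolverCalls = 0;
string FakeGetPkiPath()
{
    resolverCalls++;
    return "<LocalAppData>/OtOpcUaClient/pki";
}

// Settings default to string.Empty; the canonical path is resolved only when the
// caller left the value null or whitespace.
static string ResolveCertificateStorePath(string? configured, Func<string> resolveDefault) =>
    string.IsNullOrWhiteSpace(configured) ? resolveDefault() : configured;

string explicitPath = ResolveCertificateStorePath(@"D:\pki", FakeGetPkiPath);  // resolver not called
string defaultPath = ResolveCertificateStorePath(string.Empty, FakeGetPkiPath); // resolver called once
Console.WriteLine($"{explicitPath} | {defaultPath} | resolver calls: {resolverCalls}");
```

Constructing a settings object therefore never touches disk; the cost moves to the single place that actually needs the path.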
### Client.Shared-011
@@ -183,10 +183,10 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in `OnAlarmEventNotification` (the `Task.Run` supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) `WriteValueAsync` string coercion against an unwritten/`Bad`-status node (Client.Shared-008) is untested; (d) the production adapters (`DefaultSessionAdapter`, `DefaultEndpointDiscovery`) have no unit coverage — understandable since they wrap the SDK, but the `Results[0]` guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.
**Recommendation:** Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract `DefaultEndpointDiscovery` best-endpoint selection into a pure function so it can be unit tested.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added the previously-missing unit coverage: (a) `OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` drives the supplemental-read fallback path with null AckedState/ActiveState fields and a non-null SourceNode and asserts the Galaxy attribute reads populate the delivered event; (b) `WriteValueAsync` typeless-node coverage is exercised via the Client.Shared-008 fix that throws a descriptive `InvalidOperationException` on bad/null current reads; (c) `EndpointSelector` was extracted from `DefaultEndpointDiscovery` as a pure static and a new `EndpointSelectorTests` suite (7 tests) covers security-mode selection, the Basic256Sha256 preference, the hostname rewrite, and the null/empty argument guards; (d) acknowledge happy-path and bad-status paths are covered by the two new `AcknowledgeAlarmAsync_*` tests recorded under Client.Shared-009. Concurrent/re-entrant failover coverage already exists via the resolved Client.Shared-005/-006 tests in the suite.
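The selection rules that `EndpointSelector` is described as covering (security-mode match, Basic256Sha256 preference, hostname rewrite, argument guards) can be sketched as a pure function. The tuple shape, the `SelectEndpoint` name, and the exact rewrite rule (replace the host the server advertised with the host the client dialled) are assumptions for illustration only. The real selector operates on the OPC UA SDK's endpoint description type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical endpoint shape: (Url, SecurityMode, SecurityPolicyUri).
var endpoints = new List<(string Url, string Mode, string Policy)>
{
    ("opc.tcp://srv-internal:4840", "None", "http://opcfoundation.org/UA/SecurityPolicy#None"),
    ("opc.tcp://srv-internal:4840", "SignAndEncrypt", "http://opcfoundation.org/UA/SecurityPolicy#Basic256"),
    ("opc.tcp://srv-internal:4840", "SignAndEncrypt", "http://opcfoundation.org/UA/SecurityPolicy#Basic256Sha256"),
};

// Pure function: the same inputs always give the same endpoint, so it is unit-testable.
static (string Url, string Mode, string Policy) SelectEndpoint(
    IReadOnlyList<(string Url, string Mode, string Policy)> candidates,
    string requestedMode,
    string requestedHost)
{
    if (candidates is null) throw new ArgumentNullException(nameof(candidates));
    if (candidates.Count == 0) throw new ArgumentException("No endpoints to choose from.", nameof(candidates));

    var match = candidates
        .Where(e => e.Mode == requestedMode)
        // true sorts after false, so descending order puts Basic256Sha256 first.
        .OrderByDescending(e => e.Policy.EndsWith("#Basic256Sha256", StringComparison.Ordinal))
        .FirstOrDefault();
    if (match.Url is null)
        throw new InvalidOperationException($"No endpoint offers security mode '{requestedMode}'.");

    // Replace the host the server advertised with the host the client actually dialled.
    var rewritten = new UriBuilder(match.Url) { Host = requestedHost }.Uri.ToString();
    return (rewritten, match.Mode, match.Policy);
}

var chosen = SelectEndpoint(endpoints, "SignAndEncrypt", "plant-gw");
Console.WriteLine($"{chosen.Url} {chosen.Policy}");
```

Because the function takes plain data and returns plain data, each rule can be pinned by a test without a live server.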
+13 -13
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -89,7 +89,7 @@ directly so the compiler can prove non-nullness.
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` |
| Status | Open |
| Status | Resolved |
**Description:** The csproj references `Serilog` and `Serilog.Sinks.Console`, and
`docs/Client.UI.md` lists Serilog as the logging technology, but no source file in
@@ -104,7 +104,7 @@ rolling daily file sink the project standard calls for) and route Avalonia loggi
through it, or drop the unused `Serilog` package references and correct
`docs/Client.UI.md`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Honoured the CLAUDE.md mandate by wiring up Serilog with a console sink + a rolling daily file sink (`{LocalAppData}/OtOpcUaClient/logs/client-ui-*.log`, retained 14 days). Added `Serilog.Sinks.File` to the csproj and a `ConfigureLogging()` initializer in `Program.Main` that assigns `Log.Logger` before `BuildAvaloniaApp()` and calls `Log.CloseAndFlush()` on exit. Each VM that previously had silently swallowing catch blocks now owns a static `Log.ForContext<>()` logger so failures (subscribe, alarm subscribe, redundancy probe, recursive browse) are written to the rolling file. Avalonia's own logging is still routed through `LogToTrace` — replacing that would require a custom `ILogSink` adapter outside the scope of this finding.
### Client.UI-004
@@ -113,7 +113,7 @@ through it, or drop the unused `Serilog` package references and correct
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Views/MainWindow.axaml.cs:125-138` |
| Status | Open |
| Status | Resolved |
**Description:** `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is
obsolete in Avalonia 11.x (the version pinned in the csproj). The supported
@@ -125,7 +125,7 @@ Avalonia major version.
**Recommendation:** Migrate the folder chooser to
`TopLevel.GetTopLevel(this).StorageProvider.OpenFolderPickerAsync(...)`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Replaced `OpenFolderDialog` with `TopLevel.GetTopLevel(this).StorageProvider.OpenFolderPickerAsync(...)`, using `TryGetFolderFromPathAsync(vm.CertificateStorePath)` as the suggested start location and `TryGetLocalPath()` to extract the chosen path. The CS0618 obsoletion warning no longer appears in the build output.
### Client.UI-005
@@ -165,7 +165,7 @@ method, not only from `DisconnectAsync`.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` |
| Status | Open |
| Status | Resolved |
**Description:** Many catch blocks swallow exceptions silently with an empty body
and only a comment (`// Redundancy info not available`, `// Subscribe failed`,
@@ -180,7 +180,7 @@ permission denial effectively impossible from the UI.
message or write the exception to a log. Distinguish "feature not supported"
(condition refresh) from "operation failed" so genuine errors are not hidden.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Added an observable `StatusMessage` property on `SubscriptionsViewModel` and `AlarmsViewModel`; each former silent catch now logs through Serilog (via Client.UI-003's logger) and writes a user-visible message. `MainWindowViewModel.InitializeService` subscribes to both child VMs' `StatusMessage` changes and bubbles them up into the shell's `StatusMessage` (which is already bound to the status bar). Soft conditions are distinguished from hard failures: `RequestConditionRefreshAsync` failures log at Information level and surface as "Condition refresh not supported by server" rather than a generic error, matching the recommendation. Redundancy probe failure still leaves `RedundancyInfo` null but now logs at Information level instead of dropping the exception. Regression tests `AddSubscription_OnFailure_SurfacesStatusMessage`, `AddSubscriptionForNodeAsync_OnFailure_SurfacesStatusMessage`, `Subscribe_OnFailure_SurfacesStatusMessage`, and `ConnectCommand_RedundancyFailure_DoesNotBreakConnection` cover the four affected swallow sites.
### Client.UI-007
@@ -239,7 +239,7 @@ any background reconnect timers are leaked until process exit. The
| Severity | Low |
| Category | Design-document adherence |
| Location | `ViewModels/HistoryViewModel.cs:44-54` |
| Status | Open |
| Status | Resolved |
**Description:** `HistoryViewModel.AggregateTypes` exposes eight entries: `null`
(Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`.
@@ -250,7 +250,7 @@ stale relative to the code.
**Recommendation:** Update the "Aggregate" row in `docs/Client.UI.md` to include
Standard Deviation.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Added "Standard Deviation" to the Aggregate row of the Query Options table in `docs/Client.UI.md` so it matches the eighth entry already exposed by `HistoryViewModel.AggregateTypes`.
### Client.UI-010
@@ -259,7 +259,7 @@ Standard Deviation.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` |
| Status | Open |
| Status | Resolved |
**Description:** `DateTimeRangePicker` declares `MinDateTimeProperty` /
`MaxDateTimeProperty` styled properties with public CLR accessors, but neither is
@@ -272,7 +272,7 @@ constraint the control does not enforce.
path (turn out-of-range input red, as invalid input already is) or remove the two
unused styled properties.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Removed `MinDateTimeProperty` / `MaxDateTimeProperty` and their CLR accessors from `DateTimeRangePicker.axaml.cs`. No XAML or external caller binds the properties (grep across the repo confirmed only the control file referenced them), so removing the dead API surface is the correct fix; adding min/max clamping would have been speculative behaviour without a calling site.
### Client.UI-011
@@ -281,7 +281,7 @@ unused styled properties.
| Severity | Low |
| Category | Documentation & comments |
| Location | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` |
| Status | Open |
| Status | Resolved |
**Description:** The certificate-store-path `TextBox` watermark reads
`(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208
@@ -293,4 +293,4 @@ that no longer matches where settings and the PKI store actually live.
**Recommendation:** Update the watermark to reference `OtOpcUaClient/pki`, or bind
it to `ClientStoragePaths.GetPkiPath()` so it cannot drift again.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Updated the watermark text in `Views/MainWindow.axaml` from `(default: AppData/LmxOpcUaClient/pki)` to `(default: AppData/OtOpcUaClient/pki)` so it matches the canonical folder name resolved by `ClientStoragePaths` (the binding-to-helper alternative was considered but a static string keeps the watermark cheap; the path is also already documented in `docs/Client.UI.md`).
+71 -4
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 0 |
@@ -14,6 +14,8 @@
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
### 2026-05-22 review (commit `76d35d1`)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.ScriptedAlarms-002 |
@@ -27,6 +29,26 @@ a category produced nothing rather than leaving it blank.
| 9 | Testing coverage | Core.ScriptedAlarms-012 |
| 10 | Documentation & comments | Core.ScriptedAlarms-003 |
### 2026-05-23 re-review (commit `a9be809`)
Focused re-review of the Core.ScriptedAlarms-009 resolution (commit `0001cdd`) —
new `AlarmScratch` class, `_scratchByAlarmId` ConcurrentDictionary, `RefillReadCache`
helper, and internal test accessors. Only the changed/new code since `76d35d1` was
re-examined; existing closed findings stay as audit trail.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | Core.ScriptedAlarms-013 |
## Findings
### Core.ScriptedAlarms-001
@@ -156,13 +178,29 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Performance & resource management |
| Location | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Status | Won't Fix |
| Status | Resolved |
**Description:** `BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. `AlarmPredicateContext` is also newly constructed each evaluation (line 281).
**Recommendation:** Minor. If the evaluation path shows up in allocation profiling, the read cache could be a reused per-alarm buffer cleared between evaluations (evaluations are already serialised under `_evalGate`, so a single shared scratch dictionary is safe). Not worth doing speculatively — flag for the perf surface in `docs/v2/Galaxy.Performance.md` if alarm evaluation is ever soak-tested.
**Resolution:** Won't Fix 2026-05-23 — per the recommendation, no code change. Documented the known allocation characteristic in `docs/v2/Galaxy.Performance.md` (new "Scripted-alarm engine — known hot-path allocations" section) so a future soak that surfaces pressure has a noted mitigation (reused per-alarm scratch buffer) and we don't re-find this in a later review.
**Resolution:** Resolved 2026-05-23 — added a per-alarm reusable `AlarmScratch`
(read-cache `Dictionary` + `AlarmPredicateContext`) held in
`_scratchByAlarmId`, populated lazily on first evaluation and refilled in place
by the new `RefillReadCache(Dictionary, IReadOnlySet)` helper on every
subsequent re-eval. `BuildReadCache` is gone. The reuse is safe because every
evaluation runs under `_evalGate`; the context wraps the dictionary by
reference, so the predicate's `ctx.GetTag(path)` sees the freshly-refilled
values. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a
config-publish drops the prior generation's scratch (Inputs / Logger may
change). Regression tests added in `ScriptedAlarmEngineTests`:
`Reevaluation_reuses_the_same_read_cache_dictionary`,
`Reevaluation_reuses_the_same_predicate_context`, and
`LoadAsync_drops_the_prior_generations_scratch`; internal test hooks
`TryGetScratchReadCacheForTest` / `TryGetScratchContextForTest` exposed via
the existing `InternalsVisibleTo`. `docs/v2/Galaxy.Performance.md` "Scripted-alarm
engine" section rewritten to document the new reuse contract. Suite now 66
green (was 63).
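The reuse contract can be illustrated with a minimal sketch. The names echo the resolution (`_scratchByAlarmId`, `RefillReadCache`, `_evalGate`), but the shapes are simplified assumptions. Values are `double` instead of `DataValueSnapshot`, there is no `AlarmPredicateContext`, and `Evaluate` is a hypothetical stand-in for the engine's re-evaluation path. The sketch shows three things: the dictionary instance survives a second evaluation, its contents are refreshed, and a reload drops the scratch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Simplified stand-ins: the real engine stores DataValueSnapshot values and wraps the
// dictionary in an AlarmPredicateContext; a double is enough to show the reuse.
var evalGate = new object();
var scratchByAlarmId = new ConcurrentDictionary<string, Dictionary<string, double>>();
var liveTags = new Dictionary<string, double> { ["Temp"] = 71.5, ["Pressure"] = 2.1 };

// Clears and refills the caller's dictionary in place instead of allocating a new one.
static void RefillReadCache(Dictionary<string, double> cache, IReadOnlySet<string> inputs,
    IReadOnlyDictionary<string, double> source)
{
    cache.Clear();
    foreach (var path in inputs)
        if (source.TryGetValue(path, out var value))
            cache[path] = value;
}

// Hypothetical re-evaluation entry point. Evaluations are serialised under the gate,
// which is what makes a single scratch dictionary per alarm safe to reuse.
Dictionary<string, double> Evaluate(string alarmId, IReadOnlySet<string> inputs)
{
    lock (evalGate)
    {
        var cache = scratchByAlarmId.GetOrAdd(alarmId, _ => new Dictionary<string, double>());
        RefillReadCache(cache, inputs, liveTags);
        return cache; // returned only so this example can observe instance identity
    }
}

var alarmInputs = new HashSet<string> { "Temp" };
var first = Evaluate("HighTemp", alarmInputs);
liveTags["Temp"] = 80.0;
var second = Evaluate("HighTemp", alarmInputs);

// A config publish (LoadAsync in the real engine) drops every alarm's scratch.
scratchByAlarmId.Clear();
var afterReload = Evaluate("HighTemp", alarmInputs);

Console.WriteLine($"reused: {ReferenceEquals(first, second)}, Temp: {second["Temp"]}, " +
                  $"fresh after reload: {!ReferenceEquals(second, afterReload)}");
```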
### Core.ScriptedAlarms-010
@@ -208,3 +246,32 @@ a category produced nothing rather than leaving it blank.
**Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.
**Resolution:** Resolved 2026-05-22 — added 8 new engine-level tests covering all 6 gap areas: injectable-clock timed-shelve expiry via `RunShelvingCheckForTest`, `ConfirmAsync`/`TimedShelveAsync`/`UnshelveAsync`/`EnableAsync` end-to-end, subscriber-exception isolation, store-failure invariant, second-`LoadAsync` timer-leak regression, and `AreInputsReady` Bad/Uncertain guard; exposed `RunShelvingCheckForTest()` internal hook on the engine.
### Core.ScriptedAlarms-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `ScriptedAlarmEngine.cs:66-81` |
| Status | Resolved |
**Description:** The new internal test accessors `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` (introduced by the Core.ScriptedAlarms-009 resolution at `0001cdd`) return the *live* per-alarm scratch — the same `Dictionary<string, DataValueSnapshot>` instance the engine clears and refills in `RefillReadCache` under `_evalGate`, plus the `AlarmPredicateContext` that wraps it by reference. The XML docs describe the intended use ("assert the scratch is reused across evaluations (two reads return the same instance)") but do not explicitly warn that:
1. The returned `IReadOnlyDictionary` is the engine's mutable working set. Enumerating it from a test thread while the engine is mid-evaluation (e.g. during a `ReevaluateAsync` queued by `OnUpstreamChange`, or a `ShelvingCheckAsync` callback) is a concurrent-read-while-writer scenario against a plain `Dictionary` — undefined behaviour, can throw `InvalidOperationException` or return torn data.
2. Reference-equality comparisons (`ReferenceEquals(a, b)`) and single-key indexer reads (`dict["Temp"]`) on a quiesced engine are the only safe uses. The existing regression tests stay within those bounds, but a future test author has no in-code signal that broader reads are unsafe.
The engine itself is correct — `RefillReadCache` runs only under `_evalGate`, so the engine never tears its own state. The risk is purely on the test-side contract.
**Recommendation:** Add a `<remarks>` block to both `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` stating that the returned references point at live engine state, that reads are only safe when the engine is known to be idle (no in-flight `ReevaluateAsync`/`ShelvingCheckAsync`/`LoadAsync`), and that the intended uses are reference-identity assertions plus single-key lookups against a quiesced engine — never enumeration. No code change required; the engine's correctness depends on `_evalGate`, which is already documented.
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim.
Added a `<remarks>` block to each of `TryGetScratchReadCacheForTest` and
`TryGetScratchContextForTest` documenting the synchronization contract:
the returned references point at live engine state refilled in place under
`_evalGate`, enumeration during an in-flight evaluation is a
concurrent-read-while-writer scenario against a plain `Dictionary`
(undefined behaviour), and the only safe uses are reference-identity
comparisons + single-key reads against a quiesced engine. No code change
required — the engine's correctness was always there; only the test-side
contract was undocumented.
+434 -16
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 0 |
@@ -14,18 +14,23 @@
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Core.Scripting-006 |
| 4 | Error handling & resilience | Core.Scripting-007 |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 |
| 6 | Performance & resource management | Core.Scripting-008 |
| 7 | Design-document adherence | Core.Scripting-009 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 |
| 10 | Documentation & comments | No issues found |
The 2026-05-23 re-review only covers code touched between commits `76d35d1` and
`a9be809` (primarily the Core.Scripting-008 ALC rewrite + the broadened BCL
references). Categories where the new code surface produced no issues are
recorded as "No new issues" for that pass.
| # | Category | Result (76d35d1) | Result (a9be809, new code only) |
|---|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 | Core.Scripting-015 |
| 2 | OtOpcUa conventions | No issues found | No new issues |
| 3 | Concurrency & thread safety | Core.Scripting-006 | Core.Scripting-014 |
| 4 | Error handling & resilience | Core.Scripting-007 | No new issues |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 | Core.Scripting-012, Core.Scripting-013 |
| 6 | Performance & resource management | Core.Scripting-008 | Core.Scripting-016 |
| 7 | Design-document adherence | Core.Scripting-009 | No new issues |
| 8 | Code organization & conventions | No issues found | No new issues |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 | No new issues |
| 10 | Documentation & comments | No issues found | No new issues |
## Findings
@@ -240,7 +245,7 @@ race ordering.
| Severity | Low |
| Category | Performance & resource management |
| Location | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Status | Won't Fix |
| Status | Resolved |
**Description:** `CompiledScriptCache` has no capacity bound (acknowledged in the class
remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>`
@@ -257,7 +262,33 @@ compile scripts into a collectible `AssemblyLoadContext` so `Clear()` can unload
generations. At minimum add a note to `docs/ScriptedAlarms.md` so operators with
high-publish-frequency deployments are aware.
**Resolution:** Resolved 2026-05-23 — accepted as a documented known limitation rather than fixing in code (collectible `AssemblyLoadContext` for Roslyn-emitted assemblies is a v3 concern). The "Compile cache" section of `docs/VirtualTags.md` now carries a "Per-publish assembly accretion (accepted limitation, Core.Scripting-008)" note that operators with high-publish-frequency deployments can scan, and `docs/ScriptedAlarms.md` cross-references it. The accretion is benign at the expected "low thousands" of scripts scale; recommended mitigation is a scheduled server restart for deployments that publish very frequently.
**Resolution:** Resolved 2026-05-23 — switched the compile pipeline off the legacy
`CSharpScript.CreateDelegate` path (which emits into the default, non-collectible
`AssemblyLoadContext`) and onto a hand-rolled `CSharpCompilation` →
`Compilation.Emit(MemoryStream)` → `ScriptAssemblyLoadContext.LoadFromStream` chain,
with the new `ScriptAssemblyLoadContext` constructed with `isCollectible: true`. Each
compiled script lives in its own ALC; `ScriptEvaluator` now implements `IDisposable`
and calls `AssemblyLoadContext.Unload()` on dispose. `CompiledScriptCache.Clear()`
disposes every materialised evaluator before dropping its dictionary entry, and
`CompiledScriptCache` itself is now `IDisposable` for graceful server shutdown.
After a publish-replace cycle the prior generation's emitted assemblies become
eligible for collection. Reclamation is GC-timing-sensitive: `Unload()` only starts
the unload and does not free memory synchronously, so a later collection cycle
reclaims them. The references list is now BCL-wide (`System.*` + `netstandard` + `Microsoft.Win32.Registry`
via the `TRUSTED_PLATFORM_ASSEMBLIES` set), so forbidden BCL types resolve at compile time and
`ForbiddenTypeAnalyzer` is the sole security gate (consistent with the
Core.Scripting-001 / -002 model). `docs/VirtualTags.md` "Compile cache" section rewritten;
`docs/ScriptedAlarms.md` cross-reference updated to drop the obsolete restart guidance.
Regression tests added in `CompiledScriptCacheTests`:
`Dispose_unloads_compiled_script_assembly_load_context`,
`Clear_disposes_every_materialised_evaluator`, and
`GetOrCompile_after_Dispose_throws_ObjectDisposedException`; the first two
prove ALC unload via `WeakReference` + bounded `GC.Collect()` loops. Suite now 104
green (was 101). Authoring convention: the synthesized wrapper is an ordinary
C# static method, so scripts must end with an explicit `return …;` per ordinary C# rules
(the legacy `CSharpScript` "last expression yields result" shorthand no longer applies);
every script in the existing corpus already uses an explicit `return`, so this is a doc-only
change for new authors.
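The `WeakReference` + bounded `GC.Collect()` technique the regression tests rely on can be sketched on its own. This sketch assumes .NET Core 3.0 or later, where collectible `AssemblyLoadContext` is available. It uses an empty context rather than one holding an emitted script assembly, and `CreateAndUnload` is a hypothetical helper. Unload completion depends on GC timing, so the loop bounds the wait rather than assuming a single collection suffices.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

// The only strong reference to the context lives in this frame. NoInlining keeps the
// JIT from folding it into the caller and extending the context's lifetime.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference CreateAndUnload()
{
    var alc = new AssemblyLoadContext("script-sketch", isCollectible: true);
    // Real code would call alc.LoadFromStream(...) with the emitted script assembly here.
    var weak = new WeakReference(alc, trackResurrection: true);
    alc.Unload(); // initiates unloading; memory is reclaimed by later collections
    return weak;
}

var alcRef = CreateAndUnload();

// Poll with a bounded loop: each pass lets finalizers run and the GC make progress.
for (int i = 0; alcRef.IsAlive && i < 10; i++)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
}

Console.WriteLine($"context collected: {!alcRef.IsAlive}");
```

If anything outside the context (a cached delegate, a static event subscription) still referenced a type from a loaded assembly, the loop would exhaust its budget and `IsAlive` would stay `true`. That is the leak such tests are designed to surface.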
### Core.Scripting-009
@@ -336,3 +367,390 @@ a script logging at Error level produces both a `scripts-*.log` event and a comp
Warning event.
**Resolution:** Resolved 2026-05-23 — added three new test files: `ScriptSandboxBuildTests` covers the `Build` null / non-`ScriptContext` / base-class / concrete-subclass paths; `ScriptContextTests` locks `Deadband` boundary semantics (equal-to-tolerance returns false; just-over returns true; symmetric in direction; zero-tolerance returns true only on non-equal; negative tolerance trips on any non-equal); the new `Factory_plus_companion_sink_integration_surfaces_script_error_in_both_logs` test in `ScriptLogCompanionSinkTests` wires `ScriptLoggerFactory` + the companion sink together end-to-end and asserts an Error emission lands in both the scripts sink (at Error) and the main sink (at Warning), each tagged with `ScriptName`. Suite now 101 green (was 85 before).
### Core.Scripting-012
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Status | Resolved |
**Description:** The Core.Scripting-008 rewrite broadened the BCL references list
from a narrow allow-list (`System.Private.CoreLib` + `System.Linq` only) to the
full `TRUSTED_PLATFORM_ASSEMBLIES` set filtered to `System.*` + `netstandard` +
`Microsoft.Win32.Registry`. This change correctly delegates the security gate to
`ForbiddenTypeAnalyzer` (the new comment in `ScriptSandbox` calls this out
explicitly), but the analyzer's deny-list has not been expanded to match the new
attack surface, and three categories of dangerous BCL types in the `System.*`
allow-listed assemblies are now reachable from script source:
1. **`System.Threading.ThreadPool`** (in namespace `System.Threading`). The
Core.Scripting-003 fix added `System.Threading.Tasks` to deny `Task.Run` /
`Parallel` fan-out because background work that outlives the per-evaluation
timeout is the explicit threat. `ThreadPool.QueueUserWorkItem`,
`ThreadPool.UnsafeQueueUserWorkItem`, and `ThreadPool.RegisterWaitForSingleObject`
are exactly the same threat — they schedule background work that outlives the
`WaitAsync(Timeout)` budget and tie up worker threads — but `System.Threading`
itself is allowed (because `CancellationToken` / `SemaphoreSlim` / `Volatile`
live there). The Core.Scripting-003 resolution is incomplete on the new
reference surface.
2. **`System.Threading.Timer`** (same namespace). Schedules a background
callback; the script returns control to the engine but the timer keeps
firing past the evaluation budget. Same threat as `Task.Run`.
3. **`System.Runtime.Loader.AssemblyLoadContext`** (in namespace
`System.Runtime.Loader`, which is not denied — only `System.Runtime.InteropServices`
is). The constructor + `LoadFromAssemblyPath` / `LoadFromStream` /
`LoadFromAssemblyName` let a script load an arbitrary DLL into the host
process. Pass (1) of the analyzer resolves the receiver type
(`AssemblyLoadContext`, allowed) + the invocation symbol's containing type
(also `AssemblyLoadContext`, allowed) and lets the call through. Pass (2)
only inspects `TypeSyntax` nodes — if the script discards the returned
`Assembly` (e.g. `alc.LoadFromAssemblyPath(@"C:\evil.dll");`) there is no
`TypeSyntax` for the analyzer to walk and the call is accepted. Triggering
execution of the loaded code from inside the sandbox is hard (most of
`Assembly`'s surface is in `System.Reflection`, which is denied) but the
defense-in-depth gap is real: an attacker who can author a script also
typically controls a file path on the server (Admin UI uploads, share
mounts) and loading an assembly is the prerequisite to every chained
escape — module initializers, type-resolve handlers, and a future analyzer
slip would all become exploitable.
In addition, two lower-impact `System.*` APIs are reachable that arguably
should not be: **`System.Console.SetOut`** / **`Console.SetError`**, which could
redirect the host's console streams (this requires constructing a
`System.IO.TextWriter`, which is blocked, so the practical exploit is
`Console.WriteLine` log-spam only), and **`System.Globalization.CultureInfo.DefaultThreadCurrentCulture`**,
which could perturb the entire process's formatting behavior (subtle but real cross-script
side effect).
The original Core.Scripting-001 finding called out the model: when an allow-listed
namespace contains dangerous types, those types must be denied type-granularly.
The new reference surface introduces several more such types and the deny-list
has not been kept in sync.
**Recommendation:** Add `System.Threading.ThreadPool` and `System.Threading.Timer`
to `ForbiddenFullTypeNames`. Add `System.Runtime.Loader` as a namespace prefix
to `ForbiddenNamespacePrefixes` (every type in `System.Runtime.Loader`
`AssemblyLoadContext`, `AssemblyDependencyResolver`, `AssemblyLoadEventArgs` — is
out of script scope). Consider adding `System.Console` to `ForbiddenFullTypeNames`
to stop log-spam through the host's console streams, and at minimum document
`CultureInfo.DefaultThreadCurrentCulture` as an accepted cross-script side
effect. Each addition must have a regression test in `ScriptSandboxTests`
mirroring the Core.Scripting-010 vector style. Update
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 + the
"Sandbox escape" compliance-check row to enumerate the additions, per the
Core.Scripting-009 doc-sync convention.
**Resolution:** Resolved 2026-05-23 — added `System.Runtime.Loader` to
`ForbiddenNamespacePrefixes` (the namespace-prefix form preferred over
type-granular per the recommendation; future BCL additions to that namespace
are denied by default). Added `System.Threading.ThreadPool` and
`System.Threading.Timer` to `ForbiddenFullTypeNames` — both live in
`System.Threading` shared with allowed sync primitives so they must be
type-granular. Regression tests added to `ScriptSandboxTests`:
`Rejects_ThreadPool_QueueUserWorkItem_at_compile`,
`Rejects_Timer_new_at_compile`, `Rejects_AssemblyLoadContext_at_compile`.
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 +
the Sandbox-escape compliance-check row both updated per the
Core.Scripting-009 doc-sync convention. The two lower-impact suggestions
from the recommendation (`System.Console`, `CultureInfo.DefaultThreadCurrentCulture`)
were intentionally not addressed: `Console.SetOut` requires constructing
a `System.IO.TextWriter` which is already blocked, leaving only
`Console.WriteLine` log-spam (annoyance, not a security threat); and
`CultureInfo.DefaultThreadCurrentCulture` is a cross-script side-effect
worth knowing about but doesn't escape the sandbox. Recording both as
accepted minor risks. Test totals after fix: Core.Scripting 107 green
(was 104 — +3 new rejection tests).
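The difference between the two deny-list forms can be shown in a small sketch. `IsForbidden` is a hypothetical, string-based approximation: the real `ForbiddenTypeAnalyzer` works on Roslyn symbols. The lists below are an illustrative excerpt containing only entries named in this finding and the surrounding text. The sketch shows why `ThreadPool` and `Timer` need full-name entries: a prefix on `System.Threading` would also deny `CancellationToken` and `SemaphoreSlim`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative excerpt only: entries named in this finding and the surrounding text.
var forbiddenNamespacePrefixes = new[]
{
    "System.Reflection",
    "System.Runtime.InteropServices",
    "System.Runtime.Loader",          // namespace-prefix form added by Core.Scripting-012
};
var forbiddenFullTypeNames = new HashSet<string>(StringComparer.Ordinal)
{
    "System.Threading.ThreadPool",    // type-granular: System.Threading itself stays allowed
    "System.Threading.Timer",
};

bool IsForbidden(string fullTypeName) =>
    forbiddenFullTypeNames.Contains(fullTypeName)
    // A prefix matches only at a '.' boundary, so "System.Runtime.LoaderX" would not match.
    || forbiddenNamespacePrefixes.Any(ns => fullTypeName.StartsWith(ns + ".", StringComparison.Ordinal));

foreach (var name in new[] { "System.Threading.ThreadPool", "System.Threading.CancellationToken",
                             "System.Runtime.Loader.AssemblyLoadContext" })
    Console.WriteLine($"{name}: {(IsForbidden(name) ? "denied" : "allowed")}");
```

The prefix form also denies any type later added to `System.Runtime.Loader` by default, which is the rationale the resolution gives for preferring it there.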
### Core.Scripting-013
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
| Status | Resolved |
**Description:** The synthesized wrapper pastes the user's source verbatim
between `{` and `}` braces inside a static method body, with a `#line 1`
directive and no escaping. The legacy `CSharpScript.CreateDelegate` path was
robust to this because Roslyn's scripting compiler parses script source as a
top-level statement sequence; the new hand-rolled path is parsing ordinary C# in
a method body, so a script that injects matching `{` / `}` braces can extend the
synthesized compilation unit with additional methods, classes, or `#line`
directives. For example, a script body of
`return 0; } public static int Evil() { return 0; }} public static class CompiledScript2 { public static void M() {`
ends the `Run` method early, declares a sibling `Evil` method (and even a
sibling `CompiledScript2` class) inside the synthesized namespace, then opens an
unclosed method that consumes the wrapper's trailing `}\n}`. With matching brace
counts the script parses cleanly and compiles.
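The following sketch shows why brace counting over the final text cannot detect the injection. It approximates the textual wrapper, assuming a file-scoped namespace so that the template ends in the `}\n}` mentioned above. `BuildWrapperSketch` and `BracesBalance` are hypothetical, and the real template's namespace, usings, and `Run` signature are not reproduced. The block only builds strings and does not compile them.

```csharp
using System;

// Hypothetical approximation of the textual wrapper: user source is pasted
// verbatim between Run's braces, with a #line directive just above it.
static string BuildWrapperSketch(string userSource) =>
    "namespace CompiledScriptNs;\n" +
    "public static class CompiledScript {\n" +
    "public static object Run() {\n" +
    "#line 1\n" +
    userSource + "\n" +
    "}\n}\n";

// A naive "are the braces balanced?" check over the final wrapper text.
static bool BracesBalance(string source)
{
    int depth = 0;
    foreach (char c in source)
    {
        if (c == '{') depth++;
        else if (c == '}' && --depth < 0) return false; // closed more than opened
    }
    return depth == 0;
}

string injected =
    "return 0; } public static int Evil() { return 0; }} " +
    "public static class CompiledScript2 { public static void M() {";
string wrapper = BuildWrapperSketch(injected);

Console.WriteLine(BracesBalance(wrapper));                       // True
Console.WriteLine(wrapper.Contains("public static int Evil()")); // True
```

The injected wrapper balances exactly like a benign `return 1;` script. Distinguishing the two therefore requires inspecting the parsed syntax tree, not the raw text.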
`ForbiddenTypeAnalyzer` walks every descendant of every syntax tree, so any
forbidden BCL types named inside the injected methods are still caught — the
finding is **not** a direct sandbox escape. However:
- It silently relaxes the operator-visible authoring contract documented in
`docs/VirtualTags.md` ("scripts are statement bodies that end with an
explicit `return …;`") to "scripts can be any compilable C# inside the
`CompiledScript` namespace" — operators have access to features the design
did not intend to expose (local types defined as siblings of `Run`, custom
module initializers via attributes, etc.).
- A script can embed its own `#line` directives that override the
`#line 1` we emit just above the user source, producing misleading error
locations in compiler diagnostics surfaced to the operator.
- Future hardening that relies on syntactic-shape assumptions (e.g.
"every script has exactly one method") would silently fail.
- It widens the analyzer's surface: the analyzer's correctness now depends on
Pass (2) correctly walking every conceivable C# construct that can name a
type, including ones a normal script body would never contain
(`UnmanagedCallersOnly` attribute, function pointer types `delegate*<...>`,
pattern types, switch arm types, …).
**Recommendation:** Either (a) reject scripts whose parsed body contains
declarations other than statements — walk the wrapper's syntax tree after parse
and require that the only members of `CompiledScript` are the single `Run`
method, raising a `CompilationErrorException` if anything else appears — or
(b) parse the user source independently as a `BlockSyntax` and inject the
parsed block as the method body via the Roslyn syntax API, which makes
brace-mismatched / class-injecting source unparseable. Add a regression test
covering at least the brace-injection vector
(`return 0; } public static int Evil() { return 0;`).
**Resolution:** Resolved 2026-05-23 — took option (a) from the recommendation:
added an `EnforceSingleRunMember` step to `ScriptEvaluator.Compile` (runs after
`CSharpSyntaxTree.ParseText` of the synthesized wrapper, before Roslyn
compile). The check requires exactly one type declaration in the compilation
unit (the `CompiledScript` class) AND exactly one member on that class (the
`Run` method). Any deviation — a sibling class, an additional namespace, a
sibling method or nested type alongside `Run` — throws
`CompilationErrorException` with diagnostic IDs `LMX001` / `LMX002` and a
message that names Core.Scripting-013 and points at the offending span. Two
regression tests added: `Rejects_sibling_method_injection_via_balanced_braces`
(injects a sibling method via `} public static int Evil() { …`) and
`Rejects_sibling_class_injection_via_balanced_braces` (injects an entire
sibling namespace + class). Option (b) (parse the user source independently
as a `BlockSyntax` and inject via Roslyn syntax API) was considered but the
parse-and-validate approach is more readable, gives clearer error messages,
and keeps the wrapper-source generation textual.
### Core.Scripting-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `CompiledScriptCache.cs:91-103` (`Clear`) |
| Status | Resolved |
**Description:** `Clear()` snapshots `_cache.Keys.ToArray()` then iterates,
calling `TryRemove(key, out var lazy)` on each — the key-only overload, not
the value-scoped one used in `GetOrCompile`'s catch block. Between the
snapshot and a given `TryRemove`, a concurrent `GetOrCompile(scriptSource)`
call that hashes to the same key can re-insert a fresh `Lazy` whose `.Value`
the caller already retained. The unconditional `TryRemove` then removes that
fresh `Lazy` and `DisposeLazyIfMaterialised(lazy)` calls `Dispose()` on its
evaluator — unloading the ALC while the concurrent caller still holds a
reference to the evaluator and intends to invoke it.
This is exactly the race-window pattern the Core.Scripting-006 resolution
fixed in `GetOrCompile`'s catch block (the test
`Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry` locks it
there). `Clear()` carries the same shape but uses the older, value-blind
overload, so the same race that finding-006 addresses is still latent on the
publish-replace path.
In current production wiring `Clear()` is intended for config-publish + tests
— neither overlaps steady-state evaluation under the documented design — so
the in-practice impact is low. But the cache is checked in as the
forward-looking compile cache for the engines (per `Script.SourceHash`'s docs
and the cache's own remarks); a future wiring that calls `Clear()` from
publish while evaluations are in flight would dispose live evaluators.
**Recommendation:** Replace the snapshot + `TryRemove(key, out var lazy)`
sequence with an enumeration that captures the `Lazy` reference at snapshot
time and uses the value-scoped `TryRemove(KeyValuePair<,>)` overload, mirroring
the Core.Scripting-006 fix:
```csharp
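foreach (var entry in _cache.ToArray())
{
    // Value-scoped overload: removes the entry only if the key still maps to
    // this exact Lazy, so a race-inserted replacement is left alone.
    if (_cache.TryRemove(entry))
        DisposeLazyIfMaterialised(entry.Value);
}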
```
Add a regression test that races `GetOrCompile` against `Clear` and asserts
the caller's evaluator is still usable.
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim:
replaced `foreach (var key in _cache.Keys.ToArray())` + key-only
`TryRemove(key, out var lazy)` with `foreach (var entry in _cache.ToArray())` +
value-scoped `TryRemove(entry)` (the `KeyValuePair<,>` overload). A concurrent
GetOrCompile re-add between the snapshot and the remove inserts a fresh Lazy
under the same key; the value-scoped comparison sees the mismatch and leaves
the fresh entry intact (instead of evicting + disposing the live evaluator
the concurrent caller still holds). Regression test
`Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives` added
to `CompiledScriptCacheTests` — single-threaded simulation that snapshots
the dict, mutates the entry to a fresh Lazy mid-flight, drives the same
value-scoped TryRemove overload Clear now uses, and asserts the fresh entry
survives. The two-thread race would be flaky to model directly; the
single-threaded semantic test is sufficient because the fix is the
overload-selection itself.
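The single-threaded simulation described above can be sketched with plain BCL types. Strings stand in for evaluators, and the "dispose" step only records which values would have been disposed. The key `"hash-A"` and the list `disposed` are illustrative. `ConcurrentDictionary<TKey, TValue>.TryRemove(KeyValuePair<TKey, TValue>)` compares values with the default equality comparer, which for `Lazy<T>` is reference equality.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var cache = new ConcurrentDictionary<string, Lazy<string>>();
cache["hash-A"] = new Lazy<string>(() => "evaluator generation 1");

// 1. Clear() snapshots key AND value.
KeyValuePair<string, Lazy<string>>[] snapshot = cache.ToArray();

// 2. A concurrent GetOrCompile replaces the entry with a fresh Lazy and
//    materialises it for its own caller.
var fresh = new Lazy<string>(() => "evaluator generation 2");
cache["hash-A"] = fresh;
_ = fresh.Value;

// 3. Clear() removes only entries whose value still matches the snapshot.
var disposed = new List<string>();
foreach (var entry in snapshot)
{
    if (cache.TryRemove(entry) && entry.Value.IsValueCreated)
        disposed.Add(entry.Value.Value); // stands in for DisposeLazyIfMaterialised
}

Console.WriteLine(cache.ContainsKey("hash-A")); // True: the fresh entry survived
Console.WriteLine(disposed.Count);              // 0
```

By contrast, the key-only `TryRemove("hash-A", out var lazy)` overload would have returned and evicted `fresh`, which is the pre-fix behavior.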
### Core.Scripting-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Status | Resolved |
**Description:** `ToCSharpTypeName` is documented to handle nested types
(`Outer+Inner` → `Outer.Inner`) via `Replace('+', '.')` on the
non-generic path (line 269), but the generic path (lines 263-266) builds the
name from `def.FullName!` and then takes a substring up to the backtick. For a
**nested generic** type such as `Outer.Inner<T>`, whose `FullName` is
``Outer+Inner`1``, `Replace('+', '.')` is applied first, then ``Substring(0, IndexOf('`'))``
on ``"Outer.Inner`1"`` produces `"Outer.Inner"`, which is correct.
However, the generic branch does NOT handle a nested generic whose declaring
type is also generic. For `Outer<TOuter>.Inner<TInner>`, the generic definition's
`FullName` is ``Outer`1+Inner`1``. For that shape ``Substring(0, IndexOf('`'))``
truncates at the first backtick, yielding `"Outer"` and silently dropping the
`.Inner` segment, while `GetGenericArguments()` still returns both `TOuter` and
`TInner`. The resulting source string is syntactically valid but semantically
wrong: `global::Outer<TOuter, TInner>` does not name `Outer<TOuter>.Inner<TInner>`.
The production code never hits this shape — `TResult` is always one of
`object?`, `bool`, `int`, `double`, `string?`, `DateTime` across the
virtual-tag engine, the alarm engine, the test-harness, and the test suite,
and `ScriptGlobals<TContext>` is always a top-level generic over a top-level
`ScriptContext` subclass. The bug is latent. But it is a foot-gun for a
future caller (e.g. a Phase-8 driver that wires a context type defined as a
nested generic for grouping reasons) and the XML-doc comment claims
"handles nested types" without qualifying it.
A second, smaller gap on the same path: the comment claims that
`global::`-qualified FQNs prevent accidental capture by the wrapper's `using`
directives. That is true for the generic and non-generic branches, but the
primitive aliases (`bool`, `int`, `string`, `object`, …) are emitted unqualified.
Those aliases are C# keywords that always bind to the corresponding `System` types,
so even a script-declared `class @bool` (reachable via the Core.Scripting-013 vector)
cannot shadow them. This is benign, but the comment should say so.
**Recommendation:** Add a check in the generic branch that walks the FullName
backtick-by-backtick — or use `INamedTypeSymbol`-style name composition from
`def.DeclaringType` recursively — so multi-arity-nested generics emit
correctly. At minimum update the XML doc to qualify "handles nested types" as
"handles single-level nesting; nested generics whose parent is itself generic
are not supported". Add a `ToCSharpTypeName` unit test (currently nothing
exercises this method directly — coverage relies on the end-to-end compile path,
so the bug surfaces only as a misleading Roslyn diagnostic).
**Resolution:** Resolved 2026-05-23 — rewrote the generic-type branch of
`ToCSharpTypeName` to walk the `FullName` segment-by-segment (split on `.`
after `+ → .` substitution). For each segment of the form ``Name`N``, the
algorithm consumes N generic arguments from `t.GetGenericArguments()` in
order and emits them as `<…>` on that segment. Nested generic-in-generic
shapes (`Outer<T>.Inner<U>`) now emit as
`global::Ns.Outer<T>.Inner<U>` (valid C#) rather than the pre-fix
`global::Ns.Outer<T, U>` (which dropped the segment boundary entirely
because ``IndexOf('`')`` truncated at the first backtick). No production
caller exercises this shape today (all `TContext` / `TResult` types in
the codebase are top-level non-nested), so the fix is preemptive — but
the algorithm is now correct for any future nested-generic context type.
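The segment walk described in the resolution can be sketched as follows. `ToCSharpTypeNameSketch` is hypothetical and is not the project's method. It emits `global::`-qualified names for every type (the real method emits keyword aliases for primitives) and ignores arrays, nullable value types, and open generic parameters. `Dictionary<TKey, TValue>.Enumerator` is a convenient BCL nested-in-generic type: truncating at the first backtick would emit `Dictionary<…>` and lose `.Enumerator`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static string ToCSharpTypeNameSketch(Type t)
{
    if (!t.IsGenericType)
        return "global::" + t.FullName!.Replace('+', '.');

    // GetGenericArguments() returns every argument in the nesting chain,
    // outermost declaring type first; each segment consumes its own arity.
    Type[] args = t.GetGenericArguments();
    int next = 0;
    var parts = new List<string>();
    foreach (string seg in t.GetGenericTypeDefinition().FullName!.Replace('+', '.').Split('.'))
    {
        int tick = seg.IndexOf('`');
        if (tick < 0) { parts.Add(seg); continue; } // namespace or non-generic segment

        int arity = int.Parse(seg.Substring(tick + 1));
        string[] own = args.Skip(next).Take(arity).Select(a => ToCSharpTypeNameSketch(a)).ToArray();
        next += arity;
        parts.Add(seg.Substring(0, tick) + "<" + string.Join(", ", own) + ">");
    }
    return "global::" + string.Join(".", parts);
}

Console.WriteLine(ToCSharpTypeNameSketch(typeof(Dictionary<string, int>.Enumerator)));
```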
### Core.Scripting-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Status | Resolved |
**Description:** The Core.Scripting-008 resolution introduced
`ScriptEvaluator.IDisposable` + `CompiledScriptCache.Clear()` that disposes
each materialised evaluator before dropping its dictionary entry, so per-publish
ALC accretion is no longer process-lifetime rooted **inside the cache**. But
neither production consumer of `ScriptEvaluator` uses the cache — both
`VirtualTagEngine.Load` and `ScriptedAlarmEngine.LoadAsync` call
`ScriptEvaluator<TContext, TResult>.Compile(...)` directly (lines 105 / 160
respectively), store the evaluator inside an internal `VirtualTagState` /
`AlarmState` record, and on the next `Load` simply call `_tags.Clear()` /
`_alarms.Clear()`. The dropped `ScriptEvaluator` references never have
`Dispose()` called on them, so the underlying `ScriptAssemblyLoadContext`
instances are never `Unload()`-ed. The .NET runtime guarantees that a
collectible ALC stays alive until `Unload()` is called explicitly — having
"no strong references" is necessary but not sufficient. So the publish-replace
cycle leaks every prior generation's emitted assembly exactly as before the
fix, even though the fix's infrastructure is in place.
The Core.Scripting-008 regression tests in `CompiledScriptCacheTests`
(`Dispose_unloads_compiled_script_assembly_load_context` /
`Clear_disposes_every_materialised_evaluator`) prove the contract on
`CompiledScriptCache`, but neither engine uses that class. There is no
integration test exercising the actual publish path — i.e. that calling
`VirtualTagEngine.Load(...)` twice with different definitions makes the prior
generation's ALC eligible for GC. As a result the fix's headline guarantee
("Server restarts are no longer required to reclaim compiled-script memory" —
`docs/VirtualTags.md`) is not actually delivered to the production engines.
This is the same observable behavior the original Core.Scripting-008 finding
described, surfacing on a different code path that the resolution did not touch.
**Recommendation:** Either route the engines' compile path through
`CompiledScriptCache<TContext, TResult>` (the documented design — the cache
already returns the same evaluator instance for identical source, and its
`Clear()` now performs the right disposal — and `Script.SourceHash`'s doc-comment
already names this as the cache key), or make the engines' `Load` methods
dispose the previous `ScriptEvaluator` instances before reassigning. The
former is the cleaner change because it also collapses redundant compiles
across publishes for unchanged scripts. Add an integration test along the
lines of `CompiledScriptCacheTests.Clear_disposes_every_materialised_evaluator`
for each engine: snapshot the per-evaluator emitted assembly via
`WeakReference`, call `Load(...)` with a different definition set, and assert
the prior generation's assemblies become collectable.
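The `WeakReference` check recommended above follows the standard collectible-ALC pattern, sketched here under stated assumptions. The sketch uses an empty collectible `AssemblyLoadContext` instead of the engines' `ScriptAssemblyLoadContext`, and calls `Unload()` directly where the engine test would call `Load(...)` with a new definition set. `CreateAndUnloadContext` is a hypothetical helper. The `NoInlining` attribute keeps the only strong reference inside the helper's frame.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference CreateAndUnloadContext()
{
    var alc = new AssemblyLoadContext("script-generation-1", isCollectible: true);
    // A real test would load or emit the compiled script assembly here.
    var weak = new WeakReference(alc);
    alc.Unload(); // required: a collectible ALC is not reclaimed without it
    return weak;
}

WeakReference weakAlc = CreateAndUnloadContext();

// Unloading completes across several GCs, so poll a bounded number of times.
for (int i = 0; weakAlc.IsAlive && i < 10; i++)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
}

Console.WriteLine(weakAlc.IsAlive ? "still rooted" : "collected");
```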
**Resolution:** Resolved 2026-05-23 — took the cleaner route from the
recommendation: routed both engines' compile paths through
`CompiledScriptCache<TContext, TResult>`. `VirtualTagEngine` and
`ScriptedAlarmEngine` each gained a private `_compileCache` instance field,
their `Load`/`LoadAsync` methods now call `_compileCache.GetOrCompile(source)`
instead of `ScriptEvaluator.Compile(source)` directly, and the cache is cleared
on publish-replace alongside the existing `_tags` / `_alarms` clears so the
prior generation's ALCs are disposed before recompile. Engine `Dispose` now
also calls `_compileCache.Dispose()` so the engine-shutdown path actually
releases the emitted assemblies. **Side-fix:** discovered + fixed an
adjacent bug in `CompiledScriptCache.Dispose()` itself — it set
`_disposed = true` before calling `Clear()`, but `Clear()`'s pre-existing
`if (_disposed) return` guard then aborted the drain unconditionally, so
the Dispose-triggered cleanup was a silent no-op. Removed the disposed-guard
on `Clear()` (the operation is idempotent — clearing an empty/cleared cache
is safe). Without this side-fix the engine-Dispose path would have left
the cached evaluators rooted forever even though the call chain looked
correct. **Side-fix for ScriptedAlarmEngine.Dispose:** reversed the pre-existing
"do NOT clear `_alarms` here" comment; `Dispose` now clears `_alarms` AFTER the drain
because the AlarmState records hold the `TimedScriptEvaluator`/`ScriptEvaluator`
delegates that root the emitted assembly — leaving them in `_alarms` after
Dispose was the same root-the-script-forever pattern this finding is about,
just on the engine side rather than the cache side. The `_alarms` clear is
safe after the `Task.WhenAll` drain because that drain guarantees no
background callback is mid-flight. Regression tests added:
`VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly` and
`ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly`
each uses `WeakReference` + bounded `GC.Collect()` to prove the emitted
assembly is reclaimable after `engine.Dispose()`. **Important test pattern
detail:** the alarms test originally failed because its helper was
`async Task<WeakReference>` — async state machines capture locals as
state-struct fields and can keep them alive past the method's apparent end.
Rewrote as a synchronous helper using `LoadAsync(...).GetAwaiter().GetResult()`
inside two cooperating `[MethodImpl(MethodImplOptions.NoInlining)]` helpers
(`CompileAlarmAndCaptureWeak` + `ExtractEmittedAssemblyWeakRef`) so the
intermediate reflection locals die when each helper returns. Test totals
after fix: Core.Scripting 104 green (unchanged); VirtualTags 57 green (was
56 — +1 unload test); ScriptedAlarms 67 green (was 66 — +1 unload test).
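The `CompiledScriptCache.Dispose()` side-fix is an ordering bug that is easy to reproduce in isolation. The model below is hypothetical and is not the cache's code: evaluators are reduced to callbacks that count their own disposal, and `clearHasDisposedGuard` toggles the pre-fix early exit in `Clear()`.

```csharp
using System;
using System.Collections.Generic;

static int DisposeAndCountReleased(int cachedEvaluators, bool clearHasDisposedGuard)
{
    bool disposed = false;
    var entries = new List<Action>();
    int released = 0;
    for (int i = 0; i < cachedEvaluators; i++)
        entries.Add(() => released++); // stands in for evaluator.Dispose()

    void Clear()
    {
        if (clearHasDisposedGuard && disposed) return; // pre-fix early exit
        foreach (var release in entries) release();
        entries.Clear(); // idempotent: a second Clear() finds nothing to drain
    }

    // Dispose(): flag first, then drain, matching the order described above.
    disposed = true;
    Clear();
    return released;
}

Console.WriteLine(DisposeAndCountReleased(3, clearHasDisposedGuard: true));  // 0
Console.WriteLine(DisposeAndCountReleased(3, clearHasDisposedGuard: false)); // 3
```

Dropping the guard is safe because draining an already-empty cache is a no-op. The alternative, setting the flag after the drain, would also work but leaves a window in which new entries could be added during disposal.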
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -16,7 +16,7 @@ a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Cli.Common-001, Driver.Cli.Common-002 |
| 1 | Correctness & logic bugs | Driver.Cli.Common-001, Driver.Cli.Common-002, Driver.Cli.Common-007 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Cli.Common-003 |
| 4 | Error handling & resilience | Driver.Cli.Common-004 |
@@ -24,9 +24,47 @@ a category produced nothing rather than leaving it blank.
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Cli.Common-005 |
| 9 | Testing coverage | Driver.Cli.Common-005, Driver.Cli.Common-008 |
| 10 | Documentation & comments | Driver.Cli.Common-006 |
## Re-review 2026-05-23 (commit `a9be809`)
Delta scope: commit `5a9c459` extends the `FormatStatus` shortlist with five
`Bad*` codes (`BadInternalError` 0x80020000, `BadNotWritable` 0x803B0000,
`BadOutOfRange` 0x803C0000, `BadNotSupported` 0x803D0000, `BadDeviceFailure`
0x80550000) the FOCAS / AbCip / AbLegacy native-protocol mappers emit. Tests
extended with parallel `[InlineData]` rows on the well-known Theory plus a new
`FormatStatus_names_native_driver_emitted_codes` Theory.
Cross-checked the five new hex literals against the OPC Foundation
`Opc.Ua.StatusCodes` table via DeepWiki:
| Name added | Code in shortlist | Spec value | Verdict |
|---|---|---|---|
| `BadInternalError` | `0x80020000` | `0x80020000` | Correct |
| `BadNotWritable` | `0x803B0000` | `0x803B0000` | Correct |
| `BadOutOfRange` | `0x803C0000` | `0x803C0000` | Correct |
| `BadNotSupported` | `0x803D0000` | `0x803D0000` | Correct |
| `BadDeviceFailure` | `0x80550000` | **`0x808B0000`** | **WRONG — `0x80550000` is `BadSecurityPolicyRejected`** |
The `BadDeviceFailure` mismapping is the same shape of bug as the original
Driver.Cli.Common-001 (wrong hex literal copied into the shortlist); recorded
as Driver.Cli.Common-007. The wrong constant also lives in
`FocasStatusMapper.cs`, `AbCipStatusMapper.cs`, `AbLegacyStatusMapper.cs`,
`TwinCATStatusMapper.cs`, `S7Driver.cs`, and `ModbusDriver.cs` — those are in
other modules' review scope but are noted here so future re-reviewers know
this isn't isolated. (`StatusCodeMap.cs` in Driver.Galaxy + the Wonderware
historian mappers use the correct `0x808B0000`, confirming the discrepancy.)
Testing observation: the new `FormatStatus_names_native_driver_emitted_codes`
Theory is fully redundant with the well-known Theory (the five rows were also
added there in the same commit) and uses `ShouldContain` rather than
`ShouldBe` — recorded as Driver.Cli.Common-008.
Other categories (concurrency, security, performance, design-doc adherence,
code organisation, documentation) are unchanged by this delta — no new
issues found.
## Findings
### Driver.Cli.Common-001
@@ -130,7 +168,7 @@ dispose the previous logger if reconfiguration is genuinely intended.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
| Status | Open |
| Status | Resolved |
**Description:** `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the
value and status columns) without guarding against empty input. When `tagNames` and
@@ -143,7 +181,13 @@ instead of producing an empty (header-only) table.
separator, or an explicit "no rows" line), or use `DefaultIfEmpty(0).Max(...)` for the
width computations.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `FormatTable` guards each `rows.Max(...)` width
computation with a `rows.Length == 0 ? "<HEADER>".Length : Math.Max(...)` ternary, so
an empty batch read returns the header + separator rows (no data rows) instead of
throwing `InvalidOperationException`. The fix was landed in commit `1433a1c` alongside
the -002 work, and the regression test
`SnapshotFormatterTests.FormatTable_with_empty_input_returns_header_only` (added under
-005) exercises it.
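The guarded width computation can be sketched per column. `ColumnWidth` is a hypothetical helper, not `SnapshotFormatter`'s code, and the tag name is illustrative. The block also shows the unguarded `Max` throwing on an empty batch.

```csharp
using System;
using System.Linq;

// Empty input falls back to the header width instead of calling Max().
static int ColumnWidth(string header, string[] cells) =>
    cells.Length == 0
        ? header.Length
        : Math.Max(header.Length, cells.Max(c => c.Length));

string[] noRows = Array.Empty<string>();
Console.WriteLine(ColumnWidth("Tag", noRows));                  // 3
Console.WriteLine(ColumnWidth("Tag", new[] { "Motor1.Speed" })); // 12

// The unguarded form is what threw on an empty batch read.
try { _ = noRows.Max(c => c.Length); }
catch (InvalidOperationException) { Console.WriteLine("Max() on empty input throws"); }
```

The `DefaultIfEmpty(0).Max(...)` form from the original recommendation achieves the same result without an explicit length check.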
### Driver.Cli.Common-005
@@ -178,7 +222,7 @@ empty-input and `DriverCommandBase` level-selection tests.
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
| Status | Open |
| Status | Resolved |
**Description:** Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71`
states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement
@@ -194,4 +238,127 @@ library. The XML doc is stale relative to the shipped driver-CLI set.
right-most and intentionally unpadded rather than claiming fixed width. Add FOCAS to the
`DriverCommandBase` class-summary driver list.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — (1) `SnapshotFormatter.cs:71` comment reworded
to state the source-time column is the right-most one and intentionally not
measured/padded, calling out the null-timestamp `"-"` case explicitly. (2) FOCAS was
added to the `DriverCommandBase` class-summary driver enumeration in commit `7ff356b`
(landed alongside the -003 work).
### Driver.Cli.Common-007
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` |
| Status | Resolved |
**Description:** Commit `5a9c459` added `0x80550000u => "BadDeviceFailure"` to the
`FormatStatus` shortlist, but `0x80550000` is the canonical OPC UA spec value for
`BadSecurityPolicyRejected`, not `BadDeviceFailure`. The correct spec value for
`BadDeviceFailure` is `0x808B0000` (verified against the OPC Foundation
`Opc.Ua.StatusCodes` table via DeepWiki; corroborated locally by
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/StatusCodeMap.cs:40`
(`BadDeviceFailure = 0x808B0000u`) and the two Wonderware historian quality mappers,
which all hand-pin the correct value).
This is the same shape of bug Driver.Cli.Common-001 closed: a wrong hex literal
in the shortlist that the test theory (`SnapshotFormatterTests.cs:42`) blindly
asserts against the same wrong value, so the bug is invisible to CI.
Practical impact, two-sided:
1. A driver that returns the real spec `BadDeviceFailure` (`0x808B0000`) — e.g.
the `Driver.Galaxy.StatusCodeMap` path on a deploy-time device fault — falls
through the named shortlist entirely. Since Driver.Cli.Common-002 added the
severity-class fallback, it now renders as `0x808B0000 (Bad)` instead of
`0x808B0000 (BadDeviceFailure)` — operators lose the specific class label
`docs/Driver.FOCAS.Cli.md:153` tells them to read off the output.
2. A driver that returns `0x80550000` (which `FocasStatusMapper`, `AbCipStatusMapper`,
`AbLegacyStatusMapper`, `TwinCATStatusMapper`, `S7Driver`, and `ModbusDriver` all
misuse as "BadDeviceFailure") now renders as `0x80550000 (BadDeviceFailure)`
matching driver intent but contradicting the OPC UA spec: any client that
decodes the same payload with the OPC Foundation stack sees
`BadSecurityPolicyRejected`. A security-monitoring tool keying on
`BadSecurityPolicyRejected` will fire on a CPU fault, while real
`BadSecurityPolicyRejected` returns from the secure-channel layer would be
mislabelled as a device fault. Operator-facing CLI output and machine-readable
status semantics disagree.
The deeper bug is the wrong constant in the native-protocol mappers (out of scope
for this module), but the `SnapshotFormatter` shortlist is its own
spec-authoritative reference point — Driver.Cli.Common-001 explicitly framed the
shortlist as canonical, with the in-line "keep [these literals] in sync with [the
Opc.Ua.StatusCodes] table" comment at `SnapshotFormatter.cs:112-113`. That
contract is now broken.
**Recommendation:** Change line 129 to `0x808B0000u => "BadDeviceFailure"`. Update
the matching `[InlineData]` rows in `SnapshotFormatterTests.cs` (line 42 in the
well-known Theory; line 60 in the redundant Theory — see Driver.Cli.Common-008).
Also note in the resolution that the native-protocol mappers (FOCAS / AbCip /
AbLegacy / TwinCAT / S7 / Modbus) need the same fix recorded against their own
module reviews — the constant `0x80550000` should be replaced with `0x808B0000`
everywhere it claims to mean `BadDeviceFailure`. Consider Driver.Cli.Common-001's
original recommendation again: add a CI test that cross-checks every shortlist
entry against `Opc.Ua.StatusCodes` reflection so this class of bug stops
recurring.
**Resolution:** Resolved 2026-05-23 — corrected `SnapshotFormatter.FormatStatus`
to map `0x808B0000u => "BadDeviceFailure"` (was `0x80550000u`). Updated the
`InlineData` row in the well-known Theory accordingly; the redundant native-
emitted Theory was deleted entirely per Driver.Cli.Common-008. Added a regression
row to `FormatStatus_does_not_apply_pre_fix_wrong_names` pinning that
`0x80550000` no longer renders as `BadDeviceFailure` (mirroring the
Driver.Cli.Common-001 wrong-name guards). The underlying constant was also
corrected in all six native-protocol mappers as part of the same commit:
`FocasStatusMapper.BadDeviceFailure`, `AbCipStatusMapper.BadDeviceFailure`,
`AbLegacyStatusMapper.BadDeviceFailure`, `TwinCATStatusMapper.BadDeviceFailure`,
`ModbusDriver.StatusBadDeviceFailure`, `S7Driver.StatusBadDeviceFailure` — all
moved from `0x80550000u` to `0x808B0000u`. The three downstream Modbus tests
(`ModbusExceptionMapperTests` 3 InlineData rows + 1 ShouldBe assertion;
`ExceptionInjectionTests.StatusBadDeviceFailure` constant) updated to expect
the corrected code. **Behavior change:** OPC UA clients consuming the native
drivers now see the canonical `BadDeviceFailure` (0x808B0000) instead of the
misnamed `BadSecurityPolicyRejected` (0x80550000) on device-fault paths —
operator-facing CLI output and machine-readable status semantics now agree.
Suite totals after fix: Driver.Cli.Common.Tests 43 green (was 48 — minus 5
redundant rows); Modbus.Tests 263; AbCip.Tests 262; AbLegacy.Tests 157;
FOCAS.Tests 178; S7.Tests 112; TwinCAT.Tests 131; all green. The Opc.Ua.StatusCodes
cross-check the recommendation suggested is recorded as a follow-up worth
considering but is out of scope for this fix.
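The corrected mapping and the severity-class fallback can be sketched together. This is a hypothetical reduction of `FormatStatus`, not its code. The named hex values are the ones cross-checked in the re-review table. The fallback reads the top two severity bits; treating the reserved `11` pattern as Bad is an assumption.

```csharp
using System;

static string FormatStatus(uint status)
{
    string? name = status switch
    {
        0x80020000u => "BadInternalError",
        0x803B0000u => "BadNotWritable",
        0x803C0000u => "BadOutOfRange",
        0x803D0000u => "BadNotSupported",
        0x808B0000u => "BadDeviceFailure", // pre-fix shortlist used 0x80550000u
        _ => null,
    };
    // Severity fallback: bit 31 set = Bad (10, or reserved 11), else bit 30 = Uncertain.
    name ??= (status & 0x80000000u) != 0 ? "Bad"
           : (status & 0x40000000u) != 0 ? "Uncertain"
           : "Good";
    return $"0x{status:X8} ({name})";
}

Console.WriteLine(FormatStatus(0x808B0000u)); // 0x808B0000 (BadDeviceFailure)
Console.WriteLine(FormatStatus(0x80550000u)); // 0x80550000 (Bad)
```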
### Driver.Cli.Common-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` |
| Status | Resolved |
**Description:** Commit `5a9c459` adds a new
`FormatStatus_names_native_driver_emitted_codes` `[Theory]` whose five
`[InlineData]` rows are identical to five rows added to the existing
`FormatStatus_names_well_known_status_codes` `[Theory]` in the same commit
(lines 32, 39, 40, 41, 42). The new Theory therefore adds no coverage. It is
also weaker than the Theory it duplicates: it asserts
`output.ShouldContain($"({expectedName})")` (substring match) where the
well-known Theory asserts `output.ShouldBe($"0x{status:X8} ({expectedName})")`
(exact match including the hex prefix). The substring form would not catch a
regression where the hex literal renders wrong but the name is correct.
This is not a correctness problem — both Theories pass — but it's a
copy-paste inconsistency that costs maintainer attention every time someone
reads the test file and wonders which Theory is authoritative.
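The assertion-strength gap can be seen with plain string comparisons standing in for Shouldly's `ShouldContain` and `ShouldBe`. The regressed output string is illustrative: it has the right name and the wrong hex.

```csharp
using System;

uint expectedStatus = 0x808B0000u;
string output = "0x80550000 (BadDeviceFailure)";

// Substring form (ShouldContain): passes despite the wrong code.
bool substringPasses = output.Contains("(BadDeviceFailure)");

// Exact form (ShouldBe): catches the mismatch.
bool exactPasses = output == $"0x{expectedStatus:X8} (BadDeviceFailure)";

Console.WriteLine($"substring: {substringPasses}, exact: {exactPasses}"); // substring: True, exact: False
```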
**Recommendation:** Either (a) delete the new Theory entirely — its five rows
are already covered by the well-known Theory in the same commit — or (b) keep
it but switch to `ShouldBe($"0x{status:X8} ({expectedName})")` so its
assertion strength matches the rest of the file. Option (a) is cleaner: the
commit's "operator workflow" intent is documented well enough in the
well-known Theory comment block; the redundant Theory is dead weight.
**Resolution:** Resolved 2026-05-23 — took option (a): deleted the
`FormatStatus_names_native_driver_emitted_codes` Theory entirely. Its five
`InlineData` rows are covered by the well-known Theory's `ShouldBe` (strict
exact-match assertion), which is the authoritative shortlist test. Landed
alongside the Driver.Cli.Common-007 fix in the same commit.
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -45,7 +45,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/WriteCommand.cs:58-68` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteCommand.ParseValue` parses the numeric `--value` types
(`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse`
@@ -65,7 +65,16 @@ literal — consistent with how `ParseBool` already handles bad boolean input.
The same pattern exists in the sibling S7 CLI; a shared helper in
`Driver.Cli.Common` would fix both.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — wrapped the `ParseValue` numeric switch in
`try/catch (FormatException)` and `try/catch (OverflowException)` that rethrow as
`CliFx.Exceptions.CommandException` with a message naming the `--type` and the
offending value, mirroring the friendly text the `Bit` path already produced.
Added `WriteCommandParseValueTests` with `[Theory]` cases covering non-numeric
input across `Byte`/`Int16`/`Int32`/`Float32`/`Float64`, overflow edges
(sbyte ±1, short max+1, > int.MaxValue), and an assertion that the exception
message names both the type and the offending value. A shared `Driver.Cli.Common`
helper is the cleaner long-term fix (cross-CLI duplication remains) but is left
to the Driver.Cli.Common review per this module's edit scope.
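The catch-and-rethrow shape can be sketched without CliFx. In this hypothetical version, `InvalidOperationException` stands in for `CliFx.Exceptions.CommandException` so the block runs standalone, and only three `--type` values are modeled. Each arm is cast to `object`. Otherwise the switch expression's natural type would be `double`, and an `Int16` value would be silently widened.

```csharp
using System;
using System.Globalization;

static object ParseValue(string type, string raw)
{
    try
    {
        return type switch
        {
            "Int16"   => (object)short.Parse(raw, CultureInfo.InvariantCulture),
            "Int32"   => (object)int.Parse(raw, CultureInfo.InvariantCulture),
            "Float64" => (object)double.Parse(raw, CultureInfo.InvariantCulture),
            _ => throw new ArgumentException($"Unsupported --type '{type}'.", nameof(type)),
        };
    }
    catch (Exception ex) when (ex is FormatException or OverflowException)
    {
        // Name both the type and the offending value, as the resolution describes.
        throw new InvalidOperationException(
            $"--value '{raw}' is not a valid {type}: {ex.Message}", ex);
    }
}

Console.WriteLine(ParseValue("Int16", "42"));
```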
### Driver.FOCAS.Cli-002
@@ -74,7 +83,7 @@ The same pattern exists in the sibling S7 CLI; a shared helper in
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:45-51` |
| Status | Open |
| Status | Resolved |
**Description:** The `subscribe` command attaches an `OnDataChange` handler that
calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from
@@ -93,7 +102,15 @@ console writes with a lock shared between the banner and the handler. Optionally
detach the handler in the `finally` block before `ShutdownAsync` for symmetry
with the `handle` teardown already present there.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — introduced a `writeLock` shared between the
`OnDataChange` handler and the banner write so the poll-engine background thread
and the CliFx invocation thread can't interleave partial lines. Added an
explanatory comment above the handler explaining the CliFx-`IConsole` rationale
and the synchronous-on-background-thread design — mirroring the Modbus / S7
copies of this command. Also added a try/catch around the handler body so a
transient stdout error cannot tear down the poll loop, and logs the swallowed
exception as a Serilog warning. Added `SubscribeCommandConsoleHandlerTests` to guard the
`writeLock` + CliFx-`IConsole` rationale against future copy-paste regressions.
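The shared-lock design can be illustrated in a few lines. This is a Python sketch of the pattern only; the real handler is C# writing through CliFx's `IConsole`. `write_line` and `on_data_change` are hypothetical names, and a plain `OSError` stands in for whatever stdout failure the real command catches.

```python
# Illustrative Python sketch (the real command is C#/CliFx): one lock guards
# every console write so the background handler and the banner never
# interleave partial lines, and handler errors are logged, not propagated.
import logging
import sys
import threading

log = logging.getLogger("subscribe")
write_lock = threading.Lock()  # shared by the banner and the change handler


def write_line(text: str, out=sys.stdout) -> None:
    with write_lock:  # the whole line is written before another writer starts
        out.write(text + "\n")


def on_data_change(tag: str, value, out=sys.stdout) -> None:
    """Runs on the poll engine's background thread in the real command."""
    try:
        write_line(f"{tag} = {value}", out)
    except OSError:
        # A transient stdout failure must not tear down the poll loop.
        log.warning("console write failed for %s", tag, exc_info=True)
```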
### Driver.FOCAS.Cli-003
@@ -102,7 +119,7 @@ with the `handle` teardown already present there.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) |
| Status | Open |
| Status | Resolved |
**Description:** The numeric command options `--cnc-port`, `--timeout-ms`, and
`--interval-ms` are accepted without range validation. A zero or negative
@@ -120,7 +137,17 @@ timeout and interval strictly positive. The same gap exists across the sibling
driver CLIs, so a shared validation helper in `Driver.Cli.Common` is the
cleaner fix.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added a protected `ValidateOptions(int?
intervalMs = null)` helper on `FocasCommandBase` that rejects `--cnc-port`
outside `1..65535`, non-positive `--timeout-ms`, and non-positive
`--interval-ms` (when the caller passes one) with a `CliFx.Exceptions.CommandException`
naming the option and the rejected value. `ProbeCommand` / `ReadCommand` /
`WriteCommand` call `ValidateOptions()` without an interval, `SubscribeCommand`
calls `ValidateOptions(IntervalMs)`. Added `FocasCommandBaseValidationTests`
covering accept-defaults, reject out-of-range port (0, -1, 65536), reject
non-positive timeout / interval, and skip-interval-when-omitted. A shared
helper in `Driver.Cli.Common` is the cleaner cross-CLI fix and is recorded
against that module's review.
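The validation rules are small enough to show directly. This is a Python rendering of the checks described here, not the C# helper itself; `CommandError` is a hypothetical stand-in for `CliFx.Exceptions.CommandException`, and the message wording is an assumption.

```python
# Illustrative Python sketch of ValidateOptions (the real helper is a C#
# method on FocasCommandBase). CommandError is a hypothetical stand-in for
# CliFx.Exceptions.CommandException.
from typing import Optional


class CommandError(Exception):
    pass


def validate_options(cnc_port: int, timeout_ms: int,
                     interval_ms: Optional[int] = None) -> None:
    if not 1 <= cnc_port <= 65535:
        raise CommandError(f"--cnc-port must be in 1..65535 (got {cnc_port}).")
    if timeout_ms <= 0:
        raise CommandError(f"--timeout-ms must be positive (got {timeout_ms}).")
    # Only subscribe passes an interval; the other commands skip this check.
    if interval_ms is not None and interval_ms <= 0:
        raise CommandError(f"--interval-ms must be positive (got {interval_ms}).")
```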
### Driver.FOCAS.Cli-004
@@ -129,7 +156,7 @@ cleaner fix.
| Severity | Low |
| Category | Performance & resource management |
| Location | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` |
| Status | Open |
| Status | Resolved |
**Description:** Every command declares `await using var driver = new FocasDriver(...)`
**and** explicitly calls `await driver.ShutdownAsync(CancellationToken.None)` in
@@ -144,7 +171,14 @@ dead weight and obscures intent: a reader cannot tell whether the explicit
and rely on `await using` for disposal, or drop `await using` and keep the
explicit teardown — but not both. The same redundancy exists in the sibling CLIs.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — dropped the explicit
`await driver.ShutdownAsync(CancellationToken.None)` calls from the `finally`
blocks of `ProbeCommand`, `ReadCommand`, `WriteCommand`, and `SubscribeCommand`;
`await using` is now the sole driver-disposal mechanism per command
(`FocasDriver.DisposeAsync` itself runs `ShutdownAsync`). The subscribe command
keeps `UnsubscribeAsync` in its finally because that is a subscription-lifecycle
concern, not driver disposal. Added `CommandDisposalConventionsTests` to guard
the source-level convention against regression.
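The disposal convention maps naturally onto Python's `async with`. In the sketch below, `FakeDriver` is a hypothetical stand-in whose `__aexit__` plays the role of `FocasDriver.DisposeAsync` running `ShutdownAsync`. The command body never calls `shutdown()` itself and unsubscribes in its own `finally`. The sketch illustrates the shape of the convention and is not the C# code.

```python
# Illustrative Python sketch of the convention: the context manager is the
# sole disposal path, and unsubscribe stays in the command's finally block.
import asyncio


class FakeDriver:
    """Hypothetical stand-in for FocasDriver: disposal runs shutdown."""

    def __init__(self) -> None:
        self.shutdown_calls = 0
        self.subscriptions = set()

    async def shutdown(self) -> None:
        self.shutdown_calls += 1

    async def subscribe(self, tag: str) -> str:
        self.subscriptions.add(tag)
        return tag

    async def unsubscribe(self, handle: str) -> None:
        self.subscriptions.discard(handle)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.shutdown()  # mirrors DisposeAsync running ShutdownAsync
        return False  # never swallow exceptions from the command body


async def subscribe_command(driver: FakeDriver) -> None:
    async with driver:  # sole disposal mechanism; no explicit shutdown()
        handle = await driver.subscribe("X100")  # hypothetical tag address
        try:
            await asyncio.sleep(0)  # stand-in for the subscription loop
        finally:
            await driver.unsubscribe(handle)  # subscription lifecycle only
```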
### Driver.FOCAS.Cli-005
@@ -153,7 +187,7 @@ explicit teardown — but not both. The same redundancy exists in the sibling CL
| Severity | Low |
| Category | Design-document adherence |
| Location | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) |
| Status | Open |
| Status | Resolved |
**Description:** `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and
`BadCommunicationError` as the key diagnostic signals an operator reads off
@@ -180,4 +214,18 @@ actually emit — at minimum `BadNotWritable`, `BadOutOfRange`, `BadNotSupported
because the gap defeats this module's documented `probe`/`write` diagnostic
workflow; cross-reference the `Driver.Cli.Common` review.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the cross-CLI fix landed in `Driver.Cli.Common`:
`SnapshotFormatter.FormatStatus` now names `BadInternalError` (0x80020000),
`BadNotWritable` (0x803B0000), `BadOutOfRange` (0x803C0000), `BadNotSupported`
(0x803D0000), and `BadDeviceFailure` (0x80550000) — the five codes the FOCAS /
AbCip / AbLegacy native-protocol mappers all emit but the shortlist previously
left unnamed (the canonical `BadTimeout` 0x800A0000 was already added under
Driver.Cli.Common-001). FOCAS `probe` / `write` against a non-writable parameter,
out-of-range address, unsupported function, busy device, or CNC-handle failure
now renders with the named status the `docs/Driver.FOCAS.Cli.md` workflow
promises, restoring parity between the docs and the shipped behaviour. Regression
`[Theory]` `FormatStatus_names_native_driver_emitted_codes` added to
`SnapshotFormatterTests` so the five names can't silently drop out of the
shortlist again; the existing well-known shortlist `[Theory]` was extended with
the same five entries to enforce the exact `0x... (Name)` rendering. The suite
is now 47 green (was 42).
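The `0x... (Name)` rendering amounts to a lookup with a hex fallback. The Python sketch below uses only a subset of the shortlist, and its assumption that unnamed codes render as bare hex may not match `SnapshotFormatter`. It shows the output shape, not the shipped C# formatter.

```python
# Illustrative Python sketch of a named-status shortlist (subset only); codes
# outside the table fall back to bare hex, which is an assumption here.
STATUS_NAMES = {
    0x800A0000: "BadTimeout",
    0x80020000: "BadInternalError",
    0x803B0000: "BadNotWritable",
    0x803C0000: "BadOutOfRange",
    0x803D0000: "BadNotSupported",
}


def format_status(code: int) -> str:
    hex_code = f"0x{code:08X}"
    name = STATUS_NAMES.get(code)
    return f"{hex_code} ({name})" if name else hex_code
```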
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 0 |
@@ -17,12 +17,39 @@
| 2 | OtOpcUa conventions | Driver.Galaxy-005 |
| 3 | Concurrency & thread safety | Driver.Galaxy-006, Driver.Galaxy-007 |
| 4 | Error handling & resilience | Driver.Galaxy-001, Driver.Galaxy-008, Driver.Galaxy-009 |
| 5 | Security | Driver.Galaxy-010 |
| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012 |
| 7 | Design-document adherence | Driver.Galaxy-013 |
| 5 | Security | Driver.Galaxy-010, Driver.Galaxy-015 |
| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012, Driver.Galaxy-016 |
| 7 | Design-document adherence | Driver.Galaxy-013, Driver.Galaxy-017 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Galaxy-014 |
| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013 |
| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013, Driver.Galaxy-018 |
## Re-review 2026-05-23 (commit `a9be809`)
The only code-affecting change since `76d35d1` was commit `994997b` — the
sibling `mxaccessgw` repo restructured (the `clients/dotnet/MxGateway.Client`
project path and the `MxGateway.Contracts.Proto` namespace both moved), and
the driver's path-based `ProjectReference` started producing 87 build errors
solution-wide. The fix is build-time only: the broken `ProjectReference` was
replaced with `<Reference HintPath="libs\…">` items pointing at vendored
binary copies of `MxGateway.Client.dll` (99 KB, May 2026 known-good build)
and `MxGateway.Contracts.dll` (490 KB), and five `PackageReference`s that
the dropped project was previously providing transitively (`Google.Protobuf`,
`Grpc.Core.Api`, `Grpc.Net.Client`, `Microsoft.Extensions.Logging.Abstractions`,
`Polly`) were declared explicitly. The matching `Tests` csproj got the same
binary `<Reference>` for `MxGateway.Contracts` (replacing its own broken
`ProjectReference`). A `libs/README.md` documents what is vendored and the
two unwinding paths (sibling restores a client library, or driver migrates
to the new `ZB.MOM.WW.MxGateway.Contracts.Proto` namespace + reimplements
the `MxGatewayClient` / `MxGatewaySession` / `GalaxyRepositoryClient`
wrapper, ~2,200 LoC).
No `*.cs` file changed; the re-review walked only the categories that apply
to a build-time/packaging change. Categories with no new findings:
Correctness (1), OtOpcUa conventions (2), Concurrency (3), Error handling
(4), Code organization (8), Testing coverage (9). Four new findings are
recorded below (Driver.Galaxy-015..018) — none Critical, none High; two
Medium, two Low.
## Findings
@@ -235,3 +262,126 @@
**Recommendation:** Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> `OnDataChange` resumes; (b) `ReplayAsync` updates `SubscriptionRegistry` with new handles; (c) `StatusCodeMap.FromMxStatus` for both success and failure `MxStatusProxy` rows; (d) `DataTypeMap` for every Galaxy `mx_data_type` code including 64-bit integer.
**Resolution:** Resolved 2026-05-22 — added `GalaxyDriverInfrastructureTests` covering `GetMemoryFootprint` (Driver.Galaxy-011) and `IAsyncDisposable` (Driver.Galaxy-007); (a) stream-fault → supervisor reopen → EventPump restart → `OnDataChange` resumes is covered by `EventPumpStreamFaultTests.StreamFault_DrivesReconnectSupervisorReopenReplay` and `FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch` (landed with Driver.Galaxy-001/008 resolution); (b) post-reconnect `ReplayAsync` rebinds handles is covered by `SubscriptionRegistryTests.Rebind_*` suite; (c) `StatusCodeMap.FromMxStatus` success/failure rows are covered by `StatusCodeMapTests.FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood` and `FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood` (landed with Driver.Galaxy-003); (d) `DataTypeMap` for all seven mx_data_type codes including Int64 is covered by `DataTypeMapTests` (landed with Driver.Galaxy-002).
### Driver.Galaxy-015
| Field | Value |
|---|---|
| Severity | ~~Medium~~ Low (re-triaged 2026-05-23) |
| Category | ~~Security~~ Documentation & comments (re-triaged 2026-05-23) |
| Location | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` |
| Status | Resolved |
**Description:** Commit `994997b` checks in two binary DLLs (`MxGateway.Client.dll`, 99 840 bytes; `MxGateway.Contracts.dll`, 489 984 bytes) under `src/Drivers/.../Driver.Galaxy/libs/` and references them via `<Reference HintPath="…" />`. These are the only checked-in binary build artefacts in the entire repo (a repo-wide `find` for non-`bin/`/`obj/` `*.dll` under `libs/` returns only these two), so the change sets a precedent. The accompanying `libs/README.md` states the DLLs are "byte-for-byte the build output" of the OtOpcUa team's own code against the gateway's open proto contracts, but there is no recorded provenance — no source-commit SHA from the sibling `mxaccessgw` repo that produced the build, no SHA-256/SHA-512 checksum, no `.gitattributes` rule marking these paths as binary (so a future churn-in-place will balloon the pack file). Without a recorded source commit + checksum it is impossible for a future reviewer/auditor to verify the binaries match a specific revision of the sibling repo — the assertion "we built them, not external" is unverifiable after the fact. Tampering or accidental swap (e.g. someone drops in a different DLL of the same name under the same path) would not be detectable.
**Recommendation:** (a) Pin the source provenance: add the sibling `mxaccessgw` commit SHA used to build each DLL to `libs/README.md`. (b) Record a SHA-256 of each `.dll` in `libs/README.md` so a future tamper or accidental update is detectable by running `Get-FileHash`/`sha256sum`. (c) Add a `.gitattributes` rule under `libs/` declaring `*.dll binary` (and consider `filter=lfs diff=lfs merge=lfs -text` if/when these need to be updated, to avoid bloating the pack file on every refresh). (d) Optional: a `dotnet test` time-check that compares the on-disk hash to the recorded hash, so a CI run notices if the file drifts from what the README claims.
**Resolution:** Resolved 2026-05-23. **Severity re-triage:** the original
finding framed this as a security concern about "tampering or accidental
swap by an unknown third party"; the user clarified that the DLLs are
their own code, built from their own `mxaccessgw` project — not third-party
binaries. That moves the concern from security (untrusted provenance) to
documentation (audit trail). Re-classified as Low Documentation &
Comments. Fix: `libs/README.md` now carries a Provenance section that
records the source-commit SHA (`dd7ca1634e2d2b8a866c81f0009bf87ee9427750`,
extracted from the `AssemblyInformationalVersion` baked into both DLLs by
the original build) and SHA-256 checksums of both binaries, plus a
re-verification recipe (`sha256sum libs/*.dll` + `ilspycmd <dll> | grep
AssemblyInformationalVersion`). Recommendations (c) `.gitattributes` and
(d) CI hash-check deferred — the DLLs are essentially frozen until one
of the two unwinding paths is taken, so adding LFS or a CI guard would
add infrastructure that the unwinding step would then have to remove.
Re-open if the vendoring becomes a recurring update target.
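The README's `sha256sum libs/*.dll` recipe can also be scripted as the automated guard that recommendation (d) describes and this resolution defers. The Python sketch below is hypothetical. The file names and expected hashes a caller passes in are placeholders, not the values recorded in `libs/README.md`.

```python
# Illustrative Python version of the README's `sha256sum libs/*.dll` check,
# shaped as the deferred recommendation-(d) guard. Inputs are placeholders.
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):  # 64 KiB chunks
            digest.update(chunk)
    return digest.hexdigest()


def find_drift(recorded: Dict[str, str], libs_dir: Path) -> List[str]:
    """Names whose file is missing or whose SHA-256 differs from the record."""
    drifted = []
    for name, expected in recorded.items():
        path = libs_dir / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            drifted.append(name)
    return drifted
```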
### Driver.Galaxy-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` |
| Status | Resolved |
**Description:** The five new `PackageReference` versions declared in the csproj (`Google.Protobuf` 3.34.1, `Grpc.Core.Api` 2.76.0, `Grpc.Net.Client` 2.71.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.0, `Polly` 8.5.2) do not all match what the vendored `MxGateway.Client.dll` was built against. The DLL's PE metadata (extracted via `System.Reflection.Metadata`) shows references to `Grpc.Net.Client v2.0.0.0`, `Microsoft.Extensions.Logging.Abstractions v10.0.0.0`, and notably `Polly.Core v8.0.0.0`, and the source csproj just before the sibling-repo rename (commit `bd4a09a` from 2026-04-27) declared `Grpc.Net.Client` 2.76.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.7, and `Polly.Core` 8.6.6 (*not* the meta-package `Polly`). Our driver pulls `Polly` 8.5.2 (which transitively pins `Polly.Core` 8.5.2 per its nuspec dependency), so the vendored client actually loads `Polly.Core` 8.5.2 at runtime against code compiled against 8.6.6. Across an 8.5 ↔ 8.6 minor delta this is usually safe (the assembly version is `v8.0.0.0` for both), but it is exactly the kind of skew that surfaces as `MissingMethodException` if an 8.6-only API was used in the client. `libs/README.md` claims "versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses so the gRPC + proto runtime stays binary-compatible", but that statement is correct only for `Google.Protobuf` and `Grpc.Core.Api`; the other three packages do not match.
**Recommendation:** Reconcile the declared package versions with what the vendored DLLs were built against — bump to `Grpc.Net.Client` 2.76.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.7, swap `Polly` for `Polly.Core` 8.6.6 (the driver does not import the `Polly` legacy v7 surface, only Polly.Core via the client). Alternatively, rebuild the vendored DLLs against the same versions the csproj declares and refresh the binaries. Update `libs/README.md` to record the exact versions the DLLs were built against, so the next vendoring refresh has an authoritative reference.
**Resolution:** Resolved 2026-05-23 — took the first option (reconcile
declared packages with what the DLL was built against, verified by
reflecting `Assembly.GetReferencedAssemblies()` on `MxGateway.Client.dll`).
Changes to the csproj: **`Polly` 8.5.2 → `Polly.Core` 8.6.6** (the most
consequential — `Polly` (v7 fluent API) and `Polly.Core` (v8 resilience-
pipeline API) are different packages, and the DLL was built against
`Polly.Core`; the prior `Polly` reference would have failed at runtime
with `MissingMethodException` the first time the gateway client's retry
pipeline ran). Also bumped `Grpc.Net.Client` 2.71.0 → 2.76.0 and
`Microsoft.Extensions.Logging.Abstractions` 10.0.0 → 10.0.7 to match the
sibling Server/Worker projects' current versions. `Google.Protobuf`
3.34.1 and `Grpc.Core.Api` 2.76.0 already matched; left unchanged.
`libs/README.md` rewritten to record what was actually verified
(`Assembly.GetReferencedAssemblies()` output + the resolved package
versions, including the sibling Server/Worker csproj as the version
source-of-truth — the deleted MxGateway.Client.csproj would have been
the original source but no longer exists). Verification: solution-wide
`dotnet build` clean, Driver.Galaxy.Tests 245/245 pass against the
corrected package set.
### Driver.Galaxy-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract |
| Status | Deferred |
**Description:** The vendored `MxGateway.Contracts.dll` only carries the OLD `MxGateway.Contracts.Proto[.Galaxy]` namespace (a PE-namespace dump lists only `MxGateway.Client`, `MxGateway.Contracts`, `MxGateway.Contracts.Proto`, and `MxGateway.Contracts.Proto.Galaxy`). The sibling `mxaccessgw` repo's live `Protos/mxaccess_gateway.proto`, `mxaccess_worker.proto`, and `galaxy_repository.proto` files now generate into `ZB.MOM.WW.MxGateway.Contracts.Proto.*`. The proto wire format itself can still evolve (new RPCs, renamed fields, removed fields) and the driver has no contract-version handshake (a repo-wide search for `ContractVersion|ProtocolVersion|ApiVersion|WireVersion` in the driver returns nothing), so a gateway service that evolves its proto past what the vendored client knows will fail only at runtime, and sometimes silently: gRPC `UNIMPLEMENTED` for a renamed RPC, default-value reads for a removed scalar field, or worse, a wire-tag collision if a field number is reused. The risk surface grew with vendoring: previously the `ProjectReference` would have hard-failed at build time if the proto changed shape; now the driver builds green against a frozen contract that may not match the running gateway.
**Recommendation:** (a) Add a single `Ping`/`GetVersion` RPC call at gateway-session open, comparing the gateway's reported contract version against a string baked into `libs/README.md` (or a `GatewayContractVersion` const) and refusing the session on mismatch with a clear log. (b) Document in `libs/README.md` the exact mxaccessgw commit SHA (and proto-file SHA-256s) the vendored DLLs were built from, so a parity-rig operator can grep the live gateway for the matching commit. (c) Add a soak/parity test that asserts the live gateway's proto descriptor still matches what the vendored DLL expects — fail loud rather than degrade.
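Part (a) of this recommendation reduces to a compare-and-refuse step at session open. The Python sketch below is entirely hypothetical. The driver has no such RPC or constant today, and `get_version`, `EXPECTED_CONTRACT_VERSION`, and the version strings are invented for illustration.

```python
# Illustrative Python sketch of recommendation (a), which was deferred. The
# GetVersion call, the constant, and the version strings are hypothetical.
import logging

log = logging.getLogger("galaxy.gateway")

EXPECTED_CONTRACT_VERSION = "example-contract-1"  # would come from libs/README.md


class ContractMismatchError(Exception):
    pass


def open_session(get_version):
    """get_version stands in for a Ping/GetVersion RPC issued at session open."""
    reported = get_version()
    if reported != EXPECTED_CONTRACT_VERSION:
        log.error("Gateway reports contract %r but vendored client expects %r; "
                  "refusing session.", reported, EXPECTED_CONTRACT_VERSION)
        raise ContractMismatchError(reported)
    return {"contract": reported}
```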
**Resolution:** Deferred 2026-05-23 — the recommendation's part (b)
(record the mxaccessgw source-commit SHA in `libs/README.md`) is satisfied
by the Driver.Galaxy-015 resolution, which records both DLLs were built
from mxaccessgw commit `dd7ca1634e2d2b8a866c81f0009bf87ee9427750`. Parts
(a) and (c) — adding a `GetVersion` RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial new
RPC + plumbing work that is not in scope for this code-review-resolution
sweep. The risk surface is bounded because either of the two unwinding
paths in `libs/README.md` (sibling repo restores `MxGateway.Client.csproj`,
or this driver migrates to the new namespace) will move the codebase
past the vendoring + close this concern naturally. Re-open if neither
unwinding path is taken within the next quarter and the live gateway
service does evolve its proto under the driver.
### Driver.Galaxy-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` |
| Status | Resolved |
**Description:** Several small documentation issues in the vendoring artefacts:
1. `libs/README.md` says "Versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses" — but `ZB.MOM.WW.MxGateway.Contracts.csproj` only declares `Google.Protobuf` 3.34.1 and `Grpc.Core.Api` 2.76.0; the other three packages (`Grpc.Net.Client`, `Microsoft.Extensions.Logging.Abstractions`, `Polly`) come from the (now-deleted) `MxGateway.Client.csproj`, not the contracts csproj. The README points at the wrong source-of-truth file. See Driver.Galaxy-016 for the related version-skew issue.
2. `libs/README.md` says the DLLs "are built against net10.0" — accurate, but the README should also pin the source-commit SHA from `mxaccessgw` that produced the build (currently no such reference). Without it, "May 2026" is the only locator and a future refresh has no fixed point to roll back to.
3. The two `<Reference>` items in the csproj omit `<SpecificVersion>false</SpecificVersion>`. The vendored DLLs carry `AssemblyVersion 1.0.0.0`; MSBuild's default for `<Reference HintPath>` items is `SpecificVersion=true` only when the `Include` attribute contains version info, which it does not here, so this is benign — but spelling it out (`<SpecificVersion>false</SpecificVersion>`) would make a future refresh that bumps the AssemblyVersion robust without csproj edits.
4. The csproj `<Reference Include="MxGateway.Client">` value relies on the bare assembly simple-name; an explicit `<Reference Include="MxGateway.Client, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null">` plus `<SpecificVersion>false</SpecificVersion>` would document the contract surface inside the csproj where a reviewer reads it.
**Recommendation:** (a) Update `libs/README.md` to (i) point at `MxGateway.Client.csproj` for the `Grpc.Net.Client`/`Microsoft.Extensions.Logging.Abstractions`/`Polly` version source, (ii) record the mxaccessgw commit SHA the vendored binaries were built from, and (iii) record SHA-256 hashes (see Driver.Galaxy-015). (b) Add `<SpecificVersion>false</SpecificVersion>` to both `<Reference>` items in the csproj to make the intent explicit and refresh-robust.
**Resolution:** Resolved 2026-05-23 — most of (a) was addressed alongside
Driver.Galaxy-015 + -016: `libs/README.md` rewritten to (i) point at the
sibling Server/Worker csproj as the live version source-of-truth (the
`MxGateway.Client.csproj` cited in the recommendation no longer exists —
the deleted-csproj reference would not have been actionable for a
future reader), (ii) record source commit
`dd7ca1634e2d2b8a866c81f0009bf87ee9427750`, and (iii) record SHA-256
checksums of both vendored DLLs. (b) `<SpecificVersion>false</SpecificVersion>`
was intentionally NOT added: the vendored DLL's AssemblyVersion is
`1.0.0.0`, and MSBuild already defaults to `SpecificVersion=false` for
`<Reference HintPath>` items whose `Include` is a bare simple name, so
spelling it out would be cosmetic and would not change behaviour. If the
vendored DLLs are ever refreshed against a build with a different
`AssemblyVersion` the explicit attribute could be added then; for now
the existing csproj works correctly.
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -92,7 +92,7 @@ dead-lettered. Until then, document explicitly that this writer never produces
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
| Status | Open |
| Status | Resolved |
**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
@@ -106,7 +106,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
`RecordFailure`) so all six health fields share a single lock.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced the mixed `Interlocked.Increment(ref _totalQueries)` + `_healthLock`-protected outcome counters with a single `RecordOutcome(bool success, string? error)` helper that increments `_totalQueries` and exactly one of `_totalSuccesses` / `_totalFailures` under one `_healthLock` acquisition; `GetHealthSnapshot` documents the invariant that `TotalSuccesses + TotalFailures == TotalQueries` at every observed snapshot. Added the regression test `GetHealthSnapshot_ConcurrentCallsAndReads_CountersAreInternallyConsistent` that runs a polling reader concurrently with 50 calls and asserts the invariant never breaks (fails red against the previous code, passes green now).
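The single-lock invariant can be illustrated with a small Python model of the counters. The class and method names are hypothetical analogues of the C# `RecordOutcome` / `GetHealthSnapshot` pair. The concurrent reader in the test mirrors the shape of the regression test described above.

```python
# Illustrative Python sketch (the real client is C#): every health counter
# changes under one lock acquisition, so a snapshot always satisfies
# successes + failures == queries.
import threading
from typing import Optional, Tuple


class HealthCounters:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queries = 0
        self._successes = 0
        self._failures = 0
        self._last_error: Optional[str] = None

    def record_outcome(self, success: bool, error: Optional[str] = None) -> None:
        with self._lock:  # one acquisition covers the query and its outcome
            self._queries += 1
            if success:
                self._successes += 1
            else:
                self._failures += 1
                self._last_error = error

    def snapshot(self) -> Tuple[int, int, int]:
        with self._lock:
            return self._queries, self._successes, self._failures
```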
### Driver.Historian.Wonderware.Client-004
@@ -115,7 +115,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `WonderwareHistorianClient.cs:203-267` |
| Status | Open |
| Status | Resolved |
**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
@@ -132,7 +132,7 @@ sidecar-level `Success` flag has been checked, or pass the reply success/error i
single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
counters under one lock acquisition.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — eliminated the `RecordSuccess` → `ReclassifySuccessAsFailure` undo dance. `InvokeAsync` now takes a `Func<TReply, (bool ok, string? error)>` evaluator, evaluates it once when the transport reply lands, and calls `RecordOutcome(bool success, string? error)` exactly once per call under a single `_healthLock` acquisition. A sidecar-reported failure is now classified as a failure on its first and only counter update, so no transient "success then undo" state is observable. The read-side `InvokeAndClassifyAsync` wrapper preserves the prior `InvalidOperationException` throw on sidecar failure. Added regression test `GetHealthSnapshot_SidecarFailure_NeverInflatesSuccessCounter` pinning `TotalSuccesses=0`/`TotalFailures=1` after a sidecar-error call.
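The evaluator shape can be sketched synchronously in Python. The real `InvokeAsync` is async C#. The `Client` class, the `transport` callable, and the `Success` / `Error` reply fields below are hypothetical stand-ins chosen to show that classification happens once, before the only counter update.

```python
# Illustrative, synchronous Python sketch of the InvokeAsync shape: the caller
# supplies an evaluator that classifies the sidecar reply once, and the
# outcome is recorded exactly once per call.
import threading
from typing import Any, Callable, Optional, Tuple

Evaluator = Callable[[Any], Tuple[bool, Optional[str]]]


class Client:
    def __init__(self, transport: Callable[[], Any]) -> None:
        self._transport = transport  # stand-in for the named-pipe round trip
        self._lock = threading.Lock()
        self.successes = 0
        self.failures = 0

    def _record_outcome(self, ok: bool) -> None:
        with self._lock:
            if ok:
                self.successes += 1
            else:
                self.failures += 1

    def invoke(self, evaluate: Evaluator) -> Any:
        try:
            reply = self._transport()
        except OSError:
            self._record_outcome(False)  # transport failure
            raise
        ok, _error = evaluate(reply)  # sidecar-level verdict, computed once
        self._record_outcome(ok)      # no "success, then undo" window
        return reply
```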
### Driver.Historian.Wonderware.Client-005
@@ -167,7 +167,7 @@ the reader.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
| Status | Open |
| Status | Resolved |
**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
otherwise propagates. The options expose `ReconnectInitialBackoff` and
@@ -182,7 +182,7 @@ or the options are dead config that misleads operators.
path, or remove the two unused option fields and their XML docs and state plainly that
retry/backoff is owned by the caller (the alarm drain worker / history router).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — removed the dead `ReconnectInitialBackoff`/`ReconnectMaxBackoff` fields (and their `Effective*` accessors) from `WonderwareHistorianClientOptions` and added a `<remarks>` block stating that retry/backoff is owned by the caller (the alarm drain worker and the read-side history router) and that the channel itself performs exactly one in-place reconnect with no delay. Confirmed no consumer referenced the removed fields (only `code-reviews/` references remain). Solution-level build clean — Server picks up the new options shape without change.
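The documented channel behaviour (one immediate in-place reconnect, then propagate) can be sketched in Python as follows. The `PipeChannel` class here and its `connect` / `send` callables are hypothetical stand-ins for the C# named-pipe plumbing, and `OSError` stands in for the transport failure type.

```python
# Illustrative Python sketch of the channel's documented behaviour: one
# immediate in-place reconnect on transport failure, then propagate; any
# backoff is the caller's job.


class PipeChannel:
    def __init__(self, connect, send):
        self._connect = connect
        self._send = send
        self._conn = None

    def invoke(self, request):
        for attempt in range(2):  # first try plus exactly one retry
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return self._send(self._conn, request)
            except OSError:
                self._conn = None  # drop the broken pipe; no delay before retry
                if attempt == 1:
                    raise  # caller (drain worker / history router) owns backoff
```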
### Driver.Historian.Wonderware.Client-007
@@ -218,7 +218,7 @@ deserializing.
| Severity | Low |
| Category | Security |
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
| Status | Open |
| Status | Resolved |
**Description:** The csproj suppresses two NuGet audit advisories
(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
@@ -232,7 +232,7 @@ advisory title, why it does not apply to this module usage, and a revisit trigge
follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
can be dropped.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the suppression block in the csproj (already added under finding 007) records each advisory title (GHSA-37gx-xxp4-5rgx unsafe-dynamic-codegen, GHSA-w3x6-4m5h-cxqf typeless-resolver gadget chain), why neither applies to this module (default `StandardResolver` only, no `TypelessContractlessStandardResolver` / `DynamicUnion` / `DynamicGenericResolver`, plus the 64 KiB per-sample ValueBytes cap in `DeserializeSampleValue` from finding 007), and the revisit trigger ("Revisit once MessagePack 3.x is available and drop these suppressions at that time"). All three pieces the recommendation asked for are present; the single comment block above both `NuGetAuditSuppress` entries was confirmed to close the audit-trail gap.
### Driver.Historian.Wonderware.Client-009
@@ -272,7 +272,7 @@ silent `[Key]` drift between the two duplicated contract sets is caught at build
| Severity | Low |
| Category | Documentation & comments |
| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
| Status | Open |
| Status | Resolved |
**Description:** Two doc/behaviour mismatches.
(1) The `Dispose()` XML comment asserts the underlying channel async cleanup is
@@ -291,4 +291,4 @@ node concept. The collapse is reasonable but undocumented.
short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
connection flags to one transport and does not track per-node health.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — (1) reworded the `Dispose()` XML comment to drop the "non-blocking" claim and instead state that the bridge is **deadlock-safe** because the cleanup never awaits a captured `SynchronizationContext` nor takes any lock the caller could hold, while acknowledging that `NamedPipeClientStream` teardown can block briefly on OS handle release. (2) Added a full `<summary>` + `<remarks>` block to `GetHealthSnapshot` explaining the single-channel collapse — both `ProcessConnectionOpen` and `EventConnectionOpen` report the same channel state, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes` are intentionally null/empty because the client has no per-node telemetry. The remarks also pin the finding-003/004 invariant `TotalSuccesses + TotalFailures == TotalQueries`.
@@ -12,26 +12,26 @@ Each module's `findings.md` is the source of truth; this file is generated from
|---|---|---|---|---|---|---|
| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 11 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 13 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 16 |
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 15 |
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
-| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 6 |
+| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 8 |
| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
-| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 5 |
-| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 14 |
+| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 5 |
+| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 18 |
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
-| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 10 |
+| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 10 |
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 9 |
| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
@@ -46,39 +46,7 @@ Each module's `findings.md` is the source of truth; this file is generated from
Findings with status `Open` or `In Progress`, ordered by severity.
-| ID | Severity | Category | Location | Description |
-|---|---|---|---|---|
-| Client.CLI-002 | Low | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` | The summary computes `neverWentBad` as every target whose node-id key is absent from the `everBad` dictionary. A node that received no update at all is also absent from `everBad`, so it is counted in `neverWentBad` and printed under the he… |
-| Client.CLI-003 | Low | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` | Numeric command options accept any value with no range validation. `--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently d… |
-| Client.CLI-004 | Low | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` | `SubscribeCommand` is the only command in the module whose constructor and all `[CommandOption]` properties have no XML doc comments. Every other command (`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`, `… |
-| Client.CLI-006 | Low | Error handling & resilience | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` | Operator input-format errors surface as raw .NET exceptions rather than clean CLI errors. An unparseable start/end value throws `FormatException` straight out of `DateTime.Parse`; an invalid node id throws `FormatException`/`ArgumentExcept… |
-| Client.CLI-007 | Low | Performance & resource management | `CommandBase.cs:112-123` | `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a logger, and assigns it to the static `Log.Logger` without disposing the previously assigned logger. For a single CLI invocation this leaks at most one logger and the… |
-| Client.CLI-008 | Low | Documentation & comments | `docs/Client.CLI.md:158-217` | `docs/Client.CLI.md` is stale relative to the code at this commit. (1) The `subscribe` command section documents only `-n` and `-i`, but the code (`SubscribeCommand`) also exposes `-r/--recursive`, `--max-depth`, `-q/--quiet`, `--duration`… |
-| Client.CLI-009 | Low | Code organization & conventions | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` | Both long-running commands attach an event handler (`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach it. Because the handler closes over `console`, the captured console and the closure remain refere… |
-| Client.CLI-010 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` | The new `SubscribeCommand` capabilities are largely untested. The four `SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel, disconnect-in-finally, and the subscription message. There is no test for the `--recurs… |
-| Client.Shared-003 | Low | Correctness & logic bugs | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` | `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a servic… |
-| Client.Shared-004 | Low | OtOpcUa conventions | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` | `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchron… |
-| Client.Shared-009 | Low | Error handling & resilience / Documentation & comments | `OpcUaClientService.cs:302-322` | `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally `return StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAs… |
-| Client.Shared-010 | Low | Performance & resource management | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` | `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call p… |
-| Client.Shared-011 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` | The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race… |
-| Client.UI-003 | Low | OtOpcUa conventions | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` | The csproj references `Serilog` and `Serilog.Sinks.Console`, and `docs/Client.UI.md` lists Serilog as the logging technology, but no source file in the module uses Serilog. `Program.BuildAvaloniaApp()` uses Avalonia's `LogToTrace()` and th… |
-| Client.UI-004 | Low | OtOpcUa conventions | `Views/MainWindow.axaml.cs:125-138` | `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is obsolete in Avalonia 11.x (the version pinned in the csproj). The supported replacement is the `StorageProvider` API (`StorageProvider.OpenFolderPickerAsync`). Using the obsolete… |
-| Client.UI-006 | Low | Error handling & resilience | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` | Many catch blocks swallow exceptions silently with an empty body and only a comment (`// Redundancy info not available`, `// Subscribe failed`, `// Subscription failed; no item added`, and others). When a subscribe, alarm-subscribe, or red… |
-| Client.UI-009 | Low | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` | `HistoryViewModel.AggregateTypes` exposes eight entries: `null` (Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`. `docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average, Minimum, Maxi… |
-| Client.UI-010 | Low | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` | `DateTimeRangePicker` declares `MinDateTimeProperty` / `MaxDateTimeProperty` styled properties with public CLR accessors, but neither is read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and `OnEndLostFocus` never clamp… |
-| Client.UI-011 | Low | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` | The certificate-store-path `TextBox` watermark reads `(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208 folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now `{LocalAppData}/OtOpcUaClient/`… |
-| Driver.Cli.Common-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` | `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the value and status columns) without guarding against empty input. When `tagNames` and `snapshots` are both empty (equal length, so the mismatch check at line 56 passes),… |
-| Driver.Cli.Common-006 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` | Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71` states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed" — true only when every snapshot has a non-null `SourceTimestampUtc`. `For… |
-| Driver.FOCAS.Cli-001 | Low | Error handling & resilience | `Commands/WriteCommand.cs:58-68` | `WriteCommand.ParseValue` parses the numeric `--value` types (`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse` / etc. These throw raw `FormatException` or `OverflowException` for malformed or out-of-range inpu… |
-| Driver.FOCAS.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:45-51` | The `subscribe` command attaches an `OnDataChange` handler that calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from the driver's `PollGroupEngine` tick thread, while the command's main flow writes the "Subscribe… |
-| Driver.FOCAS.Cli-003 | Low | Error handling & resilience | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) | The numeric command options `--cnc-port`, `--timeout-ms`, and `--interval-ms` are accepted without range validation. A zero or negative `--cnc-port` produces an invalid `focas://host:<n>` string; `--timeout-ms 0` yields a zero `TimeSpan` o… |
-| Driver.FOCAS.Cli-004 | Low | Performance & resource management | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` | Every command declares `await using var driver = new FocasDriver(...)` |
-| Driver.FOCAS.Cli-005 | Low | Design-document adherence | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) | `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and `BadCommunicationError` as the key diagnostic signals an operator reads off `probe` / `write` output ("A `BadCommunicationError` means ... `BadDeviceFailure` after a successful co… |
-| Driver.Historian.Wonderware.Client-003 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` | `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but read inside `GetHealthSnapshot` under `_healthLock`, and every other counter (`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under `_hea… |
-| Driver.Historian.Wonderware.Client-004 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:203-267` | A sidecar-reported failure is recorded in two non-atomic steps under separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256), d… |
-| Driver.Historian.Wonderware.Client-006 | Low | Error handling & resilience | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` | `PipeChannel.InvokeAsync` retries exactly once on transport failure and otherwise propagates. The options expose `ReconnectInitialBackoff` and `ReconnectMaxBackoff` and `WonderwareHistorianClientOptions` documents them as exponential backo… |
-| Driver.Historian.Wonderware.Client-008 | Low | Security | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` | The csproj suppresses two NuGet audit advisories (`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency with no inline comment recording why the suppression is safe, who reviewed it, or when it should be re… |
-| Driver.Historian.Wonderware.Client-010 | Low | Documentation & comments | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` | Two doc/behaviour mismatches. (1) The `Dispose()` XML comment asserts the underlying channel async cleanup is non-blocking so the `GetAwaiter()/GetResult()` bridge is safe. `PipeChannel.DisposeAsync` calls `ResetTransport()`, which invokes… |
+_No pending findings._
## Closed findings
@@ -107,6 +75,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.AlarmHistorian-006 | High | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` |
| Core.ScriptedAlarms-001 | High | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` |
| Core.Scripting-002 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:70` |
+| Core.Scripting-012 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Core.VirtualTags-001 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` |
| Driver.AbCip-001 | High | Resolved | Correctness & logic bugs | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` |
| Driver.AbCip-002 | High | Resolved | Correctness & logic bugs | `AbCipStatusMapper.cs:65-78` |
@@ -115,6 +84,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.AbLegacy-001 | High | Resolved | Correctness & logic bugs | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` |
| Driver.AbLegacy-006 | High | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` |
| Driver.Cli.Common-001 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` |
+| Driver.Cli.Common-007 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` |
| Driver.FOCAS-001 | High | Resolved | Correctness & logic bugs | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
| Driver.FOCAS-002 | High | Resolved | Correctness & logic bugs | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
| Driver.Galaxy-002 | High | Resolved | Correctness & logic bugs | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` |
@@ -181,6 +151,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.Scripting-004 | Medium | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:73` |
| Core.Scripting-007 | Medium | Resolved | Error handling & resilience | `TimedScriptEvaluator.cs:60` |
| Core.Scripting-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
+| Core.Scripting-013 | Medium | Resolved | Security | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
+| Core.Scripting-014 | Medium | Resolved | Concurrency & thread safety | `CompiledScriptCache.cs:91-103` (`Clear`) |
+| Core.Scripting-016 | Medium | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Core.VirtualTags-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
| Core.VirtualTags-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
| Core.VirtualTags-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
@@ -218,6 +191,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.Galaxy-009 | Medium | Resolved | Error handling & resilience | `GalaxyDriver.cs:354-371` |
| Driver.Galaxy-011 | Medium | Resolved | Performance & resource management | `GalaxyDriver.cs:411` |
| Driver.Galaxy-014 | Medium | Resolved | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
+| Driver.Galaxy-016 | Medium | Resolved | Performance & resource management | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` |
| Driver.Historian.Wonderware-002 | Medium | Resolved | Correctness and logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
| Driver.Historian.Wonderware-003 | Medium | Resolved | Correctness and logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
| Driver.Historian.Wonderware-006 | Medium | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:120-128` |
@@ -273,6 +247,25 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Analyzers-004 | Low | Resolved | Performance & resource management | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` |
| Analyzers-005 | Low | Resolved | Design-document adherence | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` |
| Analyzers-007 | Low | Resolved | Documentation & comments | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` |
+| Client.CLI-002 | Low | Resolved | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` |
+| Client.CLI-003 | Low | Resolved | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` |
+| Client.CLI-004 | Low | Resolved | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` |
+| Client.CLI-006 | Low | Resolved | Error handling & resilience | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` |
+| Client.CLI-007 | Low | Resolved | Performance & resource management | `CommandBase.cs:112-123` |
+| Client.CLI-008 | Low | Resolved | Documentation & comments | `docs/Client.CLI.md:158-217` |
+| Client.CLI-009 | Low | Resolved | Code organization & conventions | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` |
+| Client.CLI-010 | Low | Resolved | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` |
+| Client.Shared-003 | Low | Resolved | Correctness & logic bugs | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` |
+| Client.Shared-004 | Low | Resolved | OtOpcUa conventions | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` |
+| Client.Shared-009 | Low | Resolved | Error handling & resilience / Documentation & comments | `OpcUaClientService.cs:302-322` |
+| Client.Shared-010 | Low | Resolved | Performance & resource management | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` |
+| Client.Shared-011 | Low | Resolved | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` |
+| Client.UI-003 | Low | Resolved | OtOpcUa conventions | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` |
+| Client.UI-004 | Low | Resolved | OtOpcUa conventions | `Views/MainWindow.axaml.cs:125-138` |
+| Client.UI-006 | Low | Resolved | Error handling & resilience | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` |
+| Client.UI-009 | Low | Resolved | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` |
+| Client.UI-010 | Low | Resolved | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` |
+| Client.UI-011 | Low | Resolved | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` |
| Configuration-004 | Low | Resolved | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
| Configuration-005 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
| Configuration-007 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
@@ -294,14 +287,16 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.ScriptedAlarms-003 | Low | Resolved | Documentation & comments | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` |
| Core.ScriptedAlarms-006 | Low | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` |
| Core.ScriptedAlarms-008 | Low | Resolved | Performance & resource management | `Part9StateMachine.cs:261-268` |
-| Core.ScriptedAlarms-009 | Low | Won't Fix | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
+| Core.ScriptedAlarms-009 | Low | Resolved | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Core.ScriptedAlarms-010 | Low | Resolved | Design-document adherence | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` |
| Core.ScriptedAlarms-011 | Low | Resolved | Code organization & conventions | `Part9StateMachine.cs:275` |
+| Core.ScriptedAlarms-013 | Low | Resolved | Documentation & comments | `ScriptedAlarmEngine.cs:66-81` |
| Core.Scripting-005 | Low | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:97` |
| Core.Scripting-006 | Low | Resolved | Concurrency & thread safety | `CompiledScriptCache.cs:55` |
-| Core.Scripting-008 | Low | Won't Fix | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
+| Core.Scripting-008 | Low | Resolved | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Core.Scripting-009 | Low | Resolved | Design-document adherence | `ForbiddenTypeAnalyzer.cs:45` |
| Core.Scripting-011 | Low | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
+| Core.Scripting-015 | Low | Resolved | Correctness & logic bugs | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Core.VirtualTags-004 | Low | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` |
| Core.VirtualTags-006 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` |
| Core.VirtualTags-007 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` |
@@ -329,15 +324,25 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.AbLegacy.Cli-005 | Low | Resolved | Design-document adherence | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` |
| Driver.AbLegacy.Cli-006 | Low | Resolved | Code organization & conventions | `Commands/ProbeCommand.cs:20-22` |
| Driver.AbLegacy.Cli-007 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` |
+| Driver.Cli.Common-004 | Low | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
+| Driver.Cli.Common-006 | Low | Resolved | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
+| Driver.Cli.Common-008 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` |
| Driver.FOCAS-007 | Low | Resolved | Error handling & resilience | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` |
| Driver.FOCAS-008 | Low | Resolved | Performance & resource management | `FocasDriver.cs:201`, `FocasDriver.cs:253` |
| Driver.FOCAS-009 | Low | Resolved | Design-document adherence | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` |
| Driver.FOCAS-010 | Low | Resolved | Code organization & conventions | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) |
| Driver.FOCAS-011 | Low | Resolved | Code organization & conventions | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` |
+| Driver.FOCAS.Cli-001 | Low | Resolved | Error handling & resilience | `Commands/WriteCommand.cs:58-68` |
+| Driver.FOCAS.Cli-002 | Low | Resolved | Concurrency & thread safety | `Commands/SubscribeCommand.cs:45-51` |
+| Driver.FOCAS.Cli-003 | Low | Resolved | Error handling & resilience | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) |
+| Driver.FOCAS.Cli-004 | Low | Resolved | Performance & resource management | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` |
+| Driver.FOCAS.Cli-005 | Low | Resolved | Design-document adherence | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) |
| Driver.Galaxy-005 | Low | Resolved | OtOpcUa conventions | `Runtime/EventPump.cs:81-88` |
| Driver.Galaxy-010 | Low | Resolved | Security | `GalaxyDriver.cs:311-341` |
| Driver.Galaxy-012 | Low | Resolved | Performance & resource management | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` |
| Driver.Galaxy-013 | Low | Resolved | Design-document adherence | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` |
+| Driver.Galaxy-017 | Low | Deferred | Design-document adherence | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract |
+| Driver.Galaxy-018 | Low | Resolved | Documentation & comments | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` |
| Driver.Historian.Wonderware-004 | Low | Resolved | Correctness and logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` |
| Driver.Historian.Wonderware-005 | Low | Resolved | Concurrency and thread safety | `Backend/HistorianDataSource.cs:124`, `:126-127` |
| Driver.Historian.Wonderware-007 | Low | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:70-75` |
@@ -345,6 +350,11 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.Historian.Wonderware-010 | Low | Resolved | Performance and resource management | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) |
| Driver.Historian.Wonderware-011 | Low | Resolved | Design-document adherence | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` |
| Driver.Historian.Wonderware-012 | Low | Resolved | Testing coverage | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` |
+| Driver.Historian.Wonderware.Client-003 | Low | Resolved | Concurrency & thread safety | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
+| Driver.Historian.Wonderware.Client-004 | Low | Resolved | Concurrency & thread safety | `WonderwareHistorianClient.cs:203-267` |
+| Driver.Historian.Wonderware.Client-006 | Low | Resolved | Error handling & resilience | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
+| Driver.Historian.Wonderware.Client-008 | Low | Resolved | Security | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
+| Driver.Historian.Wonderware.Client-010 | Low | Resolved | Documentation & comments | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
| Driver.Modbus-003 | Low | Resolved | Concurrency & thread safety | `ModbusDriver.cs:59,188,241,259,266,726,745,759` |
| Driver.Modbus-007 | Low | Resolved | Design-document adherence | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` |
| Driver.Modbus-008 | Low | Resolved | Documentation & comments | `ModbusDriver.cs:411-417,700-703,737-744` |
@@ -390,3 +400,4 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-012 | Low | Resolved | Performance & resource management | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
| Server-014 | Low | Resolved | Code organization & conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
| Server-015 | Low | Resolved | Documentation & comments | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
+| Driver.Galaxy-015 | ~~Medium~~ Low (re-triaged 2026-05-23) | Resolved | ~~Security~~ Documentation & comments (re-triaged 2026-05-23) | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` |
+40 -16
@@ -149,53 +149,77 @@ otopcua-cli browse -u opc.tcp://localhost:4840/OtOpcUa -U admin -P admin123 -r -
### subscribe
-Monitors a node for value changes using OPC UA subscriptions. Prints each data change notification with timestamp, value, and status code. Runs until Ctrl+C, then unsubscribes and disconnects cleanly.
+Monitors a node (or every Variable in its subtree) for value changes using OPC UA subscriptions.
+Prints each data-change notification with timestamp, value, and status code, then prints a
+summary on exit. Exits on Ctrl+C, or automatically after `--duration` seconds.
```bash
# Subscribe to a single node
otopcua-cli subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=MyNode" -i 500
+# Browse a subtree and subscribe to every Variable, run for 60 seconds, write the summary to disk
+otopcua-cli subscribe -u opc.tcp://localhost:4840 -n "ns=3;s=ZB" -r --max-depth 4 \
+--duration 60 --quiet --summary-file C:\Temp\subscribe-summary.txt
```
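The second example above is fenced as bash but passes a Windows path to `--summary-file`. If it is pasted into a POSIX shell such as Git Bash or WSL, bash treats each unquoted backslash as an escape, so otopcua-cli receives `C:Tempsubscribe-summary.txt`. A minimal sketch of the difference, assuming a bash shell (`summary_path` and `unquoted` are illustrative variable names only):

```bash
# Single quotes keep the backslashes of a Windows path intact under bash.
summary_path='C:\Temp\subscribe-summary.txt'
# Unquoted, bash consumes each backslash as an escape character.
unquoted=$(printf '%s' C:\Temp\subscribe-summary.txt)
printf 'quoted:   %s\nunquoted: %s\n' "$summary_path" "$unquoted"
# Pass the quoted value through, for example:
#   otopcua-cli subscribe -u opc.tcp://localhost:4840 -n "ns=3;s=ZB" -r --summary-file "$summary_path"
```

From cmd.exe or PowerShell the unquoted path is passed through unchanged; the quoting only matters under a POSIX shell.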
| Flag | Description |
|------|-------------|
| `-n` / `--node` | Node ID to monitor (required) |
| `-i` / `--interval` | Sampling/publishing interval in milliseconds (default: 1000) |
| `-n` / `--node` | Node ID to monitor (required). When `--recursive` is set, this is the browse root. |
| `-i` / `--interval` | Sampling interval in milliseconds (default: 1000) |
| `-r` / `--recursive` | Browse recursively from `--node` and subscribe to every Variable found |
| `--max-depth` | Maximum recursion depth when `--recursive` is set (default: 10) |
| `-q` / `--quiet` | Suppress per-update output; only print the final summary |
| `--duration` | Auto-exit after N seconds and print the summary (0 = run until Ctrl+C, default: 0) |
| `--summary-file` | Also write the summary to this file path on exit |
#### Summary buckets
The summary prints per-node counts across these buckets:
- **Ever went BAD during window** — node received at least one notification whose status was not Good.
- **NEVER went bad (suspect)** — node received at least one notification and every one was Good.
- **Last status GOOD / NOT-GOOD** — final observed status across nodes that received any update.
- **No update received at all** — node was subscribed but no notification arrived during the window.
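The bucketing rules above can be sketched as follows. This is a minimal illustration, not the CLI's actual implementation. The `SubscribeSummary` record and the `SummaryBuckets.Bucket` method are hypothetical. Each node's history is modelled as a list of `bool` flags (true = Good status) in arrival order.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical summary shape; the CLI's real types are not shown in this diff.
public sealed record SubscribeSummary(
    int EverWentBad, int NeverWentBad, int LastGood, int LastNotGood, int NoUpdate);

public static class SummaryBuckets
{
    // history[nodeId] = Good/not-Good flag of each notification, oldest first.
    // A node that was subscribed but never notified has an empty list.
    public static SubscribeSummary Bucket(IReadOnlyDictionary<string, List<bool>> history)
    {
        int everBad = 0, neverBad = 0, lastGood = 0, lastNotGood = 0, noUpdate = 0;
        foreach (var statuses in history.Values)
        {
            if (statuses.Count == 0)
            {
                noUpdate++;                              // nothing arrived during the window
                continue;
            }

            if (statuses.Any(good => !good)) everBad++;  // at least one non-Good status
            else neverBad++;                             // every notification was Good

            if (statuses[^1]) lastGood++;                // final observed status
            else lastNotGood++;
        }
        return new SubscribeSummary(everBad, neverBad, lastGood, lastNotGood, noUpdate);
    }
}
```

A node can be counted in both an "ever/never" bucket and a "last status" bucket. The two families answer different questions, so their counts are not expected to add up to each other.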
### historyread
Reads historical data from a node. Supports raw history reads and aggregate (processed) history reads.
`--start` and `--end` are parsed with `CultureInfo.InvariantCulture` and treated as UTC; supply
them in ISO 8601 UTC form (`YYYY-MM-DDTHH:MM:SSZ`) for unambiguous behaviour across hosts.
```bash
# Raw history
otopcua-cli historyread -u opc.tcp://localhost:4840/OtOpcUa \
-n "ns=1;s=TestMachine_001.TestHistoryValue" \
--start "2026-03-25" --end "2026-03-30"
--start "2026-03-25T00:00:00Z" --end "2026-03-30T00:00:00Z"
# Aggregate: 1-hour average
otopcua-cli historyread -u opc.tcp://localhost:4840/OtOpcUa \
-n "ns=1;s=TestMachine_001.TestHistoryValue" \
--start "2026-03-25" --end "2026-03-30" \
--start "2026-03-25T00:00:00Z" --end "2026-03-30T00:00:00Z" \
--aggregate Average --interval 3600000
```
| Flag | Description |
|------|-------------|
| `-n` / `--node` | Node ID to read history for (required) |
| `--start` | Start time, ISO 8601 or date string (default: 24 hours ago) |
| `--end` | End time, ISO 8601 or date string (default: now) |
| `--start` | Start time in ISO 8601 UTC format, e.g. `2026-01-15T08:00:00Z` (default: 24 hours ago) |
| `--end` | End time in ISO 8601 UTC format, e.g. `2026-01-15T09:00:00Z` (default: now) |
| `--max` | Maximum number of values (default: 1000) |
| `--aggregate` | Aggregate function: Average, Minimum, Maximum, Count, Start, End |
| `--aggregate` | Aggregate function: Average, Minimum, Maximum, Count, Start, End, StandardDeviation |
| `--interval` | Processing interval in milliseconds for aggregates (default: 3600000) |
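The UTC rule for `--start` / `--end` can be illustrated with a small helper. It reuses the `DateTime.Parse` call and `DateTimeStyles` flags that `HistoryReadCommand` uses further down in this diff. The `HistoryTimeArgs` class and its injected `utcNow` parameter are hypothetical and exist only to make the defaults testable.

```csharp
using System;
using System.Globalization;

public static class HistoryTimeArgs
{
    // Invariant culture; input without an offset is assumed to be UTC, and an
    // explicit offset is converted to UTC. Results carry DateTimeKind.Utc.
    private const DateTimeStyles Styles =
        DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal;

    public static DateTime ParseStart(string? value, DateTime utcNow) =>
        string.IsNullOrEmpty(value)
            ? utcNow.AddHours(-24)                        // default: 24 hours ago
            : DateTime.Parse(value, CultureInfo.InvariantCulture, Styles);

    public static DateTime ParseEnd(string? value, DateTime utcNow) =>
        string.IsNullOrEmpty(value)
            ? utcNow                                      // default: now
            : DateTime.Parse(value, CultureInfo.InvariantCulture, Styles);
}
```

With these flags, a date-only value such as `2026-03-25` resolves to midnight UTC, and `2026-01-15T10:00:00+02:00` resolves to `08:00` UTC.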
#### Aggregate mapping
| Name | OPC UA Node ID |
|------|---------------|
| `Average` | `AggregateFunction_Average` |
| `Minimum` | `AggregateFunction_Minimum` |
| `Maximum` | `AggregateFunction_Maximum` |
| `Count` | `AggregateFunction_Count` |
| `Start` | `AggregateFunction_Start` |
| `End` | `AggregateFunction_End` |
| Name | Aliases | OPC UA Node ID |
|------|---------|---------------|
| `Average` | `avg` | `AggregateFunction_Average` |
| `Minimum` | `min` | `AggregateFunction_Minimum` |
| `Maximum` | `max` | `AggregateFunction_Maximum` |
| `Count` | | `AggregateFunction_Count` |
| `Start` | `first` | `AggregateFunction_Start` |
| `End` | `last` | `AggregateFunction_End` |
| `StandardDeviation` | `stddev`, `stdev` | `AggregateFunction_StandardDeviationSample` |
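A lookup that resolves a name or alias from the table above to its node ID name might look like the sketch below. `AggregateNames.Resolve` is hypothetical because this diff does not show the body of `ParseAggregateType`. Case-insensitive matching is also an assumption. The sketch throws `ArgumentException` for unknown input because that is the exception `HistoryReadCommand` catches and rewraps as a `CommandException`.

```csharp
using System;
using System.Collections.Generic;

public static class AggregateNames
{
    // Name or alias -> OPC UA aggregate node ID name, per the mapping table.
    // OrdinalIgnoreCase is an assumption of this sketch.
    private static readonly Dictionary<string, string> Map = new(StringComparer.OrdinalIgnoreCase)
    {
        ["Average"] = "AggregateFunction_Average",
        ["avg"] = "AggregateFunction_Average",
        ["Minimum"] = "AggregateFunction_Minimum",
        ["min"] = "AggregateFunction_Minimum",
        ["Maximum"] = "AggregateFunction_Maximum",
        ["max"] = "AggregateFunction_Maximum",
        ["Count"] = "AggregateFunction_Count",
        ["Start"] = "AggregateFunction_Start",
        ["first"] = "AggregateFunction_Start",
        ["End"] = "AggregateFunction_End",
        ["last"] = "AggregateFunction_End",
        // StandardDeviation maps to the *sample* variant, not the population one.
        ["StandardDeviation"] = "AggregateFunction_StandardDeviationSample",
        ["stddev"] = "AggregateFunction_StandardDeviationSample",
        ["stdev"] = "AggregateFunction_StandardDeviationSample",
    };

    public static string Resolve(string name) =>
        Map.TryGetValue(name.Trim(), out var id)
            ? id
            : throw new ArgumentException($"Unknown aggregate '{name}'.", nameof(name));
}
```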
### alarms
+1 -1
@@ -198,7 +198,7 @@ All times are in UTC. Invalid input turns red on blur.
| Option | Description |
|--------|-------------|
| Aggregate | Raw (default), Average, Minimum, Maximum, Count, Start, End |
| Aggregate | Raw (default), Average, Minimum, Maximum, Count, Start, End, Standard Deviation |
| Interval (ms) | Processing interval for aggregate queries (shown only for aggregates) |
| Max Values | Maximum number of raw values to return (default 1000) |
+1 -1
@@ -35,7 +35,7 @@ new ScriptedAlarmDefinition(
## Predicate evaluation
Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known resource limits — unbounded script-side memory, the per-publish accretion of dynamically-emitted script assemblies (Core.Scripting-008), and the orphan-thread CPU-budget caveat — are documented in that file as well.
Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known resource limits — unbounded script-side memory and the orphan-thread CPU-budget caveat — are documented in that file as well; per-publish assembly accretion was resolved by the Core.Scripting-008 collectible-`AssemblyLoadContext` rewrite and no longer requires periodic server restarts.
`AlarmPredicateContext` (`AlarmPredicateContext.cs`) is the script's `ScriptContext` subclass:
+5 -3
@@ -6,7 +6,7 @@ The runtime is split across two projects: `Core.Scripting` holds the Roslyn sand
## Roslyn script sandbox (`Core.Scripting`)
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp` (regular compiler, not the scripting variant — the original `CSharpScript` pipeline was retired by the Core.Scripting-008 / -016 rewrite, see "Compile cache" below). Each script's source is pasted as the body of a synthesized `CompiledScript.Run(ScriptGlobals<TContext>)` method against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
### Compile pipeline (`ScriptEvaluator<TContext, TResult>`)
@@ -30,11 +30,13 @@ Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), whic
`ConcurrentDictionary<string, Lazy<ScriptEvaluator<...>>>` keyed on `SHA-256(UTF8(source))` rendered to hex. `Lazy<T>` with `ExecutionAndPublication` mode means two threads racing a miss compile exactly once. Failed compiles evict the entry (via the `TryRemove(KeyValuePair<,>)` overload so a concurrently re-added retry entry is not collateral damage — Core.Scripting-006) so a corrected retry can succeed (used during Admin UI authoring). No capacity bound — scripts are operator-authored and bounded by the config DB. Whitespace changes miss the cache on purpose. `Clear()` is called on config-publish.
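The caching contract described above can be sketched as a small generic class. This is an illustration under assumptions, not `CompiledScriptCache` itself. `SourceHashCache<TValue>` and its `compile` delegate are hypothetical stand-ins, and the sketch omits disposal of evaluators on `Clear()`. It shows the three mechanics the text names: the SHA-256 hex key, single-compile `Lazy<T>`, and eviction of a failed entry by key *and* value.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Threading;

public sealed class SourceHashCache<TValue>
{
    private readonly ConcurrentDictionary<string, Lazy<TValue>> _entries = new();
    private readonly Func<string, TValue> _compile;

    public SourceHashCache(Func<string, TValue> compile) => _compile = compile;

    public int Count => _entries.Count;

    // SHA-256 of the UTF-8 source, rendered as hex; whitespace edits change the key.
    public static string KeyFor(string source) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(source)));

    public TValue GetOrCompile(string source)
    {
        var key = KeyFor(source);
        // GetOrAdd hands every racing caller the same stored Lazy, and
        // ExecutionAndPublication runs its factory exactly once.
        var lazy = _entries.GetOrAdd(key, _ => new Lazy<TValue>(
            () => _compile(source), LazyThreadSafetyMode.ExecutionAndPublication));
        try
        {
            return lazy.Value;
        }
        catch
        {
            // A faulted Lazy caches its exception, so evict it so a retry can succeed.
            // The key+value overload leaves alone a fresh entry another thread re-added.
            _entries.TryRemove(new KeyValuePair<string, Lazy<TValue>>(key, lazy));
            throw;
        }
    }

    public void Clear() => _entries.Clear();
}
```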
**Per-publish assembly accretion (accepted limitation, Core.Scripting-008).** Each compiled `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>` delegate, which keeps the dynamically-emitted script assembly loaded for the process lifetime. Emitted assemblies in the default `AssemblyLoadContext` cannot be unloaded; `CompiledScriptCache.Clear()` drops the dictionary entries but does **not** unload the underlying assemblies. Across many config-publish generations (each `Clear()` followed by recompiling every script), the process accumulates dead script assemblies. For the expected "low thousands" of scripts this is benign, but a long-running server with very frequent publishes will see steady managed-memory growth that does not return until the process restarts. Out-of-process script evaluation or a collectible `AssemblyLoadContext` is a v3 concern; deployments with high-publish-frequency requirements should schedule a periodic server restart to reclaim the accrued assemblies.
**Per-publish assembly unload (Core.Scripting-008 resolved).** Each compiled `ScriptEvaluator` emits its script into a dedicated **collectible** `AssemblyLoadContext` — the BCL escape hatch for assemblies that can be unloaded. The compile path is hand-rolled `CSharpCompilation.Create` + `Emit(MemoryStream)` + `ScriptAssemblyLoadContext.LoadFromStream` rather than the legacy `CSharpScript.CreateDelegate` (which emits into the default ALC and cannot be unloaded). `ScriptEvaluator.Dispose()` calls `AssemblyLoadContext.Unload()` and `CompiledScriptCache.Clear()` disposes every materialised evaluator before dropping its dictionary entry, so the emitted assemblies become eligible for GC immediately after a config-publish. The reclaim is GC-timing-sensitive (Unload is *eligible-for-collection*, not synchronous); the next collection cycle reclaims them. Regression tests `Dispose_unloads_compiled_script_assembly_load_context` and `Clear_disposes_every_materialised_evaluator` in `CompiledScriptCacheTests` lock this contract via `WeakReference` + `GC.Collect()` assertions. Server restarts are no longer required to reclaim compiled-script memory.
**Scripting authoring convention.** With the collectible-ALC rewrite, the wrapper around a user script is an ordinary C# static method, not a Roslyn `Script` submission. The script body is pasted verbatim as the method body and must therefore end with an explicit `return …;` per ordinary C# rules — the legacy `CSharpScript` "last expression yields result" shorthand is gone. Every script in the existing test corpus already uses explicit `return`; this convention is operator-visible only when authoring a brand-new script from scratch.
### Per-evaluation timeout (`TimedScriptEvaluator<TContext, TResult>`)
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying `ScriptRunner` keeps running on its thread-pool thread until the Roslyn runtime returns (documented trade-off; out-of-process evaluation is a v3 concern).
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying compiled-script delegate keeps running on its `Task.Run` thread-pool thread until it returns of its own accord (the CT is checked only at evaluator entry; once the script body is running, only the script returning or throwing will release the thread). The post-rewrite delegate is a regular C# `Func<>` bound to the synthesized `CompiledScript.Run` method, so this is a vanilla "synchronous CPU-bound work on a pool thread" leak rather than anything Roslyn-specific. Documented trade-off; out-of-process evaluation is a v3 concern.
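The timeout wrapper described above reduces to the pattern below. This is a minimal sketch, not `TimedScriptEvaluator`. `TimedRun.RunWithBudgetAsync` and this `ScriptTimeoutException` constructor are hypothetical, and the script is modelled as a plain `Func<T>`. It relies on `Task.WaitAsync(TimeSpan, CancellationToken)`, available from .NET 6.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical exception shape; the real ScriptTimeoutException is not shown here.
public sealed class ScriptTimeoutException : Exception
{
    public ScriptTimeoutException(TimeSpan timeout, Exception inner)
        : base($"Script exceeded its {timeout.TotalMilliseconds} ms budget.", inner) { }
}

public static class TimedRun
{
    public static async Task<T> RunWithBudgetAsync<T>(
        Func<T> script, TimeSpan timeout, CancellationToken ct)
    {
        // Task.Run moves the CPU-bound script off the calling thread, so the
        // timeout below is registered without waiting for the script to yield.
        var runTask = Task.Run(script, ct);
        try
        {
            return await runTask.WaitAsync(timeout, ct);
        }
        catch (TimeoutException ex) when (!ct.IsCancellationRequested)
        {
            // Only the wait is abandoned: runTask keeps its pool thread until the
            // script returns on its own (the known leak described above).
            throw new ScriptTimeoutException(timeout, ex);
        }
        // Caller cancellation surfaces as OperationCanceledException and is not
        // caught here, so a shutdown cancel is never reported as a timeout.
    }
}
```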
### Script logger plumbing
+13 -3
@@ -583,10 +583,20 @@ language binding.
**Depends on:** A.1 merged (proto change live).
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\src\` for .NET — note the sibling
repo restructured after this plan was written; `clients/dotnet/MxGateway.Client.csproj`
no longer exists, the proto contracts now live in
`src/ZB.MOM.WW.MxGateway.Contracts/` under the new namespace
`ZB.MOM.WW.MxGateway.Contracts.Proto[.Galaxy]`; the OtOpcUa driver currently
consumes vendored binaries from the pre-restructure build — see
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/README.md`):
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just
rebuild `MxGateway.Client.csproj` after pulling A.1.
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; rebuild
`src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
after pulling A.1. (If unwinding the driver's vendored binaries onto the
new contracts namespace as part of the alarm work, namespace-rename + a
reimplementation of the missing `MxGatewayClient` / `MxGatewaySession`
wrappers is also in scope.)
2. **Python** — run `clients\python\generate-proto.ps1`; commit the
regenerated `_pb2.py` + `_pb2_grpc.py` files under
`clients\python\src\`.
+3 -4
@@ -1,6 +1,6 @@
# High-Level Requirements
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as three cooperating processes (Server, Admin, Galaxy.Host). The Galaxy integration is one of seven shipped drivers. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 are new and cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
> **Revision** — Refreshed 2026-05-23 for the OtOpcUa v2 multi-driver platform. The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as two cooperating processes (Server, Admin). The Galaxy integration is one of seven shipped drivers and is now an in-process Tier-A driver that talks gRPC to a separately installed `mxaccessgw` gateway (sibling repo) — PR 7.2 (2026-04-30) retired the legacy out-of-process `Galaxy.Host` Windows service. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
## HLR-001: OPC UA Server
@@ -28,11 +28,10 @@ Drivers whose backend has a native change signal (e.g. Galaxy's `time_of_last_de
## HLR-007: Service Hosting
The system shall be deployed as three cooperating Windows services:
The system shall be deployed as two cooperating Windows services (the legacy `OtOpcUa.Galaxy.Host` x86 host was retired in PR 7.2 — Galaxy access now flows through the separately installed `mxaccessgw` gateway, which lives in a sibling repository and is not part of the OtOpcUa deployment):
- **OtOpcUa.Server** — .NET 10 x64, `Microsoft.Extensions.Hosting` + `AddWindowsService`, hosts all non-Galaxy drivers in-process and the OPC UA endpoint.
- **OtOpcUa.Server** — .NET 10 AnyCPU, `Microsoft.Extensions.Hosting` + `AddWindowsService` (decision #30 replaced the original TopShelf choice), hosts the OPC UA endpoint and every driver in-process, including the new Tier-A `GalaxyDriver`, which speaks gRPC to `mxaccessgw`.
- **OtOpcUa.Admin** — .NET 10 x64 Blazor Server web app, hosts the admin UI, SignalR hubs for live updates, `/metrics` Prometheus endpoint, and audit log writers.
- **OtOpcUa.Galaxy.Host** — .NET Framework 4.8 x86 (TopShelf), hosts MXAccess COM + Galaxy Repository SQL + Historian plugin. Talks to `Driver.Galaxy.Proxy` inside `OtOpcUa.Server` via a named pipe (MessagePack over length-prefixed frames, per-process shared secret, SID-restricted ACL).
## HLR-008: Logging
+6 -3
@@ -151,8 +151,11 @@ substantive driver change, and revise this table when the data does.
`SubscriptionRegistry`, or a downstream consumer retaining
`DataValueSnapshot` references past their useful life.
## Scripted-alarm engine — known hot-path allocations
## Scripted-alarm engine — hot-path allocation reuse
`ScriptedAlarmEngine.BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` and `AlarmPredicateContext` on every predicate evaluation — i.e. once per upstream tag change per referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. (Core.ScriptedAlarms-009)
`ScriptedAlarmEngine` keeps a per-alarm reusable evaluation scratch in `_scratchByAlarmId` — the read-cache `Dictionary<string, DataValueSnapshot>` and the `AlarmPredicateContext` are allocated once per alarm (on first evaluation) and refilled in place across every subsequent predicate evaluation. The hot path no longer allocates a fresh dictionary + context per upstream tag change. (Core.ScriptedAlarms-009)
The allocations are deliberate for now: predicate evaluation is already serialised under `_evalGate`, so a single reused scratch dictionary would be safe, but the per-call dictionary keeps the evaluation surface immutable and trivially safe against future refactors. If a future scripted-alarm soak surfaces allocation pressure on this path, the mitigation is a per-alarm scratch buffer cleared between evaluations — note here before changing the engine.
Safety: reuse is serialised under `_evalGate`, so two threads can never observe the same scratch in a half-refilled state. The context wraps the read-cache by reference, so refilling the dictionary is what the predicate's `ctx.GetTag(path)` calls observe. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a config-publish drops the prior generation's scratch (a new generation may carry different `Inputs` / `Logger`). Regression tests in `ScriptedAlarmEngineTests` lock the reuse contract:
- `Reevaluation_reuses_the_same_read_cache_dictionary` — asserts dictionary instance identity across two evaluations.
- `Reevaluation_reuses_the_same_predicate_context` — same, for the context.
- `LoadAsync_drops_the_prior_generations_scratch` — asserts a publish resets the scratch.
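The reuse pattern above can be sketched in simplified form. Scratch is allocated per alarm on first use, then cleared and refilled under a single gate, and dropped on load. Everything here is hypothetical: `ScratchEngine`, `PredicateContext`, and the string values that stand in for `DataValueSnapshot`. Only the roles of `_evalGate` and `_scratchByAlarmId` come from the text.

```csharp
using System;
using System.Collections.Generic;

// Simplified context that wraps the read-cache by reference.
public sealed class PredicateContext
{
    public PredicateContext(Dictionary<string, string> readCache) => ReadCache = readCache;
    public Dictionary<string, string> ReadCache { get; }
    public string? GetTag(string path) => ReadCache.TryGetValue(path, out var v) ? v : null;
}

public sealed class ScratchEngine
{
    private readonly object _evalGate = new();
    private readonly Dictionary<string, PredicateContext> _scratchByAlarmId = new();

    public bool Evaluate(string alarmId, IReadOnlyDictionary<string, string> inputs,
        Func<PredicateContext, bool> predicate)
    {
        lock (_evalGate) // serialised: no thread ever sees a half-refilled scratch
        {
            if (!_scratchByAlarmId.TryGetValue(alarmId, out var ctx))
            {
                // First evaluation of this alarm: allocate dictionary + context once.
                ctx = new PredicateContext(new Dictionary<string, string>());
                _scratchByAlarmId[alarmId] = ctx;
            }

            // Refill in place; the context observes the new values by reference.
            ctx.ReadCache.Clear();
            foreach (var (path, value) in inputs) ctx.ReadCache[path] = value;
            return predicate(ctx);
        }
    }

    // Counterpart of LoadAsync clearing _scratchByAlarmId on a config-publish.
    public void Load()
    {
        lock (_evalGate) _scratchByAlarmId.Clear();
    }
}
```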
@@ -29,7 +29,7 @@ Tie-in capability — **historian alarm sink**:
| 3 | Evaluation trigger = **change-driven + timer-driven**; operator chooses per-tag | Change-driven is cheap at steady state; timer is the escape hatch for polling derivations that don't have a discrete "input changed" signal. |
| 4 | Script shape = **Shape A — one script per virtual tag/alarm**; `return` produces the value (or `bool` for alarm condition) | Minimal surface; no predicate/action split. Alarm side-effects (severity, message) configured out-of-band, not in the script. |
| 5 | Alarm fidelity = **full OPC UA Part 9** | Uniform with Galaxy + ALMD on the wire; client-side tooling (HMIs, historians, event pipelines) gets one shape. |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. Authoritative deny-list (`ForbiddenTypeAnalyzer`): namespace-prefix deny `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks` (Task / Parallel fan-out — Core.Scripting-003), `System.Runtime.InteropServices`, `Microsoft.Win32`; type-granular deny `System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread` (these live directly in the allow-listed `System` / `System.Threading` namespaces, so a prefix rule cannot reach them without blocking primitives — Core.Scripting-001 / -009). |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. Authoritative deny-list (`ForbiddenTypeAnalyzer`): namespace-prefix deny `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks` (Task / Parallel fan-out — Core.Scripting-003), `System.Runtime.InteropServices`, `System.Runtime.Loader` (AssemblyLoadContext et al. — Core.Scripting-012), `Microsoft.Win32`; type-granular deny `System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`, `System.Threading.ThreadPool` (Core.Scripting-012), `System.Threading.Timer` (Core.Scripting-012) (these live directly in the allow-listed `System` / `System.Threading` namespaces, so a prefix rule cannot reach them without blocking primitives — Core.Scripting-001 / -009 / -012). |
| 7 | Dependency declaration = **AST inference**; operator doesn't maintain a separate dependency list | `CSharpSyntaxWalker` extracts `ctx.GetTag("path")` string-literal calls at compile time; dynamic paths rejected at publish. |
| 8 | Config storage = **config DB with generation-sealed cache** (same as driver instances) | Virtual tags + alarms publish atomically in the same generation as the driver instance config they may depend on. |
| 9 | Script return value shape (`ctx.GetTag`) = **`DataValue { Value, StatusCode, Timestamp }`** | Scripts branch on quality naturally without separate `ctx.GetQuality(...)` calls. |
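The two rule kinds in the deny-list combine as in the string-level sketch below. It is hypothetical and deliberately simplified. The real `ForbiddenTypeAnalyzer` inspects Roslyn symbols across every vector listed in the compliance checks (`typeof`, casts, generic arguments, and so on). This sketch only classifies a namespace-qualified type name and does not handle nested types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DenyList
{
    // Namespace-prefix rules: the namespace itself and everything beneath it.
    private static readonly string[] DeniedNamespaces =
    {
        "System.IO", "System.Net", "System.Diagnostics", "System.Reflection",
        "System.Threading.Tasks", "System.Runtime.InteropServices",
        "System.Runtime.Loader", "Microsoft.Win32",
    };

    // Type-granular rules for types that sit directly in allow-listed namespaces.
    private static readonly HashSet<string> DeniedTypes = new(StringComparer.Ordinal)
    {
        "System.Environment", "System.AppDomain", "System.GC", "System.Activator",
        "System.Threading.Thread", "System.Threading.ThreadPool", "System.Threading.Timer",
    };

    // fullTypeName is namespace-qualified, e.g. "System.IO.File".
    public static bool IsForbidden(string fullTypeName)
    {
        if (DeniedTypes.Contains(fullTypeName)) return true;
        var lastDot = fullTypeName.LastIndexOf('.');
        var ns = lastDot < 0 ? "" : fullTypeName[..lastDot];
        // Match whole namespace segments so "System.Net" cannot match "System.Networking".
        return DeniedNamespaces.Any(p => ns == p || ns.StartsWith(p + ".", StringComparison.Ordinal));
    }
}
```

The split is needed because a prefix rule on `System.Threading` would also block primitives such as `Interlocked`, which scripts are allowed to use.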
@@ -89,7 +89,7 @@ Tie-in capability — **historian alarm sink**:
### Stream A — `Core.Scripting` (Roslyn engine + sandbox + AST inference + logger) — **2 weeks**
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers.
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers. _(Implementation note 2026-05-23 — superseded by Core.Scripting-008 / -016: the `CSharpScript`/`ScriptRunner` path was replaced with a hand-rolled `CSharpCompilation.Create` → `Emit(MemoryStream)` → collectible `ScriptAssemblyLoadContext.LoadFromStream` pipeline so per-publish ALC accretion is reclaimable, and engines route compiles through `CompiledScriptCache` rather than calling `ScriptEvaluator.Compile` directly. The reference list was correspondingly widened from the narrow allow-list above to the full BCL `TRUSTED_PLATFORM_ASSEMBLIES` set (filtered to `System.*` + `netstandard` + `Microsoft.Win32.Registry`) because the new pipeline can't compile against the old narrow set; `ForbiddenTypeAnalyzer` is now the sole security gate, consistent with how Core.Scripting-001 / -002 established the analyzer must be the real boundary because type forwarding makes any references-list-only restriction porous. See `docs/VirtualTags.md` "Compile cache" for the current implementation contract.)_
2. **A.2** `DependencyExtractor : CSharpSyntaxWalker`. Visits every `InvocationExpressionSyntax` targeting `ctx.GetTag` / `ctx.SetVirtualTag`; accepts only a `LiteralExpressionSyntax` argument. Non-literal arguments (concat, variable, method call) → publish-time rejection with an actionable error pointing the operator at the exact span. Outputs `IReadOnlySet<string> Inputs` + `IReadOnlySet<string> Outputs`.
3. **A.3** Compile cache. `(source_hash) → compiled Script<T>`. Recompile only when source changes. Warm on `SealedBootstrap`.
4. **A.4** Per-evaluation timeout wrapper (default 250ms; configurable per tag). Timeout = tag quality `BadInternalError` + structured warning log. Keeps a single runaway script from starving the engine.
@@ -162,7 +162,7 @@ Tie-in capability — **historian alarm sink**:
## Compliance Checks (run at exit gate)
- [ ] **Sandbox escape**: attempts to reference any deny-listed namespace prefix (`System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `Microsoft.Win32`) or any of the type-granular forbidden types (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`) fail at script compile with an actionable error. Vectors include direct calls, `typeof(T)`, generic type arguments, casts, `is`/`as` patterns, `default(T)`, array element types, and explicitly-typed local declarations.
- [ ] **Sandbox escape**: attempts to reference any deny-listed namespace prefix (`System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `System.Runtime.Loader`, `Microsoft.Win32`) or any of the type-granular forbidden types (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`, `System.Threading.ThreadPool`, `System.Threading.Timer`) fail at script compile with an actionable error. Vectors include direct calls, `typeof(T)`, generic type arguments, casts, `is`/`as` patterns, `default(T)`, array element types, and explicitly-typed local declarations.
- [ ] **Dependency inference**: `ctx.GetTag(myStringVar)` (non-literal path) is rejected at publish with a span-pointed error; `ctx.GetTag("Line1/Speed")` is accepted + appears in the inferred input set.
- [ ] **Change cascade**: tag A → virtual tag B → virtual tag C. When A changes, B recomputes, then C recomputes. Single change event triggers the full cascade in topological order within one evaluation pass.
- [ ] **Cycle rejection**: publish a config where virtual tag B depends on A and A depends on B. Publish fails pre-commit with a clear cycle message.
+2 -2
@@ -193,7 +193,7 @@ ConfigurationService
- Compact binary format, faster than JSON, good fit for high-frequency data change callbacks
- Simpler than gRPC on .NET 4.8 (which needs legacy `Grpc.Core` native library)
**Decided: Galaxy Host is a separate Windows service.**
**Decided: Galaxy Host is a separate Windows service.** _(Reversed by PR 7.2, 2026-04-30 — see PR 7.2's commit `ae7106d` and the project_galaxy_via_mxgateway memory entry. The legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the `OtOpcUaGalaxyHost` Windows service were retired; Galaxy access now flows through the in-process Tier-A `GalaxyDriver` talking gRPC to a separately installed `mxaccessgw` gateway sibling repo. The reasoning below was correct for the original LMX/x86-COM architecture; the gateway sibling repo now owns those constraints externally.)_
- Independent lifecycle from the OtOpcUa Server
- Can be restarted without affecting the main server or other drivers
- Galaxy.Proxy detects connection loss, sets Bad quality on Galaxy nodes, reconnects when Host comes back
@@ -801,7 +801,7 @@ aggregate runner (#253); server-side factory + seed SQL per driver (#210–#213)
| 26 | Admin deploys on same server (co-hosted) | Simplifies deployment; can also run on separate management host | 2026-04-16 |
| 27 | Admin scaffold early, driver-specific screens deferred | Core CRUD for instances/drivers first; per-driver config UI added with each driver | 2026-04-16 |
| 28 | Named pipes for Galaxy IPC | Fast, no port conflicts, native to both .NET 4.8 and .NET 10 | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 (**reversed PR 7.2, 2026-04-30** — Galaxy is now an in-process Tier-A driver talking gRPC to the sibling `mxaccessgw` gateway; see the decision body above) |
| 30 | Drop TopShelf, use Microsoft.Extensions.Hosting | Built-in Windows Service support in .NET 10, no third-party dependency | 2026-04-16 |
| 31 | Mono-repo for all drivers | Simpler dependency management, single CI pipeline, shared abstractions | 2026-04-16 |
| 32 | MessagePack serialization for Galaxy IPC | Binary, fast, works on .NET 4.8+ and .NET 10 via MessagePack-CSharp NuGet | 2026-04-16 |
+1 -1
@@ -67,7 +67,7 @@ Remaining Phase 6.3 surfaces (hardening, not release-blocking):
- **AB CIP** (PRs #202–222) — `Driver.AbCip`, `Driver.AbCip.IntegrationTests` (7 tests), AB CIP Cli. Live-boot verified against a ControlLogix rig.
- **AB Legacy** (PRs #202, #223) — `Driver.AbLegacy`, `Driver.AbLegacy.IntegrationTests` (2 tests), AB Legacy Cli. PCCC cip-path workaround for SLC/MicroLogix.
- **TwinCAT ADS** (PRs #205, this branch `task-galaxy-e2e`) — `Driver.TwinCAT`, `Driver.TwinCAT.IntegrationTests` (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
- **TwinCAT ADS** (PR #205) — `Driver.TwinCAT`, `Driver.TwinCAT.IntegrationTests` (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
- **FOCAS** (PRs #173, #199 + this session's migration) — `Driver.FOCAS` with an **in-process managed `FocasWireClient`** that speaks FOCAS/2 over TCP directly. Tier-C isolation retired — `Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` P/Invoke + shim DLL + NSSM service all deleted. `Driver.FOCAS.IntegrationTests` covers 9 scenarios (fixed tree identity/axes/program/timers/spindle + user-authored PARAM/MACRO/PMC reads, Browse, Subscribe, IAlarmSource raise/clear, Probe transitions).
Decision recorded: FOCAS is **read-only** against the CNC by design — writes return `BadNotWritable`. See `docs/drivers/FOCAS.md` + `docs/drivers/FOCAS-Test-Fixture.md` for the deployment + coverage map.
@@ -109,8 +109,16 @@ public abstract class CommandBase : ICommand
/// <summary>
/// Configures Serilog based on the verbose flag.
/// </summary>
/// <remarks>
/// Disposes the previously assigned <see cref="Log.Logger" /> via <see cref="Log.CloseAndFlush" />
/// before installing the new one, so repeated CLI invocations (e.g. in the test suite) do not
/// leak the prior logger's console sink.
/// </remarks>
protected void ConfigureLogging()
{
// Dispose any previously installed logger before swapping in a new one.
Log.CloseAndFlush();
var config = new LoggerConfiguration();
if (Verbose)
config.MinimumLevel.Debug()
@@ -1,6 +1,8 @@
using System.Threading.Channels;
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
@@ -43,14 +45,25 @@ public class AlarmsCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (Interval <= 0)
throw new CommandException($"--interval must be greater than 0 (was {Interval}).");
NodeId? sourceNodeId;
try
{
sourceNodeId = NodeIdParser.Parse(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var sourceNodeId = NodeIdParser.Parse(NodeId);
// Channel serialises SDK notification-thread writes to the main async loop so
// that concurrent alarm callbacks never interleave on the shared TextWriter.
var outputChannel = Channel.CreateUnbounded<string>(
@@ -1,4 +1,5 @@
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -42,13 +43,25 @@ public class BrowseCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (Depth <= 0)
throw new CommandException($"--depth must be greater than 0 (was {Depth}).");
NodeId? startNode;
try
{
startNode = NodeIdParser.Parse(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var startNode = NodeIdParser.Parse(NodeId);
var maxDepth = Recursive ? Depth : 1;
await BrowseNodeAsync(service, console, startNode, maxDepth, 0, ct);
@@ -1,5 +1,6 @@
using System.Globalization;
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -62,22 +63,65 @@ public class HistoryReadCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (MaxValues <= 0)
throw new CommandException($"--max must be greater than 0 (was {MaxValues}).");
if (!string.IsNullOrEmpty(Aggregate) && IntervalMs <= 0)
throw new CommandException($"--interval must be greater than 0 (was {IntervalMs}).");
NodeId nodeId;
try
{
nodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
DateTime start, end;
try
{
start = string.IsNullOrEmpty(StartTime)
? DateTime.UtcNow.AddHours(-24)
: DateTime.Parse(StartTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}
catch (FormatException ex)
{
throw new CommandException($"Invalid --start value '{StartTime}': {ex.Message}. Expected ISO 8601 UTC format, e.g. 2026-01-15T08:00:00Z.");
}
try
{
end = string.IsNullOrEmpty(EndTime)
? DateTime.UtcNow
: DateTime.Parse(EndTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}
catch (FormatException ex)
{
throw new CommandException($"Invalid --end value '{EndTime}': {ex.Message}. Expected ISO 8601 UTC format, e.g. 2026-01-15T08:00:00Z.");
}
AggregateType aggregateType = default;
if (!string.IsNullOrEmpty(Aggregate))
{
try
{
aggregateType = ParseAggregateType(Aggregate);
}
catch (ArgumentException ex)
{
throw new CommandException($"Invalid --aggregate value: {ex.Message}");
}
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var nodeId = NodeIdParser.ParseRequired(NodeId);
var start = string.IsNullOrEmpty(StartTime)
? DateTime.UtcNow.AddHours(-24)
: DateTime.Parse(StartTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
var end = string.IsNullOrEmpty(EndTime)
? DateTime.UtcNow
: DateTime.Parse(EndTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
IReadOnlyList<DataValue> values;
if (string.IsNullOrEmpty(Aggregate))
@@ -88,7 +132,6 @@ public class HistoryReadCommand : CommandBase
}
else
{
var aggregateType = ParseAggregateType(Aggregate);
await console.Output.WriteLineAsync(
$"History for {NodeId} ({Aggregate}, interval={IntervalMs}ms)");
values = await service.HistoryReadAggregateAsync(
@@ -1,5 +1,7 @@
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
@@ -29,13 +31,23 @@ public class ReadCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
NodeId nodeId;
try
{
nodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var nodeId = NodeIdParser.ParseRequired(NodeId);
var value = await service.ReadValueAsync(nodeId, ct);
await console.Output.WriteLineAsync($"Node: {NodeId}");
@@ -1,6 +1,7 @@
using System.Collections.Concurrent;
using System.Threading.Channels;
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -12,42 +13,92 @@ namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
[Command("subscribe", Description = "Monitor a node for value changes")]
public class SubscribeCommand : CommandBase
{
/// <summary>
/// Creates the subscribe command used to monitor a node (or a subtree of nodes) for data-change
/// notifications.
/// </summary>
/// <param name="factory">The factory that creates the shared client service for the command run.</param>
public SubscribeCommand(IOpcUaClientServiceFactory factory) : base(factory)
{
}
/// <summary>
/// Gets the node ID to monitor. When <see cref="Recursive" /> is set, this node is the browse root
/// and every <c>Variable</c> child it reaches is subscribed.
/// </summary>
[CommandOption("node", 'n', Description = "Node ID to monitor", IsRequired = true)]
public string NodeId { get; init; } = default!;
/// <summary>
/// Gets the sampling interval, in milliseconds, requested for every monitored item.
/// </summary>
[CommandOption("interval", 'i', Description = "Sampling interval in milliseconds")]
public int Interval { get; init; } = 1000;
/// <summary>
/// Gets a value indicating whether the command should browse from <see cref="NodeId" />
/// and subscribe to every <c>Variable</c> in the subtree.
/// </summary>
[CommandOption("recursive", 'r', Description = "Browse recursively from --node and subscribe to every Variable found")]
public bool Recursive { get; init; }
/// <summary>
/// Gets the maximum recursion depth applied while collecting variables when <see cref="Recursive" /> is set.
/// </summary>
[CommandOption("max-depth", Description = "Maximum recursion depth when --recursive is set")]
public int MaxDepth { get; init; } = 10;
/// <summary>
/// Gets a value indicating whether per-update lines should be suppressed in favour of the final summary only.
/// </summary>
[CommandOption("quiet", 'q', Description = "Suppress per-update output; only print a final summary on Ctrl+C")]
public bool Quiet { get; init; }
/// <summary>
/// Gets the duration, in seconds, before the command auto-exits and prints its summary.
/// A value of <c>0</c> means the command runs until Ctrl+C.
/// </summary>
[CommandOption("duration", Description = "Auto-exit after N seconds and print summary (0 = run until Ctrl+C)")]
public int DurationSeconds { get; init; } = 0;
/// <summary>
/// Gets the optional path that the command should write the final summary to on exit, in addition to stdout.
/// </summary>
[CommandOption("summary-file", Description = "Write summary to this file path on exit (in addition to stdout)")]
public string? SummaryFile { get; init; }
/// <summary>
/// Connects to the server, subscribes to <see cref="NodeId" /> (or its subtree when recursive),
/// streams data-change notifications to the console, and prints a summary when the command exits.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (Interval <= 0)
throw new CommandException($"--interval must be greater than 0 (was {Interval}).");
if (Recursive && MaxDepth <= 0)
throw new CommandException($"--max-depth must be greater than 0 (was {MaxDepth}).");
if (DurationSeconds < 0)
throw new CommandException($"--duration must be 0 or a positive number (was {DurationSeconds}).");
NodeId rootNodeId;
try
{
rootNodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var rootNodeId = NodeIdParser.ParseRequired(NodeId);
var targets = new List<(NodeId nodeId, string displayPath)>();
if (Recursive)
{
@@ -1,4 +1,5 @@
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -37,14 +38,23 @@ public class WriteCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
NodeId nodeId;
try
{
nodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var nodeId = NodeIdParser.ParseRequired(NodeId);
// Read current value to determine type for conversion
var currentValue = await service.ReadValueAsync(nodeId, ct);
var typedValue = ValueConverter.ConvertValue(Value, currentValue.Value);
@@ -14,7 +14,13 @@ internal sealed class DefaultApplicationConfigurationFactory : IApplicationConfi
public async Task<ApplicationConfiguration> CreateAsync(ConnectionSettings settings, CancellationToken ct)
{
var storePath = settings.CertificateStorePath;
// Resolve the canonical PKI path lazily on first use so constructing a
// ConnectionSettings instance — including the throwaway copies the client
// service builds per failover attempt — does not touch the filesystem.
// Callers that supply an explicit path override the default.
var storePath = string.IsNullOrWhiteSpace(settings.CertificateStorePath)
? ClientStoragePaths.GetPkiPath()
: settings.CertificateStorePath;
var config = new ApplicationConfiguration
{
@@ -24,9 +24,47 @@ internal sealed class DefaultEndpointDiscovery : IEndpointDiscovery
using var client = DiscoveryClient.Create(new Uri(endpointUrl));
var allEndpoints = client.GetEndpoints(null);
return EndpointSelector.SelectBest(allEndpoints, endpointUrl, requestedMode);
}
}
/// <summary>
/// Pure best-endpoint selection logic, extracted from <see cref="DefaultEndpointDiscovery"/>
/// so it can be unit tested without standing up a real <see cref="DiscoveryClient"/>.
/// </summary>
internal static class EndpointSelector
{
private static readonly ILogger Logger = Log.ForContext(typeof(EndpointSelector));
/// <summary>
/// Picks the best endpoint from the discovery response that matches the requested
/// security mode, preferring <c>Basic256Sha256</c>, and rewrites the endpoint URL
/// host to match the user-supplied URL when the discovery response advertises a
/// different hostname.
/// </summary>
/// <param name="allEndpoints">Endpoints returned by the discovery query, in any order.</param>
/// <param name="endpointUrl">The endpoint URL the operator supplied; supplies the hostname rewrite target.</param>
/// <param name="requestedMode">The requested OPC UA message security mode.</param>
/// <exception cref="InvalidOperationException">
/// Thrown when no endpoint matches <paramref name="requestedMode"/>; the message lists the
/// security mode + policy combinations the server returned so operators can diagnose mismatches.
/// </exception>
public static EndpointDescription SelectBest(
IEnumerable<EndpointDescription> allEndpoints,
string endpointUrl,
MessageSecurityMode requestedMode)
{
ArgumentNullException.ThrowIfNull(allEndpoints);
if (string.IsNullOrWhiteSpace(endpointUrl))
throw new ArgumentException("Endpoint URL must not be null or empty.", nameof(endpointUrl));
// Materialise once so we can both iterate and produce a diagnostic message
// without re-running the underlying discovery enumeration.
var endpoints = allEndpoints.ToList();
EndpointDescription? best = null;
foreach (var ep in allEndpoints)
foreach (var ep in endpoints)
{
if (ep.SecurityMode != requestedMode)
continue;
@@ -37,18 +75,21 @@ internal sealed class DefaultEndpointDiscovery : IEndpointDiscovery
continue;
}
// Prefer Basic256Sha256 when multiple endpoints match the requested mode.
if (ep.SecurityPolicyUri == SecurityPolicies.Basic256Sha256)
best = ep;
}
if (best == null)
{
var available = string.Join(", ", allEndpoints.Select(e => $"{e.SecurityMode}/{e.SecurityPolicyUri}"));
var available = string.Join(", ", endpoints.Select(e => $"{e.SecurityMode}/{e.SecurityPolicyUri}"));
throw new InvalidOperationException(
$"No endpoint found with security mode '{requestedMode}'. Available endpoints: {available}");
}
// Rewrite endpoint URL hostname to match user-supplied hostname
// Rewrite endpoint URL hostname to match user-supplied hostname. Necessary
// when the OPC UA server returns a discovery URL using a different hostname
// (e.g. internal DNS name) than the one the operator routed to.
var serverUri = new Uri(best.EndpointUrl);
var requestedUri = new Uri(endpointUrl);
if (serverUri.Host != requestedUri.Host)
@@ -73,6 +73,17 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
var writeCollection = new WriteValueCollection { writeValue };
var response = await _session.WriteAsync(null, writeCollection, ct);
// A malformed or faulted response can come back with an empty Results collection
// (and, when faulted, a bad ServiceResult in the response header). Surface that
// service result, or BadUnexpectedError when the header is missing, rather than
// letting Results[0] throw ArgumentOutOfRangeException upstream.
if (response.Results == null || response.Results.Count == 0)
{
var serviceResult = response.ResponseHeader?.ServiceResult.Code ?? StatusCodes.BadUnexpectedError;
throw new ServiceResultException(serviceResult,
$"Write response contained no results for node {nodeId}.");
}
return response.Results[0];
}
@@ -143,15 +154,18 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
if (continuationPoint != null)
nodesToRead[0].ContinuationPoint = continuationPoint;
_session.HistoryRead(
// Use the async overload so this method is genuinely asynchronous,
// honors the cancellation token, and does not block the caller's thread
// (which would block the UI dispatcher for client.ui consumers).
var response = await _session.HistoryReadAsync(
null,
new ExtensionObject(details),
TimestampsToReturn.Source,
continuationPoint != null,
nodesToRead,
out var results,
out _);
ct).ConfigureAwait(false);
var results = response.Results;
if (results == null || results.Count == 0)
break;
@@ -186,15 +200,17 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
new HistoryReadValueId { NodeId = nodeId }
};
_session.HistoryRead(
// Use the async overload so the method honors the cancellation token and
// does not block on a synchronous service round-trip.
var response = await _session.HistoryReadAsync(
null,
new ExtensionObject(details),
TimestampsToReturn.Source,
false,
nodesToRead,
out var results,
out _);
ct).ConfigureAwait(false);
var results = response.Results;
var allValues = new List<DataValue>();
if (results != null && results.Count > 0)
@@ -229,7 +245,9 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
{
try
{
if (_session.Connected) _session.Close();
// Use the async overload so the caller does not block on the close
// service round-trip and the cancellation token is honored.
if (_session.Connected) await _session.CloseAsync(ct).ConfigureAwait(false);
}
catch (Exception ex)
{
@@ -270,6 +288,15 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
},
ct);
// An empty Results collection paired with a service fault must surface as
// a ServiceResultException, not an ArgumentOutOfRangeException from Results[0].
if (result.Results == null || result.Results.Count == 0)
{
var serviceResult = result.ResponseHeader?.ServiceResult.Code ?? StatusCodes.BadUnexpectedError;
throw new ServiceResultException(serviceResult,
$"Call response contained no results for method {methodId} on {objectId}.");
}
var callResult = result.Results[0];
if (StatusCode.IsBad(callResult.StatusCode))
throw new ServiceResultException(callResult.StatusCode);
@@ -96,6 +96,11 @@ public interface IOpcUaClientService : IDisposable
/// <param name="eventId">The event identifier returned by the OPC UA server for the alarm event.</param>
/// <param name="comment">The operator acknowledgment comment to write with the method call.</param>
/// <param name="ct">The cancellation token that aborts the acknowledgment request.</param>
/// <returns>
/// <see cref="StatusCodes.Good"/> on success, or the server's bad <see cref="StatusCode"/>
/// (from the underlying <see cref="ServiceResultException"/>) when the acknowledge call
/// returns a bad result. Other transport-level failures still surface as exceptions.
/// </returns>
Task<StatusCode> AcknowledgeAlarmAsync(string conditionNodeId, byte[] eventId, string comment, CancellationToken ct = default);
/// <summary>
@@ -41,11 +41,13 @@ public sealed class ConnectionSettings
public bool AutoAcceptCertificates { get; set; } = true;
/// <summary>
/// Path to the certificate store. Defaults to a subdirectory under LocalApplicationData
/// resolved via <see cref="ClientStoragePaths"/> so the one-shot legacy-folder migration
/// runs before the path is returned.
/// Path to the certificate store. Defaults to <see cref="string.Empty"/>; the
/// consuming application configuration factory resolves the canonical path via
/// <see cref="ClientStoragePaths.GetPkiPath"/> lazily on first connect, so
/// constructing settings — including the throwaway copies built per failover
/// attempt — does not touch disk or run the legacy-folder migration probe.
/// </summary>
public string CertificateStorePath { get; set; } = ClientStoragePaths.GetPkiPath();
public string CertificateStorePath { get; set; } = string.Empty;
/// <summary>
/// Validates the settings and throws if any required values are missing or invalid.
@@ -353,11 +353,24 @@ public sealed class OpcUaClientService : IOpcUaClientService
: NodeId.Parse(conditionNodeId + ".Condition");
var acknowledgeMethodId = MethodIds.AcknowledgeableConditionType_Acknowledge;
await _session!.CallMethodAsync(
conditionObjId,
acknowledgeMethodId,
[eventId, new LocalizedText(comment)],
ct);
// CallMethodAsync throws ServiceResultException on a bad call result;
// surface that as the returned StatusCode so callers using the documented
// `Task<StatusCode>` contract (e.g. `if (StatusCode.IsBad(result))`) see
// the failure instead of an uncaught exception they did not anticipate.
try
{
await _session!.CallMethodAsync(
conditionObjId,
acknowledgeMethodId,
[eventId, new LocalizedText(comment)],
ct);
}
catch (ServiceResultException ex)
{
Logger.Warning(ex, "Failed to acknowledge alarm on {ConditionId} (status {Status})",
conditionNodeId, ex.StatusCode);
return ex.StatusCode;
}
Logger.Debug("Acknowledged alarm on {ConditionId}", conditionNodeId);
return StatusCodes.Good;
@@ -30,12 +30,6 @@ public partial class DateTimeRangePicker : UserControl
public static readonly StyledProperty<string> EndTextProperty =
AvaloniaProperty.Register<DateTimeRangePicker, string>(nameof(EndText), defaultValue: "");
public static readonly StyledProperty<DateTimeOffset?> MinDateTimeProperty =
AvaloniaProperty.Register<DateTimeRangePicker, DateTimeOffset?>(nameof(MinDateTime));
public static readonly StyledProperty<DateTimeOffset?> MaxDateTimeProperty =
AvaloniaProperty.Register<DateTimeRangePicker, DateTimeOffset?>(nameof(MaxDateTime));
private bool _isUpdating;
public DateTimeRangePicker()
@@ -67,18 +61,6 @@ public partial class DateTimeRangePicker : UserControl
set => SetValue(EndTextProperty, value);
}
public DateTimeOffset? MinDateTime
{
get => GetValue(MinDateTimeProperty);
set => SetValue(MinDateTimeProperty, value);
}
public DateTimeOffset? MaxDateTime
{
get => GetValue(MaxDateTimeProperty);
set => SetValue(MaxDateTimeProperty, value);
}
protected override void OnLoaded(RoutedEventArgs e)
{
base.OnLoaded(e);
@@ -1,4 +1,6 @@
using Avalonia;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
namespace ZB.MOM.WW.OtOpcUa.Client.UI;
@@ -7,8 +9,16 @@ public class Program
[STAThread]
public static void Main(string[] args)
{
BuildAvaloniaApp()
.StartWithClassicDesktopLifetime(args);
ConfigureLogging();
try
{
BuildAvaloniaApp()
.StartWithClassicDesktopLifetime(args);
}
finally
{
Log.CloseAndFlush();
}
}
public static AppBuilder BuildAvaloniaApp()
@@ -18,4 +28,35 @@ public class Program
.WithInterFont()
.LogToTrace();
}
/// <summary>
/// Initializes the Serilog root logger with a console sink + a rolling daily file sink
/// under <c>{LocalAppData}/OtOpcUaClient/logs/</c>. CLAUDE.md mandates Serilog with a
/// rolling daily file sink as the project standard. It is also the only way the catch blocks
/// that swallow exceptions in the alarms / subscriptions / redundancy view-models leave a
/// diagnosable trace when an operator hits a problem in the field.
/// </summary>
private static void ConfigureLogging()
{
var logsDir = Path.Combine(ClientStoragePaths.GetRoot(), "logs");
try
{
Directory.CreateDirectory(logsDir);
}
catch
{
// Best-effort; file sink will gracefully fall back if the dir can't be created.
}
Log.Logger = new LoggerConfiguration()
.MinimumLevel.Information()
.Enrich.FromLogContext()
.WriteTo.Console()
.WriteTo.File(
path: Path.Combine(logsDir, "client-ui-.log"),
rollingInterval: RollingInterval.Day,
retainedFileCountLimit: 14,
shared: true)
.CreateLogger();
}
}
@@ -2,6 +2,7 @@ using System.Collections.ObjectModel;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;
using Opc.Ua;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
using ZB.MOM.WW.OtOpcUa.Client.UI.Services;
@@ -13,9 +14,18 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// </summary>
public partial class AlarmsViewModel : ObservableObject
{
private static readonly ILogger Logger = Log.ForContext<AlarmsViewModel>();
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientService _service;
/// <summary>
/// Last user-visible status message — set when an alarm subscribe / unsubscribe / refresh
/// operation fails so the shell can surface the diagnostic instead of silently dropping it.
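/// Genuine subscribe / unsubscribe failures produce a "... failed" message, distinct from the
/// soft "Condition refresh not supported" notice set when a condition refresh request fails.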
/// </summary>
[ObservableProperty] private string? _statusMessage;
[ObservableProperty] private int _interval = 1000;
[ObservableProperty]
@@ -95,19 +105,25 @@ public partial class AlarmsViewModel : ObservableObject
await _service.SubscribeAlarmsAsync(sourceNodeId, Interval);
IsSubscribed = true;
StatusMessage = null;
try
{
await _service.RequestConditionRefreshAsync();
}
catch
catch (Exception refreshEx)
{
// Refresh not supported
// Condition refresh is optional on the server side — log at info level and surface
// a soft notice rather than a hard failure so the operator can tell apart "server
// does not advertise refresh" from a genuine subscribe failure.
Logger.Information(refreshEx, "RequestConditionRefresh not supported by server");
StatusMessage = "Condition refresh not supported by server (subscribed).";
}
}
catch
catch (Exception ex)
{
// Subscribe failed
Logger.Warning(ex, "SubscribeAlarms failed for {Source}", MonitoredNodeIdText ?? "(all)");
StatusMessage = $"Subscribe to alarms failed: {ex.Message}";
}
}
@@ -123,10 +139,12 @@ public partial class AlarmsViewModel : ObservableObject
{
await _service.UnsubscribeAlarmsAsync();
IsSubscribed = false;
StatusMessage = null;
}
catch
catch (Exception ex)
{
// Unsubscribe failed
Logger.Warning(ex, "UnsubscribeAlarms failed");
StatusMessage = $"Unsubscribe alarms failed: {ex.Message}";
}
}
@@ -136,10 +154,14 @@ public partial class AlarmsViewModel : ObservableObject
try
{
await _service.RequestConditionRefreshAsync();
StatusMessage = null;
}
catch
catch (Exception ex)
{
// Refresh failed
// Same as the subscribe-time fallback: refresh is server-side optional. Information-
// level log + soft status so the operator sees why an explicit refresh did nothing.
Logger.Information(ex, "RequestConditionRefresh not supported by server");
StatusMessage = "Condition refresh not supported by server.";
}
}
@@ -189,19 +211,22 @@ public partial class AlarmsViewModel : ObservableObject
await _service.SubscribeAlarmsAsync(nodeId, Interval);
IsSubscribed = true;
StatusMessage = null;
try
{
await _service.RequestConditionRefreshAsync();
}
catch
catch (Exception refreshEx)
{
// Refresh not supported
Logger.Information(refreshEx, "RequestConditionRefresh not supported by server (restore path)");
StatusMessage = "Condition refresh not supported by server (restored subscription).";
}
}
catch
catch (Exception ex)
{
// Subscribe failed
Logger.Warning(ex, "RestoreAlarmSubscription failed for {Source}", sourceNodeId);
StatusMessage = $"Restore alarm subscription failed: {ex.Message}";
}
}
@@ -1,6 +1,7 @@
using System.Collections.ObjectModel;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
using ZB.MOM.WW.OtOpcUa.Client.UI.Services;
@@ -12,6 +13,8 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// </summary>
public partial class MainWindowViewModel : ObservableObject, IDisposable
{
private static readonly ILogger Logger = Log.ForContext<MainWindowViewModel>();
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientServiceFactory _factory;
private readonly ISettingsService _settingsService;
@@ -137,6 +140,15 @@ public partial class MainWindowViewModel : ObservableObject, IDisposable
{
if (args.PropertyName == nameof(AlarmsViewModel.ActiveAlarmCount))
_dispatcher.Post(() => ActiveAlarmCount = Alarms.ActiveAlarmCount);
else if (args.PropertyName == nameof(AlarmsViewModel.StatusMessage)
&& !string.IsNullOrEmpty(Alarms.StatusMessage))
_dispatcher.Post(() => StatusMessage = Alarms.StatusMessage!);
};
Subscriptions.PropertyChanged += (_, args) =>
{
if (args.PropertyName == nameof(SubscriptionsViewModel.StatusMessage)
&& !string.IsNullOrEmpty(Subscriptions.StatusMessage))
_dispatcher.Post(() => StatusMessage = Subscriptions.StatusMessage!);
};
History = new HistoryViewModel(_service, _dispatcher);
@@ -244,15 +256,17 @@ public partial class MainWindowViewModel : ObservableObject, IDisposable
SessionLabel = $"{info.ServerName} | Session: {info.SessionName} ({info.SessionId})";
});
// Load redundancy info
// Load redundancy info — the server may not implement the redundancy facet, in which
// case we leave RedundancyInfo null but log so a field diagnosis can tell the difference
// between "facet not advertised" and "facet errored". The connection itself stays up.
try
{
var redundancy = await _service!.GetRedundancyInfoAsync();
_dispatcher.Post(() => RedundancyInfo = redundancy);
}
catch
catch (Exception redundancyEx)
{
// Redundancy info not available
Logger.Information(redundancyEx, "GetRedundancyInfo unavailable on this server");
}
// Load root nodes
@@ -2,6 +2,7 @@ using System.Collections.ObjectModel;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;
using Opc.Ua;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
using ZB.MOM.WW.OtOpcUa.Client.UI.Services;
@@ -13,9 +14,17 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// </summary>
public partial class SubscriptionsViewModel : ObservableObject
{
private static readonly ILogger Logger = Log.ForContext<SubscriptionsViewModel>();
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientService _service;
/// <summary>
/// Last user-visible status message — set when a subscribe/unsubscribe operation fails so the
/// shell can surface the diagnostic instead of silently dropping the error. Cleared on success.
/// </summary>
[ObservableProperty] private string? _statusMessage;
[ObservableProperty]
[NotifyCanExecuteChangedFor(nameof(AddSubscriptionCommand))]
[NotifyCanExecuteChangedFor(nameof(RemoveSubscriptionCommand))]
@@ -85,11 +94,13 @@ public partial class SubscriptionsViewModel : ObservableObject
{
ActiveSubscriptions.Add(new SubscriptionItemViewModel(nodeIdStr, interval));
SubscriptionCount = ActiveSubscriptions.Count;
StatusMessage = null;
});
}
catch
catch (Exception ex)
{
// Subscription failed; no item added
Logger.Warning(ex, "AddSubscription failed for {NodeId}", nodeIdStr);
_dispatcher.Post(() => StatusMessage = $"Subscribe failed for {nodeIdStr}: {ex.Message}");
}
}
@@ -116,9 +127,11 @@ public partial class SubscriptionsViewModel : ObservableObject
_dispatcher.Post(() => ActiveSubscriptions.Remove(item));
}
catch
catch (Exception ex)
{
// Unsubscribe failed for this item; continue with others
Logger.Warning(ex, "Unsubscribe failed for {NodeId}", item.NodeId);
_dispatcher.Post(() => StatusMessage = $"Unsubscribe failed for {item.NodeId}: {ex.Message}");
// Continue with the other items in the batch.
}
}
@@ -146,11 +159,13 @@ public partial class SubscriptionsViewModel : ObservableObject
{
ActiveSubscriptions.Add(new SubscriptionItemViewModel(nodeIdStr, intervalMs));
SubscriptionCount = ActiveSubscriptions.Count;
StatusMessage = null;
});
}
catch
catch (Exception ex)
{
// Subscription failed
Logger.Warning(ex, "AddSubscriptionForNode failed for {NodeId}", nodeIdStr);
_dispatcher.Post(() => StatusMessage = $"Subscribe failed for {nodeIdStr}: {ex.Message}");
}
}
@@ -186,9 +201,10 @@ public partial class SubscriptionsViewModel : ObservableObject
foreach (var child in children)
await AddSubscriptionRecursiveAsync(child.NodeId, child.NodeClass, intervalMs, maxDepth, currentDepth + 1);
}
catch
catch (Exception ex)
{
// Browse failed for this node; skip it
Logger.Warning(ex, "Recursive browse failed for {NodeId}; skipping subtree", nodeIdStr);
_dispatcher.Post(() => StatusMessage = $"Browse failed for {nodeIdStr}: {ex.Message}");
}
}
@@ -78,7 +78,7 @@
<TextBox Text="{Binding CertificateStorePath}"
Width="370"
IsReadOnly="True"
Watermark="(default: AppData/LmxOpcUaClient/pki)" />
Watermark="(default: AppData/OtOpcUaClient/pki)" />
<Button Name="BrowseCertPathButton"
Content="..."
Width="30"
@@ -2,6 +2,7 @@ using System.ComponentModel;
using System.Reflection;
using Avalonia.Controls;
using Avalonia.Interactivity;
using Avalonia.Platform.Storage;
using SkiaSharp;
using Svg.Skia;
using ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
@@ -126,15 +127,34 @@ public partial class MainWindow : Window
{
if (DataContext is not MainWindowViewModel vm) return;
var dialog = new OpenFolderDialog
var topLevel = TopLevel.GetTopLevel(this);
if (topLevel == null) return;
IStorageFolder? startLocation = null;
if (!string.IsNullOrEmpty(vm.CertificateStorePath))
{
try
{
startLocation = await topLevel.StorageProvider.TryGetFolderFromPathAsync(vm.CertificateStorePath);
}
catch
{
// Best-effort: if the existing path can't be resolved (missing/permission), open the dialog without it.
}
}
var folders = await topLevel.StorageProvider.OpenFolderPickerAsync(new FolderPickerOpenOptions
{
Title = "Select Certificate Store Folder",
Directory = vm.CertificateStorePath
};
AllowMultiple = false,
SuggestedStartLocation = startLocation
});
var result = await dialog.ShowAsync(this);
if (!string.IsNullOrEmpty(result))
vm.CertificateStorePath = result;
if (folders.Count == 0) return;
var picked = folders[0].TryGetLocalPath();
if (!string.IsNullOrEmpty(picked))
vm.CertificateStorePath = picked;
}
protected override void OnClosing(WindowClosingEventArgs e)
@@ -19,6 +19,7 @@
<PackageReference Include="CommunityToolkit.Mvvm" Version="8.4.0"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="Serilog.Sinks.Console" Version="6.0.0"/>
<PackageReference Include="Serilog.Sinks.File" Version="7.0.0"/>
</ItemGroup>
<ItemGroup>
@@ -48,6 +48,71 @@ public sealed class ScriptedAlarmEngine : IDisposable
// snapshot enumeration safe. The only write shapes are indexer-set and Clear,
// both of which ConcurrentDictionary supports atomically. (Core.ScriptedAlarms-001)
private readonly ConcurrentDictionary<string, AlarmState> _alarms = new(StringComparer.Ordinal);
/// <summary>
/// Per-alarm reusable evaluation scratch. The read-cache dictionary and the
/// <see cref="AlarmPredicateContext"/> instance are both allocated once per
/// alarm (on first evaluation) and reused across every subsequent re-eval —
/// the hot path no longer allocates a fresh dictionary + context per upstream
/// tag change. Safe because <see cref="EvaluatePredicateToStateAsync"/> only
/// runs under <see cref="_evalGate"/>, which serialises every evaluation:
/// two threads can never observe the same scratch in a half-refilled state.
/// Cleared in <see cref="LoadAsync"/> alongside <see cref="_alarms"/>.
/// (Core.ScriptedAlarms-009)
/// </summary>
private readonly ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId =
new(StringComparer.Ordinal);
/// <summary>
/// Compile cache for every alarm predicate. Routes <see cref="LoadAsync"/>'s
/// <see cref="ScriptEvaluator{TContext, TResult}.Compile"/> calls through the
/// cache so the collectible <see cref="System.Runtime.Loader.AssemblyLoadContext"/>
/// each compile produces is actually disposed on the publish-replace path
/// (Core.Scripting-016): the cache's <see cref="CompiledScriptCache{TContext, TResult}.Clear"/>
/// disposes every materialised evaluator before dropping its dictionary entry,
/// so a config-publish releases the prior generation's ALCs and the per-publish
/// accretion the Core.Scripting-008 fix targeted is actually freed in production.
/// Pre-fix the engine called <c>ScriptEvaluator.Compile</c> directly, which left
/// the ALCs rooted until the process exited — defeating -008 on the real path.
/// </summary>
private readonly CompiledScriptCache<AlarmPredicateContext, bool> _compileCache = new();
/// <summary>
/// Test-only diagnostic: returns the per-alarm scratch read-cache dictionary
/// if one has been allocated, else null. Used by Core.ScriptedAlarms-009
/// regression tests to assert the scratch is reused across evaluations
/// (two reads return the same instance).
/// </summary>
/// <remarks>
/// <b>Synchronization:</b> the returned <see cref="IReadOnlyDictionary{TKey, TValue}"/>
/// is the engine's live mutable read-cache. It is refilled in place by
/// <c>RefillReadCache</c> on every predicate evaluation, under <c>_evalGate</c>.
/// Test callers MUST NOT iterate this dictionary while the engine is
/// actively evaluating (i.e. while an upstream change is mid-flight); the
/// refill clears the dict before repopulating and a concurrent iterator
/// would observe torn / partial state. Safe uses are: reference-identity
/// comparisons (e.g. asserting the same instance is reused across calls),
/// and single-key reads against an engine that has quiesced after a
/// deterministic upstream push. Anything more involved should snapshot a
/// copy under the gate. (Core.ScriptedAlarms-013.)
/// </remarks>
internal IReadOnlyDictionary<string, DataValueSnapshot>? TryGetScratchReadCacheForTest(string alarmId)
=> _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.ReadCache : null;
/// <summary>
/// Test-only diagnostic: returns the per-alarm <see cref="AlarmPredicateContext"/>
/// if one has been allocated, else null. Companion to
/// <see cref="TryGetScratchReadCacheForTest"/>.
/// </summary>
/// <remarks>
/// <b>Synchronization:</b> the returned context wraps the same live
/// read-cache as <see cref="TryGetScratchReadCacheForTest"/> — the same
/// "don't iterate during an in-flight evaluation" caveat applies. Safe
/// for reference-identity assertions on a quiesced engine.
/// (Core.ScriptedAlarms-013.)
/// </remarks>
internal AlarmPredicateContext? TryGetScratchContextForTest(string alarmId)
=> _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.Context : null;
private readonly ConcurrentDictionary<string, DataValueSnapshot> _valueCache
= new(StringComparer.Ordinal);
private readonly Dictionary<string, HashSet<string>> _alarmsReferencing
@@ -108,6 +173,14 @@ public sealed class ScriptedAlarmEngine : IDisposable
UnsubscribeFromUpstream();
_alarms.Clear();
_alarmsReferencing.Clear();
// Drop the prior generation's per-alarm scratch buffers — definitions may
// have changed (different Inputs, different Logger), so any reuse would be
// unsafe. (Core.ScriptedAlarms-009)
_scratchByAlarmId.Clear();
// Dispose every compiled-predicate ALC from the prior generation BEFORE we
// recompile this one. Skipping this is what made Core.Scripting-008 a
// no-op in production. (Core.Scripting-016)
_compileCache.Clear();
var compileFailures = new List<string>();
foreach (var def in definitions)
@@ -122,7 +195,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
continue;
}
var evaluator = ScriptEvaluator<AlarmPredicateContext, bool>.Compile(def.PredicateScriptSource);
// Route through CompiledScriptCache so the emitted assembly's
// collectible ALC participates in publish-replace cleanup.
// (Core.Scripting-016)
var evaluator = _compileCache.GetOrCompile(def.PredicateScriptSource);
var timed = new TimedScriptEvaluator<AlarmPredicateContext, bool>(evaluator, _scriptTimeout);
var logger = _loggerFactory.Create(def.AlarmId);
@@ -354,7 +430,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
AlarmState state, AlarmConditionState seed, DateTime nowUtc, CancellationToken ct,
List<ScriptedAlarmEvent>? pendingEmissions = null)
{
var inputs = BuildReadCache(state.Inputs);
// Look up (or lazily allocate) the per-alarm scratch and refill its read cache
// in place. The dictionary + context survive across evaluations so the hot path
// no longer allocates per upstream tag change. (Core.ScriptedAlarms-009)
var scratch = _scratchByAlarmId.GetOrAdd(
state.Definition.AlarmId,
_ => new AlarmScratch(state.Inputs, state.Logger, _clock));
RefillReadCache(scratch.ReadCache, state.Inputs);
// Cold-start guard — skip the predicate when any referenced upstream tag has no
// cached value yet (the upstream subscription hasn't delivered its first push).
@@ -362,9 +444,9 @@ public sealed class ScriptedAlarmEngine : IDisposable
// every tick until the cache fills, spamming the log with identical stack traces.
// Bad quality is treated the same: the input isn't available at the predicate's
// expected type, so the only defensible move is to hold the prior condition state.
if (!AreInputsReady(inputs)) return seed;
if (!AreInputsReady(scratch.ReadCache)) return seed;
var context = new AlarmPredicateContext(inputs, state.Logger, _clock);
var context = scratch.Context;
bool predicateTrue;
try
@@ -399,12 +481,20 @@ public sealed class ScriptedAlarmEngine : IDisposable
return result.State;
}
private IReadOnlyDictionary<string, DataValueSnapshot> BuildReadCache(IReadOnlySet<string> inputs)
/// <summary>
/// Refill <paramref name="cache"/> in place from <c>_valueCache</c>, falling
/// back to a synchronous <c>ITagUpstreamSource.ReadTag</c> for paths whose
/// first upstream push hasn't arrived yet. The dictionary is cleared and
/// repopulated under <c>_evalGate</c> so no concurrent reader can observe
/// a partial state. Replaces the old <c>BuildReadCache</c> which allocated a
/// fresh dictionary every call (Core.ScriptedAlarms-009).
/// </summary>
private void RefillReadCache(
Dictionary<string, DataValueSnapshot> cache, IReadOnlySet<string> inputs)
{
var d = new Dictionary<string, DataValueSnapshot>(StringComparer.Ordinal);
cache.Clear();
foreach (var p in inputs)
d[p] = _valueCache.TryGetValue(p, out var v) ? v : _upstream.ReadTag(p);
return d;
cache[p] = _valueCache.TryGetValue(p, out var v) ? v : _upstream.ReadTag(p);
}
/// <summary>
@@ -596,12 +686,24 @@ public sealed class ScriptedAlarmEngine : IDisposable
}
}
// Do NOT clear _alarms here: Timer.Dispose() does not wait for in-flight callbacks,
// so a ShelvingCheckAsync or ReevaluateAsync can still be running inside _evalGate.
// Those paths now re-check _disposed after acquiring the gate and bail out safely.
// Clearing _alarms outside the gate would race concurrent reads and is unnecessary
// (the whole object is being discarded). (Core.ScriptedAlarms-005)
// Safe to clear here: the Task.WhenAll drain above guaranteed no
// ReevaluateAsync / ShelvingCheckAsync is mid-flight, and _disposed=true
// prevents new background work from being queued (OnUpstreamChange bails on
// line 334). Pre-Core.Scripting-016 the comment said "Do NOT clear _alarms",
// but that was when the engine called ScriptEvaluator.Compile directly and
// held the script ALCs through _alarms→AlarmState→TimedScriptEvaluator
// forever — leaving them rooted defeated the -008 collectible-ALC unload.
// Clearing now drops the delegate references so the cache's Dispose call
// below can actually unload the emitted assemblies. (Core.ScriptedAlarms-005
// re-evaluated under -016.)
_alarms.Clear();
_alarmsReferencing.Clear();
_scratchByAlarmId.Clear();
// Dispose every compiled-predicate ALC so the engine's shutdown actually
// releases the emitted assemblies. The drain above ensures no evaluator is
// mid-call; CompiledScriptCache.Dispose is idempotent and makes any later
// GetOrCompile throw ObjectDisposedException. (Core.Scripting-016)
_compileCache.Dispose();
}
private sealed record AlarmState(
@@ -611,6 +713,37 @@ public sealed class ScriptedAlarmEngine : IDisposable
IReadOnlyList<string> TemplateTokens,
ILogger Logger,
AlarmConditionState Condition);
/// <summary>
/// Per-alarm reusable evaluation scratch. The <see cref="ReadCache"/> dictionary
/// is the same instance across every evaluation of the owning alarm — it is
/// cleared and refilled in <see cref="ScriptedAlarmEngine.RefillReadCache"/> on
/// each call. <see cref="Context"/> wraps that dictionary by reference, so a
/// refilled <see cref="ReadCache"/> is what the predicate's
/// <c>ctx.GetTag(path)</c> calls observe. (Core.ScriptedAlarms-009)
/// </summary>
/// <remarks>
/// Reuse is safe because <see cref="ScriptedAlarmEngine"/> serialises every
/// evaluation under <c>_evalGate</c>: two threads can never observe the same
/// scratch in a half-refilled state.
/// </remarks>
private sealed class AlarmScratch
{
public Dictionary<string, DataValueSnapshot> ReadCache { get; }
public AlarmPredicateContext Context { get; }
public AlarmScratch(IReadOnlySet<string> inputs, ILogger logger, Func<DateTime> clock)
{
// Pre-size to the expected input count so the first refill doesn't pay the
// dictionary-grow cost. Inputs is fixed at LoadAsync under the current
// contract; if that ever changes, the dictionary simply auto-grows, so
// pre-sizing is an optimisation only, never a correctness requirement.
ReadCache = new Dictionary<string, DataValueSnapshot>(inputs.Count, StringComparer.Ordinal);
// Context holds the read cache by reference. Refilling the dictionary
// updates what the context (and the script) observes.
Context = new AlarmPredicateContext(ReadCache, logger, clock);
}
}
}
/// <summary>
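The scratch-reuse idea behind `AlarmScratch` and `RefillReadCache` can be illustrated with a small, self-contained sketch. It is not the engine's code. `SketchContext`, `SketchScratch`, and `ScratchReuseSketch` are hypothetical stand-ins, values are plain `double`s instead of `DataValueSnapshot`, a missing input reads as `double.NaN` instead of falling back to `ReadTag`, and a `lock` plays the role of `_evalGate`. The sketch shows the two properties the comments rely on: the same dictionary instance is reused across evaluations, and the context observes refilled values because it holds that dictionary by reference.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for AlarmPredicateContext: holds the read cache by
// reference, so refilling the dictionary changes what GetTag observes.
public sealed class SketchContext
{
    private readonly IReadOnlyDictionary<string, double> _cache;
    public SketchContext(IReadOnlyDictionary<string, double> cache) => _cache = cache;
    public double GetTag(string path) => _cache[path];
}

// Hypothetical stand-in for AlarmScratch: allocated once per alarm, then reused.
public sealed class SketchScratch
{
    public Dictionary<string, double> ReadCache { get; }
    public SketchContext Context { get; }

    public SketchScratch(int expectedInputs)
    {
        ReadCache = new Dictionary<string, double>(expectedInputs, StringComparer.Ordinal);
        Context = new SketchContext(ReadCache);
    }
}

public sealed class ScratchReuseSketch
{
    private readonly ConcurrentDictionary<string, SketchScratch> _scratchByAlarmId = new(StringComparer.Ordinal);
    private readonly ConcurrentDictionary<string, double> _valueCache = new(StringComparer.Ordinal);
    // A plain lock stands in for the engine's _evalGate: one evaluation at a time.
    private readonly object _evalGate = new();

    public void Push(string path, double value) => _valueCache[path] = value;

    // Refills the alarm's scratch in place, then lets a toy "predicate" sum the
    // inputs through the context. Returns the scratch so callers can compare identity.
    public (SketchScratch Scratch, double Sum) Evaluate(string alarmId, IReadOnlyCollection<string> inputs)
    {
        lock (_evalGate)
        {
            var scratch = _scratchByAlarmId.GetOrAdd(alarmId, _ => new SketchScratch(inputs.Count));
            scratch.ReadCache.Clear();
            foreach (var p in inputs)
                scratch.ReadCache[p] = _valueCache.TryGetValue(p, out var v) ? v : double.NaN;

            var sum = 0.0;
            foreach (var p in inputs)
                sum += scratch.Context.GetTag(p);
            return (scratch, sum);
        }
    }
}
```

Because the refill happens under the gate, no caller can observe the dictionary between its `Clear` and the end of the repopulation loop.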
@@ -30,11 +30,20 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
/// bounded by config DB (typically low thousands). If that changes in v3, add an
/// LRU eviction policy — the API stays the same.
/// </para>
/// <para>
/// <b>Lifecycle:</b> compiled scripts hold a collectible
/// <see cref="System.Runtime.Loader.AssemblyLoadContext"/> per evaluator
/// (Core.Scripting-008 fix). <see cref="Clear"/> disposes every materialised
/// evaluator before dropping its dictionary entry so the emitted assemblies are
/// eligible for GC immediately after a publish. <see cref="Dispose"/> runs the
/// same drain for graceful server shutdown and marks the cache disposed so
/// <see cref="GetOrCompile"/> rejects further compiles.
/// </para>
/// </remarks>
public sealed class CompiledScriptCache<TContext, TResult>
public sealed class CompiledScriptCache<TContext, TResult> : IDisposable
where TContext : ScriptContext
{
private readonly ConcurrentDictionary<string, Lazy<ScriptEvaluator<TContext, TResult>>> _cache = new();
private bool _disposed;
/// <summary>
/// Return the compiled evaluator for <paramref name="scriptSource"/>, compiling
@@ -46,6 +55,7 @@ public sealed class CompiledScriptCache<TContext, TResult>
public ScriptEvaluator<TContext, TResult> GetOrCompile(string scriptSource)
{
if (scriptSource is null) throw new ArgumentNullException(nameof(scriptSource));
if (_disposed) throw new ObjectDisposedException(nameof(CompiledScriptCache<TContext, TResult>));
var key = HashSource(scriptSource);
var lazy = _cache.GetOrAdd(key, _ => new Lazy<ScriptEvaluator<TContext, TResult>>(
@@ -72,13 +82,71 @@ public sealed class CompiledScriptCache<TContext, TResult>
/// <summary>Current entry count. Exposed for Admin UI diagnostics / tests.</summary>
public int Count => _cache.Count;
/// <summary>Drop every cached compile. Used on config generation publish + tests.</summary>
public void Clear() => _cache.Clear();
/// <summary>
/// Drop every cached compile. Used on config generation publish + tests.
/// Disposes each materialised evaluator before removing it so its collectible
/// <see cref="System.Runtime.Loader.AssemblyLoadContext"/> unloads and the
/// emitted script assembly becomes eligible for GC (Core.Scripting-008).
/// </summary>
/// <remarks>
/// Safe to call after <see cref="Dispose"/> — the operation is idempotent.
/// <see cref="Dispose"/> sets <c>_disposed = true</c> before invoking this
/// method (so callers see the post-Dispose guard on <see cref="GetOrCompile"/>),
/// but this method itself MUST run to completion so the Dispose-triggered
/// drain actually unloads every materialised evaluator's ALC. (Core.Scripting-016
/// uncovered this — a previous Clear-aborts-when-disposed guard silently
/// skipped the entire drain on Dispose, leaving emitted assemblies rooted.)
/// </remarks>
public void Clear()
{
// Snapshot (key, value) pairs and remove with the value-scoped
// TryRemove(KeyValuePair<,>) overload — same shape as the
// Core.Scripting-006 fix in GetOrCompile's catch block. A concurrent
// GetOrCompile re-add that hashes to the same key between our snapshot
// and the TryRemove inserts a *different* Lazy reference; the value-
// scoped removal sees the mismatch and leaves the fresh entry intact
// (instead of evicting + disposing it while the concurrent caller
// still holds it). The fresh evaluator and its ALC stay live for the
// concurrent caller. (Core.Scripting-014.)
foreach (var entry in _cache.ToArray())
{
if (_cache.TryRemove(entry))
DisposeLazyIfMaterialised(entry.Value);
}
}
/// <summary>True when the exact source has been compiled at least once + is still cached.</summary>
public bool Contains(string scriptSource)
=> _cache.ContainsKey(HashSource(scriptSource));
/// <summary>
/// Drop the cache and dispose every materialised evaluator. After disposal
/// <see cref="GetOrCompile"/> throws <see cref="ObjectDisposedException"/>.
/// </summary>
public void Dispose()
{
if (_disposed) return;
_disposed = true;
Clear();
}
private static void DisposeLazyIfMaterialised(Lazy<ScriptEvaluator<TContext, TResult>> lazy)
{
// IsValueCreated is false for a faulted Lazy too, so the catch in GetOrCompile
// has already taken care of failed compiles — there's no evaluator to dispose.
if (!lazy.IsValueCreated) return;
try
{
lazy.Value.Dispose();
}
catch
{
// Dispose is best-effort here: an evaluator disposal failure would leak its
// ALC but mustn't prevent the rest of the cache from clearing. The ALC
// unload itself is exception-free in practice; this is defensive.
}
}
private static string HashSource(string source)
{
var bytes = Encoding.UTF8.GetBytes(source);
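A minimal sketch of the `Clear` / `Dispose` interplay in `CompiledScriptCache` follows. It assumes a hypothetical `SketchEvaluator` that only records whether it was disposed, and a hypothetical `BetweenSnapshotAndRemove` hook that stands in for a concurrent `GetOrCompile` landing in the middle of `Clear`. It relies on `ConcurrentDictionary.TryRemove(KeyValuePair<TKey, TValue>)` (available since .NET 5), which removes an entry only when both key and value still match. `Lazy<T>` does not override `Equals`, so the value comparison is by reference.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical evaluator: only records whether Dispose ran.
public sealed class SketchEvaluator : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

public sealed class SketchCache : IDisposable
{
    private readonly ConcurrentDictionary<string, Lazy<SketchEvaluator>> _cache = new();
    private bool _disposed;

    // Hypothetical test hook: runs between Clear's snapshot and its removals,
    // standing in for a concurrent caller that re-adds the same key.
    public Action? BetweenSnapshotAndRemove { get; set; }

    public int Count => _cache.Count;

    public SketchEvaluator GetOrCompile(string key)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(SketchCache));
        return _cache.GetOrAdd(key, _ => new Lazy<SketchEvaluator>(() => new SketchEvaluator())).Value;
    }

    // Hypothetical: removes a key outright; used only to simulate an evict + re-add.
    public bool Evict(string key) => _cache.TryRemove(key, out _);

    public void Clear()
    {
        var snapshot = _cache.ToArray();
        BetweenSnapshotAndRemove?.Invoke();
        foreach (var entry in snapshot)
        {
            // Value-scoped removal: succeeds only if the key still maps to this
            // exact Lazy instance, so a freshly re-added entry is left alone.
            if (_cache.TryRemove(entry) && entry.Value.IsValueCreated)
                entry.Value.Value.Dispose();
        }
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true; // reject new GetOrCompile calls first...
        Clear();          // ...but still drain every materialised evaluator.
    }
}
```

The ordering in `Dispose` matches the remark on `Clear`: the disposed flag is set before the drain, and `Clear` itself never checks it, so the drain still runs to completion.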
@@ -72,6 +72,11 @@ public static class ForbiddenTypeAnalyzer
// a Task fan-out outlives the evaluation timeout entirely
// (Core.Scripting-003).
"System.Runtime.InteropServices",
"System.Runtime.Loader", // AssemblyLoadContext + AssemblyDependencyResolver —
// arbitrary DLL load into the host process
// (Core.Scripting-012). Namespace-prefix rather than
// type-granular so future BCL additions to this
// namespace are denied by default.
"Microsoft.Win32", // registry
];
@@ -113,6 +118,24 @@ public static class ForbiddenTypeAnalyzer
// target it without blocking those legitimate types. Denied type-granularly here.
// (Core.Scripting-010.)
"System.Threading.Thread",
// Core.Scripting-012 — broadening the references list to the BCL trusted-platform-
// assemblies set (Core.Scripting-008 follow-up) re-exposed two background-work
// vectors the original deny-list missed. Both live in System.Threading (shared
// with allowed sync primitives like CancellationToken / SemaphoreSlim), so they
// must be denied type-granularly:
//
// System.Threading.ThreadPool — QueueUserWorkItem / UnsafeQueueUserWorkItem
// re-introduce the background-fanout threat
// Core.Scripting-003 closed against
// System.Threading.Tasks.
// System.Threading.Timer — Timer(callback, …) schedules unbounded work
// that outlives the per-evaluation timeout.
//
// System.Runtime.Loader.AssemblyLoadContext is also covered, but at the namespace-
// prefix level above (System.Runtime.Loader) so future BCL additions to that
// namespace are denied by default.
"System.Threading.ThreadPool",
"System.Threading.Timer",
];
/// <summary>
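The two deny-list granularities described above (namespace prefix for `System.Runtime.Loader`, exact type name for `ThreadPool` and `Timer`) can be sketched as a simple name check. This is an illustration only. `DenyListSketch.IsDenied` is hypothetical, it works on plain fully-qualified names rather than Roslyn symbols, and it lists just the entries visible in this diff, so it is not the analyzer's real matching logic. One design choice worth noting is that prefixes are matched on a `.` boundary, so a prefix cannot accidentally cover an unrelated sibling namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DenyListSketch
{
    // Namespace prefixes: every type in (or under) these namespaces is denied,
    // including types a future BCL release adds there.
    private static readonly string[] DeniedNamespaces =
    {
        "System.Runtime.InteropServices",
        "System.Runtime.Loader",
        "Microsoft.Win32",
    };

    // Exact type names: denied individually because System.Threading also
    // hosts allowed types such as CancellationToken and SemaphoreSlim.
    private static readonly HashSet<string> DeniedTypes = new(StringComparer.Ordinal)
    {
        "System.Threading.Thread",
        "System.Threading.ThreadPool",
        "System.Threading.Timer",
    };

    public static bool IsDenied(string fullTypeName)
    {
        if (DeniedTypes.Contains(fullTypeName)) return true;
        // Require a '.' after the prefix so "System.Runtime.LoaderExtras" is not
        // caught by the "System.Runtime.Loader" entry.
        return DeniedNamespaces.Any(ns =>
            fullTypeName.StartsWith(ns + ".", StringComparison.Ordinal));
    }
}
```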
@@ -1,75 +1,420 @@
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;
using System.Reflection;
using System.Runtime.Loader;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
/// <summary>
/// Compiles + runs user scripts against a <see cref="ScriptContext"/> subclass. Core
/// evaluator — no caching, no timeout, no logging side-effects yet (those land in
/// Stream A.3, A.4, A.5 respectively). Stream B + C wrap this with the dependency
/// scheduler + alarm state machine.
/// evaluator — no caching, no timeout, no logging side-effects (those land in
/// <see cref="CompiledScriptCache{TContext, TResult}"/>,
/// <see cref="TimedScriptEvaluator{TContext, TResult}"/>, and
/// <see cref="ScriptLogCompanionSink"/> respectively).
/// </summary>
/// <remarks>
/// <para>
/// Scripts are compiled against <see cref="ScriptGlobals{TContext}"/> so the
/// context member is named <c>ctx</c> in the script, matching the
/// <see cref="DependencyExtractor"/>'s walker and the Admin UI type stub.
/// Scripts are wrapped in a synthesized <c>CompiledScript.Run(globals)</c> method
/// and compiled via <see cref="CSharpCompilation"/> into a regular .NET assembly
/// that is loaded into a <b>collectible</b>
/// <see cref="AssemblyLoadContext"/>. The collectible ALC is the fix for
/// Core.Scripting-008: per-publish recompile accretion was previously unbounded
/// because Roslyn's <c>CSharpScript.CreateDelegate</c> emits into the default ALC
/// (non-collectible); now <see cref="Dispose"/> unloads the entire ALC and the
/// emitted assembly becomes eligible for GC.
/// </para>
/// <para>
/// Compile pipeline is a three-step gate: (1) Roslyn compile — catches syntax
/// errors + type-resolution failures, throws <see cref="CompilationErrorException"/>;
/// (2) <see cref="ForbiddenTypeAnalyzer"/> runs against the semantic model —
/// catches sandbox escapes that slipped past reference restrictions due to .NET's
/// type forwarding, throws <see cref="ScriptSandboxViolationException"/>; (3)
/// delegate creation — throws at this layer only for internal Roslyn bugs, not
/// user error.
/// Compile pipeline is a three-step gate, unchanged in intent from the legacy
/// <c>CSharpScript</c> path: (1) Roslyn parse + compile against the
/// <see cref="ScriptSandbox"/> allow-list — catches syntax errors, unresolved
/// types (the sandbox's first line of defense), and most type-resolution
/// failures, throwing <see cref="CompilationErrorException"/>; (2)
/// <see cref="ForbiddenTypeAnalyzer"/> runs against the semantic model — catches
/// sandbox escapes that slipped past reference restrictions due to .NET's type
/// forwarding, throwing <see cref="ScriptSandboxViolationException"/>; (3) emit
/// to an in-memory PE stream + load into the collectible ALC — throws at this
/// layer only for internal Roslyn bugs, not user error.
/// </para>
/// <para>
/// Runtime exceptions thrown from user code propagate unwrapped. The virtual-tag
/// engine (Stream B) catches them per-tag + maps to <c>BadInternalError</c>
/// quality per Phase 7 decision #11 this layer doesn't swallow anything so
/// tests can assert on the original exception type.
/// engine catches them per-tag and maps to <c>BadInternalError</c> quality
/// per Phase 7 decision #11; this layer doesn't swallow anything so tests can
/// assert on the original exception type.
/// </para>
/// <para>
/// Scripts are expected to be statement bodies that end with an explicit
/// <c>return …;</c> — the wrapper provides only the surrounding method body, so
/// the script's final-expression-yields-result behavior of legacy
/// <c>CSharpScript</c> is replaced by ordinary C# method semantics. Every script
/// in the existing test corpus already uses explicit <c>return</c>; this is a
/// documented authoring convention.
/// </para>
/// </remarks>
public sealed class ScriptEvaluator<TContext, TResult>
public sealed class ScriptEvaluator<TContext, TResult> : IDisposable
where TContext : ScriptContext
{
private readonly ScriptRunner<TResult> _runner;
private readonly ScriptAssemblyLoadContext _alc;
private readonly Func<ScriptGlobals<TContext>, TResult> _func;
private bool _disposed;
private ScriptEvaluator(ScriptRunner<TResult> runner)
private ScriptEvaluator(ScriptAssemblyLoadContext alc, Func<ScriptGlobals<TContext>, TResult> func)
{
_runner = runner;
_alc = alc;
_func = func;
}
public static ScriptEvaluator<TContext, TResult> Compile(string scriptSource)
{
if (scriptSource is null) throw new ArgumentNullException(nameof(scriptSource));
var options = ScriptSandbox.Build(typeof(TContext));
var script = CSharpScript.Create<TResult>(
code: scriptSource,
options: options,
globalsType: typeof(ScriptGlobals<TContext>));
var sandbox = ScriptSandbox.Build(typeof(TContext));
// Step 1 — Roslyn compile. Throws CompilationErrorException on syntax / type errors.
var diagnostics = script.Compile();
// Step 1 — synthesize a wrapper class around the script body and parse it. The
// wrapper's `Run` method is what we invoke at runtime; the user's source is
// pasted in as its body so explicit `return` semantics apply.
var wrapperSource = BuildWrapperSource(scriptSource, sandbox.Imports);
var syntaxTree = CSharpSyntaxTree.ParseText(wrapperSource);
// Step 2: forbidden-type semantic analysis. Defense-in-depth against reference-list
// leaks due to type forwarding.
var rejections = ForbiddenTypeAnalyzer.Analyze(script.GetCompilation());
// Step 1a: defend against wrapper-source injection (Core.Scripting-013).
// A script body of `return 0; } public static int Evil() { return 0;` would
// close the synthesized `Run` method early, declare a sibling `Evil` method
// inside the synthesized `CompiledScript` class, and leave the wrapper's
// trailing `}` balanced. ForbiddenTypeAnalyzer still walks the injected
// members so this isn't a direct sandbox escape, but it relaxes the
// documented "method body" authoring contract and widens the analyzer's
// surface. Reject by requiring that the parsed `CompiledScript` class
// contains exactly one member declaration (the `Run` method).
EnforceSingleRunMember(syntaxTree);
// Step 2 — Roslyn compile against the sandbox allow-list. Anything not in the
// references set is unresolved and produces a compiler error.
var assemblyName = "ZB.MOM.WW.OtOpcUa.Core.Scripting.Compiled." +
Guid.NewGuid().ToString("N");
var compileOptions = new CSharpCompilationOptions(
OutputKind.DynamicallyLinkedLibrary,
optimizationLevel: OptimizationLevel.Release,
allowUnsafe: false,
// Don't generate XML doc warnings for the synthesized wrapper.
warningLevel: 4,
nullableContextOptions: NullableContextOptions.Enable);
var compilation = CSharpCompilation.Create(
assemblyName,
syntaxTrees: new[] { syntaxTree },
references: sandbox.References,
options: compileOptions);
var compileDiagnostics = compilation.GetDiagnostics();
var compileErrors = compileDiagnostics
.Where(d => d.Severity == DiagnosticSeverity.Error)
.ToArray();
if (compileErrors.Length > 0)
throw new CompilationErrorException(compileErrors);
// Step 3 — forbidden-type semantic analysis. Defense-in-depth against
// reference-list leaks due to type forwarding.
var rejections = ForbiddenTypeAnalyzer.Analyze(compilation);
if (rejections.Count > 0)
throw new ScriptSandboxViolationException(rejections);
// Step 3: materialize the callable delegate.
var runner = script.CreateDelegate();
return new ScriptEvaluator<TContext, TResult>(runner);
// Step 4: emit to an in-memory PE stream and load into a collectible ALC.
using var peStream = new MemoryStream();
var emitResult = compilation.Emit(peStream);
if (!emitResult.Success)
{
var emitErrors = emitResult.Diagnostics
.Where(d => d.Severity == DiagnosticSeverity.Error)
.ToArray();
throw new CompilationErrorException(emitErrors);
}
peStream.Position = 0;
var alc = new ScriptAssemblyLoadContext(assemblyName);
Assembly assembly;
try
{
assembly = alc.LoadFromStream(peStream);
}
catch
{
// Failed to load — drop the ALC so we don't leak a half-initialised one.
alc.Unload();
throw;
}
// Step 5 — resolve the wrapper's Run method and bind a typed delegate. The
// wrapper source above puts the type in this exact namespace + class — keep the
// names in sync with BuildWrapperSource.
Func<ScriptGlobals<TContext>, TResult> func;
try
{
var wrapperType = assembly.GetType(
"ZB.MOM.WW.OtOpcUa.Core.Scripting.Compiled.CompiledScript",
throwOnError: true)!;
var runMethod = wrapperType.GetMethod(
"Run",
BindingFlags.Public | BindingFlags.Static)
?? throw new InvalidOperationException(
"Synthesized wrapper is missing the public static Run method.");
func = (Func<ScriptGlobals<TContext>, TResult>)Delegate.CreateDelegate(
typeof(Func<ScriptGlobals<TContext>, TResult>), runMethod);
}
catch
{
alc.Unload();
throw;
}
return new ScriptEvaluator<TContext, TResult>(alc, func);
}
/// <summary>Run against an already-constructed context.</summary>
public Task<TResult> RunAsync(TContext context, CancellationToken ct = default)
{
if (_disposed) throw new ObjectDisposedException(nameof(ScriptEvaluator<TContext, TResult>));
if (context is null) throw new ArgumentNullException(nameof(context));
ct.ThrowIfCancellationRequested();
var globals = new ScriptGlobals<TContext> { ctx = context };
return _runner(globals, ct);
// The user's script is synchronous (Roslyn emits a static method that returns
// TResult directly). We surface a Task<TResult> only to keep the existing
// RunAsync contract consumers depend on. TimedScriptEvaluator wraps this in
// Task.Run so a long-running script still honours its wall-clock budget.
var result = _func(globals);
return Task.FromResult(result);
}
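The comment in `RunAsync` says `TimedScriptEvaluator` wraps the synchronous delegate in `Task.Run` so a long-running script still honours its wall-clock budget. A minimal sketch of that shape, assuming .NET 6+ for `Task<TResult>.WaitAsync(TimeSpan)`; `TimedRunSketch.RunWithBudgetAsync` is a hypothetical name, not the real `TimedScriptEvaluator` API. The timeout only stops the caller from waiting: the abandoned work keeps running on its pool thread, which is why work a script schedules in the background is treated as a threat that outlives the per-evaluation timeout.

```csharp
using System;
using System.Threading.Tasks;

public static class TimedRunSketch
{
    // Hypothetical helper: run a synchronous script delegate on the thread pool
    // and stop waiting once the budget elapses. WaitAsync throws
    // TimeoutException on expiry; it does not abort the underlying work.
    public static Task<TResult> RunWithBudgetAsync<TResult>(Func<TResult> script, TimeSpan budget)
        => Task.Run(script).WaitAsync(budget);
}
```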
/// <summary>
/// Unload the collectible <see cref="AssemblyLoadContext"/> that holds the emitted
/// script assembly so the runtime can reclaim it. After disposal the evaluator can
/// no longer be invoked — call <see cref="ScriptEvaluator{TContext, TResult}.Compile"/>
/// again for a fresh one. Dispose is idempotent.
/// </summary>
/// <remarks>
/// Unload is <i>eligible-for-collection</i>, not synchronous: the assembly is
/// reclaimed when the GC determines no live references remain. The cache disposes
/// evaluators in <see cref="CompiledScriptCache{TContext, TResult}.Clear"/> so a
/// config-generation publish releases the prior generation in one sweep; the
/// reclaim then races with the next GC cycle. Tests verify the reclaim via
/// <see cref="WeakReference"/> + <see cref="GC.Collect()"/>.
/// </remarks>
public void Dispose()
{
if (_disposed) return;
_disposed = true;
_alc.Unload();
}
/// <summary>
/// Reject scripts whose source contains brace-balanced injections that would
/// declare sibling members alongside the synthesized <c>CompiledScript.Run</c>
/// method. The expected shape is a single <c>CompiledScript</c> class with
/// exactly one member — the <c>Run</c> method. Anything else (a sibling
/// method, nested class, additional class in the namespace, free-floating
/// top-level statement) means the user source closed the synthesized braces
/// early and injected its own declarations. (Core.Scripting-013.)
/// </summary>
private static void EnforceSingleRunMember(SyntaxTree syntaxTree)
{
var root = syntaxTree.GetCompilationUnitRoot();
// The compilation unit must hold exactly one type declaration — our
// CompiledScript. Anything else means the user closed the synthesized
// namespace or class early and injected another type declaration.
var typeMembers = root.DescendantNodes()
.OfType<Microsoft.CodeAnalysis.CSharp.Syntax.BaseTypeDeclarationSyntax>()
.ToArray();
if (typeMembers.Length != 1 || typeMembers[0].Identifier.ValueText != "CompiledScript")
{
throw new CompilationErrorException(new[]
{
Diagnostic.Create(
new DiagnosticDescriptor(
id: "LMX001",
title: "Script wrapper injection",
messageFormat: "Script source must be a statement body. Declarations of " +
"additional types alongside the wrapper's CompiledScript class " +
"are not allowed; check for unbalanced braces or stray " +
"`class` / `namespace` keywords in the source. (Core.Scripting-013)",
category: "Sandbox",
defaultSeverity: DiagnosticSeverity.Error,
isEnabledByDefault: true),
typeMembers.Length > 1 ? typeMembers[1].Identifier.GetLocation() : Location.None),
});
}
// The CompiledScript class itself must contain exactly one member — the Run
// method. A second member means the user closed Run early and started a sibling.
var classMembers = ((Microsoft.CodeAnalysis.CSharp.Syntax.ClassDeclarationSyntax)typeMembers[0]).Members;
if (classMembers.Count != 1 ||
classMembers[0] is not Microsoft.CodeAnalysis.CSharp.Syntax.MethodDeclarationSyntax m ||
m.Identifier.ValueText != "Run")
{
throw new CompilationErrorException(new[]
{
Diagnostic.Create(
new DiagnosticDescriptor(
id: "LMX002",
title: "Script wrapper injection",
messageFormat: "Script source must be a statement body. Declarations of " +
"sibling members (methods, properties, nested types) alongside " +
"the wrapper's Run method are not allowed; check for unbalanced " +
"braces or a stray `}` followed by a `public`/`private`/`static` " +
"declaration in the source. (Core.Scripting-013)",
category: "Sandbox",
defaultSeverity: DiagnosticSeverity.Error,
isEnabledByDefault: true),
classMembers.Count > 1 ? classMembers[1].GetLocation() : Location.None),
});
}
}
/// <summary>
/// Synthesize the source we hand to Roslyn. The user's script body is pasted
/// verbatim inside <c>CompiledScript.Run</c>; the <c>using</c> directives mirror
/// <see cref="ScriptSandbox"/>'s imports so scripts can write <c>Math.Abs</c>
/// instead of <c>System.Math.Abs</c>.
/// </summary>
private static string BuildWrapperSource(string userSource, IReadOnlyList<string> imports)
{
var sb = new StringBuilder();
foreach (var import in imports)
sb.Append("using ").Append(import).AppendLine(";");
sb.AppendLine();
sb.AppendLine("namespace ZB.MOM.WW.OtOpcUa.Core.Scripting.Compiled;");
sb.AppendLine();
sb.AppendLine("public static class CompiledScript");
sb.AppendLine("{");
sb.Append(" public static ").Append(ToCSharpTypeName(typeof(TResult)))
.Append(" Run(").Append(ToCSharpTypeName(typeof(ScriptGlobals<TContext>)))
.AppendLine(" globals)");
sb.AppendLine(" {");
sb.AppendLine(" var ctx = globals.ctx;");
// User source ends with `return X;` per the authoring convention; we paste it
// verbatim. The `#line 1` directive resets Roslyn's line numbering so that
// diagnostics point operators at the user's source line, not the wrapper's.
sb.AppendLine("#line 1");
sb.AppendLine(userSource);
sb.AppendLine(" }");
sb.AppendLine("}");
return sb.ToString();
}
/// <summary>
/// Convert a runtime <see cref="Type"/> to a C# type-name string suitable for
/// emitting into Roslyn source. Uses <c>global::</c>-qualified FQNs to avoid
/// accidental capture by the wrapper's <c>using</c> directives, handles nested
/// types (<c>+</c> → <c>.</c>), and recurses for generic arguments so the
/// <c>ScriptGlobals&lt;TContext&gt;</c> parameter is emitted correctly.
/// </summary>
private static string ToCSharpTypeName(Type t)
{
if (t == typeof(void)) return "void";
// Primitive aliases keep the synthesized source readable when diagnostic
// logging dumps it; functionally identical to the FQN form.
if (t == typeof(bool)) return "bool";
if (t == typeof(byte)) return "byte";
if (t == typeof(sbyte)) return "sbyte";
if (t == typeof(short)) return "short";
if (t == typeof(ushort)) return "ushort";
if (t == typeof(int)) return "int";
if (t == typeof(uint)) return "uint";
if (t == typeof(long)) return "long";
if (t == typeof(ulong)) return "ulong";
if (t == typeof(float)) return "float";
if (t == typeof(double)) return "double";
if (t == typeof(decimal)) return "decimal";
if (t == typeof(string)) return "string";
if (t == typeof(object)) return "object";
if (Nullable.GetUnderlyingType(t) is { } inner)
return ToCSharpTypeName(inner) + "?";
if (t.IsArray)
return ToCSharpTypeName(t.GetElementType()!) + "[]";
if (t.IsGenericType)
{
// Walk the FullName by '.' segments (after '+ → .'). For each segment
// of the form Name`N, consume N generic arguments and emit them as
// `<…>` on that segment. Nested generic-in-generic (Outer<T>.Inner<U>)
// emits as `global::Ns.Outer<T>.Inner<U>` — valid C#. The pre-fix code
// used `IndexOf('`')` to find the FIRST backtick and truncated the
// entire name there, silently dropping the rest of the nested-generic
// closed args. (Core.Scripting-015.)
var rawName = t.GetGenericTypeDefinition().FullName!.Replace('+', '.');
var allArgs = t.GetGenericArguments();
var segments = rawName.Split('.');
var argIndex = 0;
var sb = new StringBuilder("global::");
for (int i = 0; i < segments.Length; i++)
{
if (i > 0) sb.Append('.');
var seg = segments[i];
var backtick = seg.IndexOf('`');
if (backtick >= 0)
{
var arity = int.Parse(seg.AsSpan(backtick + 1));
sb.Append(seg, 0, backtick);
sb.Append('<');
for (int j = 0; j < arity; j++)
{
if (j > 0) sb.Append(", ");
sb.Append(ToCSharpTypeName(allArgs[argIndex++]));
}
sb.Append('>');
}
else
{
sb.Append(seg);
}
}
return sb.ToString();
}
return "global::" + t.FullName!.Replace('+', '.');
}
}
/// <summary>
/// Collectible <see cref="AssemblyLoadContext"/> that hosts a single emitted script
/// assembly. Created per <see cref="ScriptEvaluator{TContext, TResult}"/> instance so
/// <see cref="AssemblyLoadContext.Unload"/> releases exactly that script. Resolves
/// dependencies via the default ALC — script assemblies reference the BCL + the
/// application's own types, all of which live in the default context.
/// </summary>
internal sealed class ScriptAssemblyLoadContext : AssemblyLoadContext
{
public ScriptAssemblyLoadContext(string name) : base(name, isCollectible: true)
{
}
protected override Assembly? Load(AssemblyName assemblyName) => null;
}
/// <summary>
/// Thrown by <see cref="ScriptEvaluator{TContext, TResult}.Compile"/> when Roslyn
/// reports compile-time errors against the wrapper source. Mirrors the
/// <c>Microsoft.CodeAnalysis.Scripting.CompilationErrorException</c> from the legacy
/// <c>CSharpScript</c> path so callers (engines + the Admin test-harness) keep the
/// same catch site after the Core.Scripting-008 rewrite.
/// </summary>
public sealed class CompilationErrorException : Exception
{
public IReadOnlyList<Diagnostic> Diagnostics { get; }
public CompilationErrorException(IReadOnlyList<Diagnostic> diagnostics)
: base(BuildMessage(diagnostics))
{
Diagnostics = diagnostics;
}
private static string BuildMessage(IReadOnlyList<Diagnostic> diagnostics)
{
if (diagnostics.Count == 0) return "Script compile failed.";
// Operators see this — match the legacy Roslyn format ("(line,col): error CSxxxx:
// message") so existing operator runbooks still match.
var first = diagnostics[0];
var rest = diagnostics.Count == 1 ? "" : $" (and {diagnostics.Count - 1} more)";
return first.ToString() + rest;
}
}
@@ -1,11 +1,10 @@
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;
using Microsoft.CodeAnalysis;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
/// <summary>
/// Factory for the <see cref="ScriptOptions"/> every user script is compiled against.
/// Factory for the compile-time sandbox every user script is built against.
/// Implements Phase 7 plan decision #6 (read-only sandbox) by whitelisting only the
/// assemblies + namespaces the script API needs; no <c>System.IO</c>, no
/// <c>System.Net</c>, no <c>System.Diagnostics.Process</c>, no
@@ -15,9 +14,12 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
/// </summary>
/// <remarks>
/// <para>
/// Roslyn's default <see cref="ScriptOptions"/> references <c>mscorlib</c> /
/// <c>System.Runtime</c> transitively which pulls in every type in the BCL — this
/// class overrides that with an explicit minimal allow-list.
/// Roslyn would otherwise pull in every type in the BCL transitively via
/// <c>mscorlib</c> / <c>System.Runtime</c> — this class overrides that with an
/// explicit minimal allow-list. The list is the same regardless of whether
/// <see cref="ScriptEvaluator{TContext, TResult}"/> uses the legacy
/// <c>CSharpScript</c> path or the collectible-<c>AssemblyLoadContext</c> path
/// (Core.Scripting-008): both go through <see cref="Build"/>.
/// </para>
/// <para>
/// Namespaces pre-imported so scripts don't have to write <c>using</c> clauses:
@@ -35,29 +37,21 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
public static class ScriptSandbox
{
/// <summary>
/// Build the <see cref="ScriptOptions"/> used for every virtual-tag / alarm
/// script. <paramref name="contextType"/> is the concrete
/// <see cref="ScriptContext"/> subclass the globals will be of — the compiler
/// uses its type to resolve <c>ctx.GetTag(...)</c> calls.
/// Build the sandbox configuration used for every virtual-tag / alarm script.
/// <paramref name="contextType"/> is the concrete <see cref="ScriptContext"/>
/// subclass the script's <c>ctx</c> will be of — the compiler uses its assembly
/// to resolve <c>ctx.GetTag(...)</c> calls.
/// </summary>
public static ScriptOptions Build(Type contextType)
public static SandboxConfig Build(Type contextType)
{
if (contextType is null) throw new ArgumentNullException(nameof(contextType));
if (!typeof(ScriptContext).IsAssignableFrom(contextType))
throw new ArgumentException(
$"Script context type must derive from {nameof(ScriptContext)}", nameof(contextType));
// Allow-listed assemblies — each explicitly chosen. Adding here is a
// plan-level decision; do not expand casually. HashSet so adding the
// contextType's assembly is idempotent when it happens to be Core.Scripting
// already.
var allowedAssemblies = new HashSet<System.Reflection.Assembly>
// OtOpcUa-owned assemblies — pinned by typeof(...) so they survive a rename.
var pinnedAssemblies = new HashSet<System.Reflection.Assembly>
{
// System.Private.CoreLib — primitives (int, double, bool, string, DateTime,
// TimeSpan, Math, Convert, nullable<T>). Can't practically script without it.
typeof(object).Assembly,
// System.Linq — IEnumerable extensions (Where / Select / Sum / Average / etc.).
typeof(System.Linq.Enumerable).Assembly,
// Core.Abstractions — DataValueSnapshot + DriverDataType so scripts can name
// the types they receive from ctx.GetTag.
typeof(DataValueSnapshot).Assembly,
@@ -72,7 +66,23 @@ public static class ScriptSandbox
contextType.Assembly,
};
var allowedImports = new[]
// BCL references. We list the trusted-platform-assemblies set restricted to
// System.*, netstandard, mscorlib and Microsoft.Win32.Registry so the synthesized
// wrapper can reference every BCL type by FQN, including the ones we forbid (HttpClient,
// File, Process, Registry, etc.). Letting those types resolve at compile is intentional:
// the hard security gate is ForbiddenTypeAnalyzer in step 3 of the compile pipeline
// (Core.Scripting-001 / -002 established the analyzer must be the sole gate
// because type forwarding makes any references-list-only restriction porous).
// The references list now serves only as scoping hygiene: assemblies outside that
// set (operator-authored hosting helpers, third-party packages, application code
// other than the pinned assemblies above) are not on the list and stay unreachable.
var references = new List<MetadataReference>();
foreach (var asm in pinnedAssemblies)
references.Add(MetadataReference.CreateFromFile(asm.Location));
foreach (var path in EnumerateBclAssemblyPaths())
references.Add(MetadataReference.CreateFromFile(path));
var imports = new[]
{
"System",
"System.Linq",
@@ -80,8 +90,56 @@ public static class ScriptSandbox
"ZB.MOM.WW.OtOpcUa.Core.Scripting",
};
return ScriptOptions.Default
.WithReferences(allowedAssemblies)
.WithImports(allowedImports);
return new SandboxConfig(references, imports);
}
private static IEnumerable<string> EnumerateBclAssemblyPaths()
{
// The .NET host advertises the resolved shared-framework (BCL) DLLs, plus the app's
// own dependencies, via the TRUSTED_PLATFORM_ASSEMBLIES AppContext data slot. This is
// what the default ALC uses when resolving assemblies, so anything in here is already
// loadable by the host process. We restrict to System.*, netstandard and mscorlib (plus
// the one Microsoft.Win32.Registry exception below) to keep the script's reachable
// surface to the BCL; anything else (Microsoft.*, application code, third-party
// packages in the TPA list) would expand the analyzer's deny-list job unnecessarily.
var raw = (string?)AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES");
if (string.IsNullOrEmpty(raw))
yield break;
var separator = System.IO.Path.PathSeparator; // ';' on Windows, ':' elsewhere
foreach (var path in raw.Split(separator, StringSplitOptions.RemoveEmptyEntries))
{
var name = System.IO.Path.GetFileName(path);
if (name.StartsWith("System.", StringComparison.Ordinal) ||
string.Equals(name, "netstandard.dll", StringComparison.Ordinal) ||
string.Equals(name, "mscorlib.dll", StringComparison.Ordinal) ||
// Microsoft.Win32.Registry isn't a System.* DLL but the analyzer's
// Microsoft.Win32 deny-list relies on the type being resolvable so it
// can identify + reject it (Core.Scripting-001 / -002). Add the one
// DLL we need rather than broadening to Microsoft.* (which would also
// pull in compilers, build tooling, etc.).
string.Equals(name, "Microsoft.Win32.Registry.dll", StringComparison.Ordinal))
{
yield return path;
}
}
}
}
/// <summary>
/// Compile-time sandbox configuration. Returned by <see cref="ScriptSandbox.Build"/>;
/// consumed by <see cref="ScriptEvaluator{TContext, TResult}"/>'s manual
/// <c>CSharpCompilation</c> path.
/// </summary>
/// <param name="References">
/// Metadata references the script compilation is built against. Anything not in this
/// set is unresolved at compile, but the list is scoping hygiene only: forbidden BCL types
/// stay resolvable on purpose and <see cref="ForbiddenTypeAnalyzer"/> is the hard gate.
/// </param>
/// <param name="Imports">
/// Namespaces pre-imported into the wrapper compilation as <c>using</c> directives
/// so scripts can write <c>Math.Abs</c> rather than <c>System.Math.Abs</c>.
/// </param>
public sealed record SandboxConfig(
IReadOnlyList<MetadataReference> References,
IReadOnlyList<string> Imports);
@@ -37,6 +37,21 @@ public sealed class VirtualTagEngine : IDisposable
private readonly DependencyGraph _graph = new();
private readonly Dictionary<string, VirtualTagState> _tags = new(StringComparer.Ordinal);
/// <summary>
/// Compile cache for every virtual-tag script. Routes <see cref="Load"/>'s
/// <see cref="ScriptEvaluator{TContext, TResult}.Compile"/> calls through the
/// cache so the collectible <see cref="System.Runtime.Loader.AssemblyLoadContext"/>
/// each compile produces is actually disposed on the publish-replace path
/// (Core.Scripting-016): the cache's <see cref="CompiledScriptCache{TContext, TResult}.Clear"/>
/// disposes every materialised evaluator before dropping its dictionary entry,
/// so a config-publish releases the prior generation's ALCs and the per-publish
/// accretion the Core.Scripting-008 fix targeted is actually freed in production.
/// Pre-fix the engine called <c>ScriptEvaluator.Compile</c> directly, which left
/// the ALCs rooted until the process exited — defeating -008 on the real path.
/// </summary>
private readonly CompiledScriptCache<VirtualTagContext, object?> _compileCache = new();
private readonly ConcurrentDictionary<string, DataValueSnapshot> _valueCache = new(StringComparer.Ordinal);
private readonly ConcurrentDictionary<string, List<Action<string, DataValueSnapshot>>> _observers
= new(StringComparer.Ordinal);
@@ -74,6 +89,10 @@ public sealed class VirtualTagEngine : IDisposable
UnsubscribeFromUpstream();
_tags.Clear();
_graph.Clear();
// Dispose every compiled-script ALC from the prior generation BEFORE we
// recompile this one. Skipping this is what made Core.Scripting-008 a
// no-op in production (Core.Scripting-016).
_compileCache.Clear();
var compileFailures = new List<string>();
var seenPaths = new HashSet<string>(StringComparer.Ordinal);
@@ -102,7 +121,9 @@ public sealed class VirtualTagEngine : IDisposable
continue;
}
var evaluator = ScriptEvaluator<VirtualTagContext, object?>.Compile(def.ScriptSource);
// Route through CompiledScriptCache so the emitted assembly's collectible
// ALC participates in publish-replace cleanup. (Core.Scripting-016)
var evaluator = _compileCache.GetOrCompile(def.ScriptSource);
var timed = new TimedScriptEvaluator<VirtualTagContext, object?>(evaluator, _scriptTimeout);
var scriptLogger = _loggerFactory.Create(def.Path);
@@ -481,6 +502,9 @@ public sealed class VirtualTagEngine : IDisposable
UnsubscribeFromUpstream();
_tags.Clear();
_graph.Clear();
// Dispose every compiled-script ALC so the engine's shutdown actually
// releases the emitted assemblies. (Core.Scripting-016)
_compileCache.Dispose();
}
internal DependencyGraph GraphForTesting => _graph;
@@ -68,7 +68,10 @@ public static class SnapshotFormatter
int tagW = rows.Length == 0 ? "TAG".Length : Math.Max("TAG".Length, rows.Max(r => r.Tag.Length));
int valW = rows.Length == 0 ? "VALUE".Length : Math.Max("VALUE".Length, rows.Max(r => r.Value.Length));
int statW = rows.Length == 0 ? "STATUS".Length : Math.Max("STATUS".Length, rows.Max(r => r.Status.Length));
// source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed.
// source-time is the right-most column, so it is intentionally not measured or padded;
// when a snapshot has a non-null SourceTimestampUtc the cell is 24 chars (ISO-8601 to ms),
// and when the timestamp is null FormatTimestamp emits "-"; the resulting misalignment is
// harmless because nothing is appended after this column.
var sb = new System.Text.StringBuilder();
sb.Append("TAG".PadRight(tagW)).Append(" ")
@@ -113,12 +116,17 @@ public static class SnapshotFormatter
{
0x00000000u => "Good",
0x80000000u => "Bad",
0x80020000u => "BadInternalError",
0x80050000u => "BadCommunicationError",
0x800A0000u => "BadTimeout",
0x80310000u => "BadNoCommunication",
0x80320000u => "BadWaitingForInitialData",
0x80340000u => "BadNodeIdUnknown",
0x80330000u => "BadNodeIdInvalid",
0x803B0000u => "BadNotWritable",
0x803C0000u => "BadOutOfRange",
0x803D0000u => "BadNotSupported",
0x808B0000u => "BadDeviceFailure",
0x80740000u => "BadTypeMismatch",
0x40000000u => "Uncertain",
_ => null,
@@ -24,6 +24,8 @@ public sealed class ProbeCommand : FocasCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
// Driver.FOCAS.Cli-003: validate numeric option ranges before any driver work.
ValidateOptions();
var ct = console.RegisterCancellationHandler();
var probeTag = new FocasTagDefinition(
@@ -34,24 +36,20 @@ public sealed class ProbeCommand : FocasCommandBase
Writable: false);
var options = BuildOptions([probeTag]);
// Driver.FOCAS.Cli-004: `await using` is the sole disposal mechanism — FocasDriver.DisposeAsync
// already invokes ShutdownAsync, so a redundant explicit ShutdownAsync(CancellationToken.None)
// in a finally block ran shutdown twice. The await-using on the next line is enough.
await using var driver = new FocasDriver(options, DriverInstanceId);
try
{
await driver.InitializeAsync("{}", ct);
var snapshot = await driver.ReadAsync(["__probe"], ct);
var health = driver.GetHealth();
await driver.InitializeAsync("{}", ct);
var snapshot = await driver.ReadAsync(["__probe"], ct);
var health = driver.GetHealth();
await console.Output.WriteLineAsync($"CNC: {CncHost}:{CncPort}");
await console.Output.WriteLineAsync($"Series: {Series}");
await console.Output.WriteLineAsync($"Health: {health.State}");
if (health.LastError is { } err)
await console.Output.WriteLineAsync($"Last error: {err}");
await console.Output.WriteLineAsync();
await console.Output.WriteLineAsync(SnapshotFormatter.Format(Address, snapshot[0]));
}
finally
{
await driver.ShutdownAsync(CancellationToken.None);
}
await console.Output.WriteLineAsync($"CNC: {CncHost}:{CncPort}");
await console.Output.WriteLineAsync($"Series: {Series}");
await console.Output.WriteLineAsync($"Health: {health.State}");
if (health.LastError is { } err)
await console.Output.WriteLineAsync($"Last error: {err}");
await console.Output.WriteLineAsync();
await console.Output.WriteLineAsync(SnapshotFormatter.Format(Address, snapshot[0]));
}
}
@@ -23,6 +23,8 @@ public sealed class ReadCommand : FocasCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
// Driver.FOCAS.Cli-003: validate numeric option ranges before any driver work.
ValidateOptions();
var ct = console.RegisterCancellationHandler();
var tagName = SynthesiseTagName(Address, DataType);
@@ -34,17 +36,13 @@ public sealed class ReadCommand : FocasCommandBase
Writable: false);
var options = BuildOptions([tag]);
// Driver.FOCAS.Cli-004: `await using` is the sole disposal mechanism — FocasDriver.DisposeAsync
// already invokes ShutdownAsync, so a redundant explicit ShutdownAsync(CancellationToken.None)
// in a finally block ran shutdown twice. The await-using on the next line is enough.
await using var driver = new FocasDriver(options, DriverInstanceId);
try
{
await driver.InitializeAsync("{}", ct);
var snapshot = await driver.ReadAsync([tagName], ct);
await console.Output.WriteLineAsync(SnapshotFormatter.Format(Address, snapshot[0]));
}
finally
{
await driver.ShutdownAsync(CancellationToken.None);
}
await driver.InitializeAsync("{}", ct);
var snapshot = await driver.ReadAsync([tagName], ct);
await console.Output.WriteLineAsync(SnapshotFormatter.Format(Address, snapshot[0]));
}
internal static string SynthesiseTagName(string address, FocasDataType type)
@@ -25,6 +25,10 @@ public sealed class SubscribeCommand : FocasCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
// Driver.FOCAS.Cli-003: validate numeric option ranges (including the subscribe-only
// --interval-ms) before any driver work so a zero/negative interval surfaces as a
// clean CommandException rather than a tight-spinning poll loop.
ValidateOptions(IntervalMs);
var ct = console.RegisterCancellationHandler();
var tagName = ReadCommand.SynthesiseTagName(Address, DataType);
@@ -36,24 +40,59 @@ public sealed class SubscribeCommand : FocasCommandBase
Writable: false);
var options = BuildOptions([tag]);
// Driver.FOCAS.Cli-004: `await using` is the sole driver-disposal mechanism — FocasDriver.DisposeAsync
// already invokes ShutdownAsync, so a redundant ShutdownAsync(CancellationToken.None) in finally
// ran shutdown twice. Only UnsubscribeAsync stays in the finally block — that's a subscription
// lifecycle concern that is not part of driver disposal.
await using var driver = new FocasDriver(options, DriverInstanceId);
// Driver.FOCAS.Cli-002: serialize console writes from the PollGroupEngine background
// thread so overlapping poll ticks (and the "Subscribed to ..." banner from the CliFx
// invocation thread) can't interleave partial lines.
var writeLock = new object();
ISubscriptionHandle? handle = null;
try
{
await driver.InitializeAsync("{}", ct);
// Driver.FOCAS.Cli-002: route every data-change event to the CliFx console (not
// System.Console — the analyzer flags it + IConsole is the testable abstraction).
// The handler is synchronous because OnDataChange is raised from a driver
// background thread; the IConsole.Output writer is not documented as thread-safe
// so we serialize against the banner write via writeLock.
driver.OnDataChange += (_, e) =>
{
var line = $"[{DateTime.UtcNow:HH:mm:ss.fff}] " +
$"{e.FullReference} = {SnapshotFormatter.FormatValue(e.Snapshot.Value)} " +
$"({SnapshotFormatter.FormatStatus(e.Snapshot.StatusCode)})";
console.Output.WriteLine(line);
// Swallow + log write failures so a transient stdout error (closed pipe, IO
// exception on a redirected stream) cannot tear down the poll-engine
// background loop. Without this guard the unhandled exception would fault
// the long-running subscribe.
try
{
var line = $"[{DateTime.UtcNow:HH:mm:ss.fff}] " +
$"{e.FullReference} = {SnapshotFormatter.FormatValue(e.Snapshot.Value)} " +
$"({SnapshotFormatter.FormatStatus(e.Snapshot.StatusCode)})";
lock (writeLock)
{
console.Output.WriteLine(line);
}
}
catch (Exception ex)
{
Serilog.Log.Logger.Warning(ex,
"SubscribeCommand: console write failed for {Tag}; continuing poll loop.",
e.FullReference);
}
};
handle = await driver.SubscribeAsync([tagName], TimeSpan.FromMilliseconds(IntervalMs), ct);
await console.Output.WriteLineAsync(
$"Subscribed to {Address} @ {IntervalMs}ms. Ctrl+C to stop.");
// Driver.FOCAS.Cli-002: hold the lock around the banner write so the first
// poll-driven change line from the driver tick thread can't interleave with
// this banner.
lock (writeLock)
{
console.Output.WriteLine(
$"Subscribed to {Address} @ {IntervalMs}ms. Ctrl+C to stop.");
}
try
{
await Task.Delay(System.Threading.Timeout.InfiniteTimeSpan, ct);
@@ -67,10 +106,16 @@ public sealed class SubscribeCommand : FocasCommandBase
{
if (handle is not null)
{
// Driver.FOCAS.Cli-002: the anonymous OnDataChange handler is not explicitly
// detached here. It is a single lambda instance and `await using` disposes the
// driver immediately afterwards, so the unsubscribe + dispose sequence is what
// actually cleans up. A future refactor that reuses the driver after the
// subscribe verb returns should keep the handler in a local and detach it
// (driver.OnDataChange -= handler) before unsubscribing, so it doesn't leak a
// dangling subscription.
try { await driver.UnsubscribeAsync(handle, CancellationToken.None); }
catch { /* teardown best-effort */ }
}
await driver.ShutdownAsync(CancellationToken.None);
}
}
}
@@ -29,6 +29,10 @@ public sealed class WriteCommand : FocasCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
// Driver.FOCAS.Cli-003: validate numeric option ranges before any driver work so
// a zero/negative port/timeout surfaces as a clean CommandException rather than an
// opaque downstream exception.
ValidateOptions();
var ct = console.RegisterCancellationHandler();
var tagName = ReadCommand.SynthesiseTagName(Address, DataType);
@@ -42,30 +46,49 @@ public sealed class WriteCommand : FocasCommandBase
var parsed = ParseValue(Value, DataType);
// Driver.FOCAS.Cli-004: `await using` is the sole disposal mechanism — FocasDriver.DisposeAsync
// already invokes ShutdownAsync, so a redundant explicit ShutdownAsync(CancellationToken.None)
// in a finally block ran shutdown twice. The await-using on the next line is enough.
await using var driver = new FocasDriver(options, DriverInstanceId);
try
{
await driver.InitializeAsync("{}", ct);
var results = await driver.WriteAsync([new WriteRequest(tagName, parsed)], ct);
await console.Output.WriteLineAsync(SnapshotFormatter.FormatWrite(Address, results[0]));
}
finally
{
await driver.ShutdownAsync(CancellationToken.None);
}
await driver.InitializeAsync("{}", ct);
var results = await driver.WriteAsync([new WriteRequest(tagName, parsed)], ct);
await console.Output.WriteLineAsync(SnapshotFormatter.FormatWrite(Address, results[0]));
}
internal static object ParseValue(string raw, FocasDataType type) => type switch
/// <summary>Parse <c>--value</c> per <see cref="FocasDataType"/>, invariant culture throughout.</summary>
/// <remarks>
/// Driver.FOCAS.Cli-001: numeric parses are wrapped so that malformed input
/// (<see cref="FormatException"/> / <see cref="OverflowException"/>) surfaces
/// as a clean <see cref="CliFx.Exceptions.CommandException"/> rather than a raw
/// .NET stack trace — matching the friendly message the Bit path already produces.
/// </remarks>
internal static object ParseValue(string raw, FocasDataType type)
{
FocasDataType.Bit => ParseBool(raw),
FocasDataType.Byte => sbyte.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Int16 => short.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Int32 => int.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Float32 => float.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Float64 => double.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.String => raw,
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
if (type == FocasDataType.Bit) return ParseBool(raw);
if (type == FocasDataType.String) return raw;
try
{
return type switch
{
FocasDataType.Byte => (object)sbyte.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Int16 => (object)short.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Int32 => (object)int.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Float32 => (object)float.Parse(raw, CultureInfo.InvariantCulture),
FocasDataType.Float64 => (object)double.Parse(raw, CultureInfo.InvariantCulture),
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
}
catch (FormatException ex)
{
throw new CliFx.Exceptions.CommandException(
$"Value '{raw}' is not a valid {type}: {ex.Message}");
}
catch (OverflowException ex)
{
throw new CliFx.Exceptions.CommandException(
$"Value '{raw}' is out of range for {type}: {ex.Message}");
}
}
private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
{
@@ -54,4 +54,26 @@ public abstract class FocasCommandBase : DriverCommandBase
};
protected string DriverInstanceId => $"focas-cli-{CncHost}:{CncPort}";
/// <summary>
/// Driver.FOCAS.Cli-003: validate numeric option ranges at the CLI boundary so an
/// out-of-range <c>--cnc-port</c> (outside 1..65535) or a zero/negative <c>--timeout-ms</c>
/// or <c>--interval-ms</c> surfaces as a clean <see cref="CliFx.Exceptions.CommandException"/>
/// rather than an opaque downstream exception (invalid <c>focas://host:&lt;n&gt;</c> /
/// zero <c>TimeSpan</c>) or a tight-spinning poll loop. <c>--interval-ms</c> is
/// subscribe-only: probe/read/write pass <c>null</c> so this helper can be the single
/// shared validator.
/// </summary>
protected void ValidateOptions(int? intervalMs = null)
{
if (CncPort < 1 || CncPort > 65535)
throw new CliFx.Exceptions.CommandException(
$"--cnc-port must be in the range 1..65535 (got {CncPort}).");
if (TimeoutMs <= 0)
throw new CliFx.Exceptions.CommandException(
$"--timeout-ms must be positive (got {TimeoutMs}).");
if (intervalMs is { } iv && iv <= 0)
throw new CliFx.Exceptions.CommandException(
$"--interval-ms must be positive (got {iv}).");
}
}
@@ -22,6 +22,10 @@
<ProjectReference Include="..\..\ZB.MOM.WW.OtOpcUa.Driver.FOCAS\ZB.MOM.WW.OtOpcUa.Driver.FOCAS.csproj"/>
</ItemGroup>
<ItemGroup>
<InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests"/>
</ItemGroup>
<!-- CLI runs the managed WireFocasClient and talks to the CNC over TCP:8193
directly — no Fwlib64.dll copy step needed. -->
@@ -41,7 +41,7 @@ public static class AbCipStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80740000u;
@@ -16,7 +16,7 @@ public static class AbLegacyStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80740000u;
@@ -14,7 +14,7 @@ public static class FocasStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80740000u;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,7 +1,7 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
@@ -2,7 +2,7 @@ using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,7 +1,7 @@
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,8 +1,8 @@
using System.Collections.Concurrent;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
// Use the generated nested status enum for the SetBufferedUpdateInterval reply check.
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using System.Runtime.CompilerServices;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -15,10 +15,18 @@
<ItemGroup>
<ProjectReference Include="..\..\Core\ZB.MOM.WW.OtOpcUa.Core.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj"/>
<ProjectReference Include="..\..\Core\ZB.MOM.WW.OtOpcUa.Core\ZB.MOM.WW.OtOpcUa.Core.csproj"/>
<!-- mxaccessgw .NET client. Path-based ProjectReference because both repos sit
side-by-side on the dev box; long-term we'll consume MxGateway.Client as a
NuGet package. PR 4.W revisits the dependency shape before parity gating. -->
<ProjectReference Include="..\..\..\..\mxaccessgw\clients\dotnet\MxGateway.Client\MxGateway.Client.csproj"/>
</ItemGroup>
<ItemGroup>
<!-- Sibling mxaccessgw repo's .NET client + contracts. The sibling restored
a proper client library under clients/dotnet/ (May 2026), so this is
back on a path-based ProjectReference per the libs/README unwind plan #1.
Both projects target net10.0; the Contracts project transitively pulls
Google.Protobuf + Grpc.Core.Api, the Client project transitively pulls
Grpc.Net.Client + Polly.Core + Microsoft.Extensions.Logging.Abstractions,
so the explicit PackageReference shims that backfilled the vendored
binary references are no longer needed. -->
<ProjectReference Include="..\..\..\..\mxaccessgw\clients\dotnet\ZB.MOM.WW.MxGateway.Client\ZB.MOM.WW.MxGateway.Client.csproj"/>
</ItemGroup>
<ItemGroup>
@@ -72,8 +72,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
MaxValues = (int)Math.Min(maxValuesPerNode, int.MaxValue),
CorrelationId = Guid.NewGuid().ToString("N"),
};
var reply = await Invoke<ReadRawRequest, ReadRawReply>(MessageKind.ReadRawRequest, MessageKind.ReadRawReply, req, cancellationToken).ConfigureAwait(false);
ThrowIfFailed(reply.Success, reply.Error, "ReadRaw");
var reply = await InvokeAndClassifyAsync<ReadRawRequest, ReadRawReply>(
MessageKind.ReadRawRequest, MessageKind.ReadRawReply, req,
r => (r.Success, r.Error), "ReadRaw", cancellationToken).ConfigureAwait(false);
return new HistoryReadResult(ToSnapshots(reply.Samples), ContinuationPoint: null);
}
@@ -90,8 +91,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
AggregateColumn = MapAggregate(aggregate),
CorrelationId = Guid.NewGuid().ToString("N"),
};
var reply = await Invoke<ReadProcessedRequest, ReadProcessedReply>(MessageKind.ReadProcessedRequest, MessageKind.ReadProcessedReply, req, cancellationToken).ConfigureAwait(false);
ThrowIfFailed(reply.Success, reply.Error, "ReadProcessed");
var reply = await InvokeAndClassifyAsync<ReadProcessedRequest, ReadProcessedReply>(
MessageKind.ReadProcessedRequest, MessageKind.ReadProcessedReply, req,
r => (r.Success, r.Error), "ReadProcessed", cancellationToken).ConfigureAwait(false);
return new HistoryReadResult(ToAggregateSnapshots(reply.Buckets), ContinuationPoint: null);
}
@@ -107,8 +109,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
TimestampsUtcTicks = ticks,
CorrelationId = Guid.NewGuid().ToString("N"),
};
var reply = await Invoke<ReadAtTimeRequest, ReadAtTimeReply>(MessageKind.ReadAtTimeRequest, MessageKind.ReadAtTimeReply, req, cancellationToken).ConfigureAwait(false);
ThrowIfFailed(reply.Success, reply.Error, "ReadAtTime");
var reply = await InvokeAndClassifyAsync<ReadAtTimeRequest, ReadAtTimeReply>(
MessageKind.ReadAtTimeRequest, MessageKind.ReadAtTimeReply, req,
r => (r.Success, r.Error), "ReadAtTime", cancellationToken).ConfigureAwait(false);
return new HistoryReadResult(AlignAtTimeSnapshots(timestampsUtc, reply.Samples), ContinuationPoint: null);
}
@@ -167,11 +170,34 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
MaxEvents = maxEvents,
CorrelationId = Guid.NewGuid().ToString("N"),
};
var reply = await Invoke<ReadEventsRequest, ReadEventsReply>(MessageKind.ReadEventsRequest, MessageKind.ReadEventsReply, req, cancellationToken).ConfigureAwait(false);
ThrowIfFailed(reply.Success, reply.Error, "ReadEvents");
var reply = await InvokeAndClassifyAsync<ReadEventsRequest, ReadEventsReply>(
MessageKind.ReadEventsRequest, MessageKind.ReadEventsReply, req,
r => (r.Success, r.Error), "ReadEvents", cancellationToken).ConfigureAwait(false);
return new HistoricalEventsResult(ToHistoricalEvents(reply.Events), ContinuationPoint: null);
}
/// <summary>
/// Returns a snapshot of operation counters and the single pipe channel's connection
/// state.
/// </summary>
/// <remarks>
/// This client owns one duplex named-pipe channel to the sidecar — it has no notion of
/// separate process / event connections and no per-node telemetry. The single channel's
/// connected state is reported for both <see cref="HistorianHealthSnapshot.ProcessConnectionOpen"/>
/// and <see cref="HistorianHealthSnapshot.EventConnectionOpen"/>, and
/// <see cref="HistorianHealthSnapshot.ActiveProcessNode"/> /
/// <see cref="HistorianHealthSnapshot.ActiveEventNode"/> /
/// <see cref="HistorianHealthSnapshot.Nodes"/> are intentionally null/empty. Consumers
/// that need separate process / event connection state must obtain it from a driver that
/// actually maintains two connections. (Finding 010.)
/// <para>
/// All seven counter fields (TotalQueries, TotalSuccesses, TotalFailures,
/// ConsecutiveFailures, LastSuccessTime, LastFailureTime, LastError) are mutated
/// exclusively under <c>_healthLock</c>, so the snapshot is internally consistent.
/// In particular, <c>TotalSuccesses + TotalFailures == TotalQueries</c> holds in every
/// observed snapshot, because a call that has started but not yet completed has not
/// incremented any counter. (Finding 003 / 004.)
/// </para>
/// </remarks>
public HistorianHealthSnapshot GetHealthSnapshot()
{
lock (_healthLock)
@@ -233,8 +259,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
try
{
var reply = await Invoke<WriteAlarmEventsRequest, WriteAlarmEventsReply>(
MessageKind.WriteAlarmEventsRequest, MessageKind.WriteAlarmEventsReply, req, cancellationToken).ConfigureAwait(false);
var reply = await InvokeAsync<WriteAlarmEventsRequest, WriteAlarmEventsReply>(
MessageKind.WriteAlarmEventsRequest, MessageKind.WriteAlarmEventsReply, req,
r => (r.Success, r.Error), cancellationToken).ConfigureAwait(false);
// Whole-call failure → transient retry for every event in the batch.
if (!reply.Success)
@@ -279,69 +306,79 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
// ===== Helpers =====
private async Task<TReply> Invoke<TRequest, TReply>(
MessageKind requestKind, MessageKind expectedReplyKind, TRequest request, CancellationToken ct)
/// <summary>
/// Sends one request through the channel and records its outcome via
/// <see cref="RecordOutcome"/>, which bumps <c>_totalQueries</c> and exactly one of
/// <c>_totalSuccesses</c> / <c>_totalFailures</c> under a single <c>_healthLock</c>
/// acquisition. A transport exception is recorded as a failure and rethrown. A delivered
/// reply is classified by <paramref name="evaluate"/>, so a sidecar-reported failure is
/// also recorded as a failure, but this method does not throw for it; that is what
/// <see cref="InvokeAndClassifyAsync"/> adds. (Finding 003 / 004: all seven counter
/// fields share one synchronization mechanism, so a snapshot can never observe a torn
/// state.)
/// </summary>
private async Task<TReply> InvokeAsync<TRequest, TReply>(
MessageKind requestKind, MessageKind expectedReplyKind, TRequest request,
Func<TReply, (bool ok, string? error)> evaluate, CancellationToken ct)
where TReply : class
{
Interlocked.Increment(ref _totalQueries);
try
{
var reply = await _channel.InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, ct).ConfigureAwait(false);
RecordSuccess();
// Classify transport+sidecar in one lock so TotalQueries/TotalSuccesses/
// TotalFailures move together and no intermediate "success-then-undo" state is
// visible to a concurrent GetHealthSnapshot.
var (ok, error) = evaluate(reply);
RecordOutcome(ok, error);
return reply;
}
catch (Exception ex)
{
RecordFailure(ex.Message);
RecordOutcome(success: false, ex.Message);
throw;
}
}
private void RecordSuccess()
/// <summary>
/// Convenience wrapper around <see cref="InvokeAsync"/> that throws
/// <see cref="InvalidOperationException"/> on a sidecar-reported failure. Used by the
/// <see cref="IHistorianDataSource"/> read methods.
/// </summary>
private async Task<TReply> InvokeAndClassifyAsync<TRequest, TReply>(
MessageKind requestKind, MessageKind expectedReplyKind, TRequest request,
Func<TReply, (bool ok, string? error)> evaluate, string op, CancellationToken ct)
where TReply : class
{
lock (_healthLock)
var reply = await InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, evaluate, ct).ConfigureAwait(false);
var (ok, error) = evaluate(reply);
if (!ok)
{
_totalSuccesses++;
_consecutiveFailures = 0;
_lastSuccessUtc = DateTime.UtcNow;
}
}
private void RecordFailure(string message)
{
lock (_healthLock)
{
_totalFailures++;
_consecutiveFailures++;
_lastFailureUtc = DateTime.UtcNow;
_lastError = message;
}
}
private void ThrowIfFailed(bool success, string? error, string op)
{
if (!success)
{
// Sidecar-reported failure counts as an operation failure even though the
// transport delivered a reply. The Invoke wrapper already recorded transport
// success — undo that and record the failure so the health snapshot reflects
// operation-level success rates rather than just connectivity.
ReclassifySuccessAsFailure(error);
throw new InvalidOperationException(
$"Sidecar {op} failed: {error ?? "<no message>"}.");
}
return reply;
}
private void ReclassifySuccessAsFailure(string? message)
/// <summary>
/// Records the outcome of a single call — increments <c>_totalQueries</c> and exactly
/// one of <c>_totalSuccesses</c> / <c>_totalFailures</c> under a single
/// <c>_healthLock</c> acquisition. (Findings 003 + 004.)
/// </summary>
private void RecordOutcome(bool success, string? error)
{
lock (_healthLock)
{
// Transport-level RecordSuccess happened a moment ago; reverse it.
_totalSuccesses--;
_totalFailures++;
_consecutiveFailures++;
_lastFailureUtc = DateTime.UtcNow;
_lastError = message;
_totalQueries++;
if (success)
{
_totalSuccesses++;
_consecutiveFailures = 0;
_lastSuccessUtc = DateTime.UtcNow;
}
else
{
_totalFailures++;
_consecutiveFailures++;
_lastFailureUtc = DateTime.UtcNow;
_lastError = error;
}
}
}
@@ -452,9 +489,12 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
/// <summary>
/// Synchronous dispose required by <see cref="IDisposable"/> on
/// <see cref="IHistorianDataSource"/>. The underlying channel's async cleanup is
/// non-blocking (just resets transport state + disposes streams), so the
/// GetAwaiter()/GetResult() bridge is safe.
/// <see cref="IHistorianDataSource"/>. The underlying channel's async cleanup runs
/// <see cref="System.IO.Pipes.NamedPipeClientStream"/> teardown, which can block briefly
/// on OS handle release, so strictly speaking it is not non-blocking. The
/// <c>GetAwaiter()/GetResult()</c> bridge is still deadlock-safe because the cleanup neither
/// resumes on a captured <see cref="System.Threading.SynchronizationContext"/> nor takes any
/// lock that the caller could hold. (Finding 010.)
/// </summary>
public void Dispose() => _channel.DisposeAsync().AsTask().GetAwaiter().GetResult();
}
@@ -3,24 +3,28 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client;
/// <summary>
/// Connection options for <see cref="WonderwareHistorianClient"/>.
/// </summary>
/// <remarks>
/// <para>
/// <b>Retry / backoff ownership (finding 006):</b> this module performs exactly one
/// in-place transport reconnect inside <c>PipeChannel.InvokeAsync</c> with no delay,
/// and does NOT implement exponential reconnect backoff. Broader retry/backoff is the
/// caller's responsibility — the alarm drain worker
/// (<c>Core.AlarmHistorian.SqliteStoreAndForwardSink</c>) and the read-side
/// history router are expected to layer their own backoff on top.
/// </para>
/// </remarks>
/// <param name="PipeName">Named-pipe name the sidecar listens on (matches the sidecar's <c>OTOPCUA_HISTORIAN_PIPE</c>).</param>
/// <param name="SharedSecret">Per-process shared secret the sidecar will verify in the Hello frame.</param>
/// <param name="PeerName">Diagnostic peer identifier sent in Hello — typically the OtOpcUa instance id.</param>
/// <param name="ConnectTimeout">Cap on the named-pipe connect + Hello round trip on each (re)connect.</param>
/// <param name="CallTimeout">Cap on a single read/write call once connected.</param>
/// <param name="ReconnectInitialBackoff">Backoff between the first failed reconnect attempts.</param>
/// <param name="ReconnectMaxBackoff">Upper bound on the exponential backoff between reconnects.</param>
public sealed record WonderwareHistorianClientOptions(
string PipeName,
string SharedSecret,
string PeerName = "OtOpcUa",
TimeSpan? ConnectTimeout = null,
TimeSpan? CallTimeout = null,
TimeSpan? ReconnectInitialBackoff = null,
TimeSpan? ReconnectMaxBackoff = null)
TimeSpan? CallTimeout = null)
{
public TimeSpan EffectiveConnectTimeout => ConnectTimeout ?? TimeSpan.FromSeconds(10);
public TimeSpan EffectiveCallTimeout => CallTimeout ?? TimeSpan.FromSeconds(30);
public TimeSpan EffectiveReconnectInitialBackoff => ReconnectInitialBackoff ?? TimeSpan.FromMilliseconds(500);
public TimeSpan EffectiveReconnectMaxBackoff => ReconnectMaxBackoff ?? TimeSpan.FromSeconds(30);
}
@@ -1511,7 +1511,7 @@ public sealed class ModbusDriver
private const uint StatusBadNotWritable = 0x803B0000u;
private const uint StatusBadOutOfRange = 0x803C0000u;
private const uint StatusBadNotSupported = 0x803D0000u;
private const uint StatusBadDeviceFailure = 0x80550000u;
private const uint StatusBadDeviceFailure = 0x808B0000u;
private const uint StatusBadCommunicationError = 0x80050000u;
/// <summary>
@@ -69,7 +69,7 @@ public sealed class S7Driver(S7DriverOptions options, string driverInstanceId, I
/// <summary>OPC UA StatusCode used for socket / timeout / protocol-layer faults.</summary>
private const uint StatusBadCommunicationError = 0x80050000u;
/// <summary>OPC UA StatusCode used for a genuine device fault (CPU error, hardware fault).</summary>
private const uint StatusBadDeviceFailure = 0x80550000u;
private const uint StatusBadDeviceFailure = 0x808B0000u;
private readonly Dictionary<string, S7TagDefinition> _tagsByName = new(StringComparer.OrdinalIgnoreCase);
private readonly Dictionary<string, S7ParsedAddress> _parsedByName = new(StringComparer.OrdinalIgnoreCase);
@@ -16,7 +16,7 @@ public static class TwinCATStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80740000u;
@@ -101,29 +101,44 @@ else
{
<Generations ClusterId="@ClusterId"/>
}
else if (_tab == "equipment" && _currentDraft is not null)
else if (_tab is "equipment" or "uns" or "namespaces" or "drivers" or "tags" or "acls")
{
<EquipmentTab GenerationId="@_currentDraft.GenerationId"/>
}
else if (_tab == "uns" && _currentDraft is not null)
{
<UnsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "namespaces" && _currentDraft is not null)
{
<NamespacesTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "drivers" && _currentDraft is not null)
{
<DriversTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "tags" && _currentDraft is not null)
{
<TagsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "acls" && _currentDraft is not null)
{
<AclsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
@* Bug #10 fix: these six tabs are scoped to a generation. Per docs/v2/admin-ui.md the
design intent is a read-only view of the published generation when no draft is open
(with an "Edit in draft" affordance) and the editable view of the draft when one is open.
The earlier implementation rendered nothing in the no-draft case, leaving operators
with only the "Open a draft to edit" placeholder. Both states now route through the
same tab components, with edits gated by <fieldset disabled>. A disabled fieldset
disables every descendant form control (buttons, inputs, selects), so clicking one in the
read-only state cannot silently mutate the published rows, even though the tab components
have not yet been refactored to honor an IsReadOnly flag. Click handlers attached to
non-form elements (a div or a link) are not covered by this gate. *@
var genId = _currentDraft?.GenerationId ?? _currentPublished?.GenerationId;
var isReadOnly = _currentDraft is null;
if (genId is null)
{
<section class="panel notice rise" style="animation-delay:.02s">
No published generation yet. Click <strong>New draft</strong> above to author this cluster's first generation.
</section>
}
else
{
if (isReadOnly)
{
<section class="panel notice rise mb-3" style="animation-delay:.02s">
<strong>Read-only view</strong> of published generation @genId. Click <strong>New draft</strong> above to make changes.
</section>
}
<fieldset disabled="@isReadOnly" style="border:0;padding:0;margin:0;min-width:0;">
@switch (_tab)
{
case "equipment": <EquipmentTab GenerationId="@genId.Value"/> break;
case "uns": <UnsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "namespaces": <NamespacesTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "drivers": <DriversTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "tags": <TagsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "acls": <AclsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
}
</fieldset>
}
}
else if (_tab == "redundancy")
{
@@ -133,10 +148,6 @@ else
{
<AuditTab ClusterId="@ClusterId"/>
}
else
{
<section class="panel notice rise" style="animation-delay:.02s">Open a draft to edit this cluster's content.</section>
}
}
@code {
@@ -16,12 +16,40 @@ public sealed class ClusterNodeService(OtOpcUaConfigDbContext db)
/// tolerance covers a missed heartbeat plus publisher GC pauses.</summary>
public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);
public Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct) =>
db.ClusterNodes.AsNoTracking()
public async Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct)
{
var nodes = await db.ClusterNodes.AsNoTracking()
.Where(n => n.ClusterId == clusterId)
.OrderByDescending(n => n.ServiceLevelBase)
.ThenBy(n => n.NodeId)
.ToListAsync(ct);
.ToListAsync(ct).ConfigureAwait(false);
// Bug #12 fix follow-up: the live-node heartbeat lands on
// ClusterNodeGenerationState.LastSeenAt (written by sp_RegisterNodeGenerationApplied
// on every generation poll). The ClusterNode.LastSeenAt column is a legacy slot that
// no current writer maintains, so reading it directly would leave every running node
// showing "never" seen and flagged STALE forever. Overlay the GenerationState heartbeat
// onto the returned ClusterNode rows when it is more recent, so the Redundancy tab and the
// IsStale predicate reflect actual liveness without a new write path or schema change.
var nodeIds = nodes.Select(n => n.NodeId).ToList();
if (nodeIds.Count > 0)
{
var heartbeats = await db.ClusterNodeGenerationStates.AsNoTracking()
.Where(s => nodeIds.Contains(s.NodeId))
.Select(s => new { s.NodeId, s.LastSeenAt })
.ToListAsync(ct).ConfigureAwait(false);
var beatByNode = heartbeats.ToDictionary(s => s.NodeId, s => s.LastSeenAt);
foreach (var n in nodes)
{
if (beatByNode.TryGetValue(n.NodeId, out var hb) && hb is not null
&& (n.LastSeenAt is null || hb > n.LastSeenAt))
{
n.LastSeenAt = hb;
}
}
}
return nodes;
}
public static bool IsStale(ClusterNode node) =>
node.LastSeenAt is null || DateTime.UtcNow - node.LastSeenAt.Value > StaleThreshold;
@@ -1,6 +1,7 @@
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Server.Redundancy;
namespace ZB.MOM.WW.OtOpcUa.Server.Hosting;
@@ -42,10 +43,20 @@ public sealed class GenerationRefreshHostedService(
RedundancyCoordinator coordinator,
ILogger<GenerationRefreshHostedService> logger,
TimeSpan? tickInterval = null,
Func<CancellationToken, Task<long?>>? currentGenerationQuery = null) : BackgroundService
Func<CancellationToken, Task<long?>>? currentGenerationQuery = null,
Func<long, NodeApplyStatus, string?, CancellationToken, Task>? registerAppliedAsync = null) : BackgroundService
{
private readonly Func<CancellationToken, Task<long?>> _generationQuery = currentGenerationQuery
?? new Func<CancellationToken, Task<long?>>(ct => DefaultQueryCurrentGenerationAsync(options, logger, ct));
// Bug #12 fix — the server now reports applied-generation state + heartbeat back to the
// central DB via sp_RegisterNodeGenerationApplied. Before this wiring the proc had zero
// callers, so dbo.ClusterNodeGenerationState stayed empty for every node and the Admin UI
// Fleet status page + cluster-detail Redundancy LastSeenAt both showed "no node state /
// never STALE" indefinitely. Tests inject a stub via the registerAppliedAsync parameter.
private readonly Func<long, NodeApplyStatus, string?, CancellationToken, Task> _registerApplied = registerAppliedAsync
?? new Func<long, NodeApplyStatus, string?, CancellationToken, Task>(
(gen, status, err, ct) => DefaultRegisterAppliedAsync(options, logger, gen, status, err, ct));
/// <summary>
/// How often the service polls <c>sp_GetCurrentGenerationForCluster</c>. Default 5 s —
/// low enough that operator publishes take effect promptly, high enough that the
@@ -97,6 +108,18 @@ public sealed class GenerationRefreshHostedService(
if (LastAppliedGenerationId is long last && current == last)
{
// Heartbeat — re-stamps LastSeenAt on dbo.ClusterNodeGenerationState so the Admin
// Fleet status page + cluster Redundancy tab can detect the node is alive without
// a generation change. Best-effort: a transient DB error here must not throw out of
// the tick (the next tick will retry) and must not block applies.
try
{
await _registerApplied(current.Value, NodeApplyStatus.Applied, null, cancellationToken).ConfigureAwait(false);
}
catch (Exception hbEx) when (hbEx is not OperationCanceledException)
{
logger.LogDebug(hbEx, "Heartbeat to sp_RegisterNodeGenerationApplied failed; will retry next tick");
}
return; // no change
}
@@ -109,14 +132,44 @@ public sealed class GenerationRefreshHostedService(
// lease is open. Publisher ticks in parallel (1s cadence) will observe the band
// transition and push it onto the OPC UA Server.ServiceLevel node.
var publishRequestId = Guid.NewGuid();
await using (leases.BeginApplyLease(current.Value, publishRequestId))
NodeApplyStatus applyStatus;
string? applyError = null;
try
{
await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
// Future: fire a domain event that driver hosts / virtual-tag engine /
// scripted-alarm engine subscribe to. For now the topology refresh is the
// only thing we rewire — everything else still requires a process restart.
await using (leases.BeginApplyLease(current.Value, publishRequestId))
{
await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
// Future: fire a domain event that driver hosts / virtual-tag engine /
// scripted-alarm engine subscribe to. For now the topology refresh is the
// only thing we rewire — everything else still requires a process restart.
}
applyStatus = NodeApplyStatus.Applied;
}
catch (Exception applyEx) when (applyEx is not OperationCanceledException)
{
applyStatus = NodeApplyStatus.Failed;
applyError = applyEx.Message;
logger.LogError(applyEx, "Apply of generation {Generation} failed; will report Failed status to central DB", current);
// fall through to register so operators see the failed apply in /fleet
}
// Always tell the central DB what happened with this apply attempt, success or
// failure. The proc upserts dbo.ClusterNodeGenerationState (CurrentGenerationId +
// LastAppliedAt + LastAppliedStatus + LastAppliedError + LastSeenAt). A failure here
// must not prevent us from advancing LastAppliedGenerationId: the apply has already
// succeeded (or already failed), and this report exists purely for observability.
try
{
await _registerApplied(current.Value, applyStatus, applyError, cancellationToken).ConfigureAwait(false);
}
catch (Exception regEx) when (regEx is not OperationCanceledException)
{
logger.LogWarning(regEx, "sp_RegisterNodeGenerationApplied call failed for gen {Generation} status {Status}", current, applyStatus);
}
// Advance the cursor even on Failed — the proc has been told; next tick will heartbeat
// and a future generation will trigger a fresh apply attempt. Pinning the cursor on
// failure would loop us through the same broken apply every 5s.
LastAppliedGenerationId = current;
RefreshCount++;
}
@@ -157,4 +210,35 @@ public sealed class GenerationRefreshHostedService(
return null;
}
}
/// <summary>
/// Default register-applied implementation — calls <c>sp_RegisterNodeGenerationApplied</c>
/// to MERGE-upsert <see cref="ZB.MOM.WW.OtOpcUa.Configuration.Entities.ClusterNodeGenerationState"/>
/// for this node. Called both at apply completion (success or failure) and on every
/// no-change heartbeat tick so <c>LastSeenAt</c> stays fresh in the central DB and the
/// Admin UI Fleet status page + Redundancy LastSeenAt indicator can detect a healthy node.
/// Bug #12 fix — wires the previously-orphaned proc into the apply loop.
/// </summary>
private static async Task DefaultRegisterAppliedAsync(
NodeOptions options,
ILogger logger,
long generationId,
NodeApplyStatus status,
string? error,
CancellationToken cancellationToken)
{
await using var conn = new SqlConnection(options.ConfigDbConnectionString);
await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
await using var cmd = conn.CreateCommand();
cmd.CommandText = "EXEC dbo.sp_RegisterNodeGenerationApplied @NodeId=@n, @GenerationId=@g, @Status=@s, @Error=@e";
cmd.Parameters.AddWithValue("@n", options.NodeId);
cmd.Parameters.AddWithValue("@g", generationId);
cmd.Parameters.AddWithValue("@s", status.ToString());
cmd.Parameters.AddWithValue("@e", (object?)error ?? DBNull.Value);
await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
// Single-line trace so soak runs can see heartbeat ticks without flooding at Info.
logger.LogTrace("Reported gen {Generation} status {Status} to central DB", generationId, status);
}
}
@@ -0,0 +1,165 @@
using CliFx.Exceptions;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Tests.Fakes;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Tests;
/// <summary>
/// Regression tests for Client.CLI-003: numeric command options must validate their ranges
/// and report a clean operator-facing error instead of silently passing bad values to the SDK.
/// </summary>
public class CommandRangeValidationTests
{
[Fact]
public async Task BrowseCommand_NegativeDepth_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new BrowseCommand(factory)
{
Url = "opc.tcp://localhost:4840",
Depth = -1
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--depth");
}
[Fact]
public async Task BrowseCommand_ZeroDepth_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new BrowseCommand(factory)
{
Url = "opc.tcp://localhost:4840",
Depth = 0
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--depth");
}
[Fact]
public async Task SubscribeCommand_ZeroInterval_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
Interval = 0
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--interval");
}
[Fact]
public async Task SubscribeCommand_NegativeInterval_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
Interval = -50
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--interval");
}
[Fact]
public async Task SubscribeCommand_RecursiveZeroMaxDepth_ThrowsCommandException()
{
// --max-depth only matters when --recursive is set; validation is scoped to that combination.
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
Recursive = true,
MaxDepth = 0
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--max-depth");
}
[Fact]
public async Task SubscribeCommand_NegativeDuration_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
DurationSeconds = -5
};
using var console = TestConsoleHelper.CreateConsole();
await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
}
[Fact]
public async Task AlarmsCommand_ZeroInterval_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new AlarmsCommand(factory)
{
Url = "opc.tcp://localhost:4840",
Interval = 0
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--interval");
}
[Fact]
public async Task HistoryReadCommand_NegativeMax_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new HistoryReadCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
MaxValues = -10
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--max");
}
[Fact]
public async Task HistoryReadCommand_ZeroInterval_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new HistoryReadCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
Aggregate = "Average",
IntervalMs = 0
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--interval");
}
}
@@ -0,0 +1,59 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Tests.Fakes;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Tests;
/// <summary>
/// Regression tests for Client.CLI-009: long-running commands must detach their event handlers
/// from the service before the command finishes, so notifications fired during teardown don't
/// land in a disposed console.
/// </summary>
public class EventHandlerLifecycleTests
{
[Fact]
public async Task SubscribeCommand_AfterExit_DataChangedEventHasNoSubscribers()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=Node",
DurationSeconds = 1
};
using var console = TestConsoleHelper.CreateConsole();
await command.ExecuteAsync(console);
// The fake's event should have no subscribers after the command completes — every
// handler attached by the command must have been detached during teardown.
fakeService.HasDataChangedSubscribers.ShouldBeFalse(
"SubscribeCommand must detach its DataChanged handler before returning.");
}
[Fact]
public async Task AlarmsCommand_AfterExit_AlarmEventHasNoSubscribers()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new AlarmsCommand(factory)
{
Url = "opc.tcp://localhost:4840"
};
using var console = TestConsoleHelper.CreateConsole();
var task = Task.Run(async () => await command.ExecuteAsync(console));
await Task.Delay(150);
console.RequestCancellation();
await task;
fakeService.HasAlarmEventSubscribers.ShouldBeFalse(
"AlarmsCommand must detach its AlarmEvent handler before returning.");
}
}
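The teardown contract these lifecycle tests pin down can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual SubscribeCommand or AlarmsCommand code. `NotifyingService` and `SubscriptionRunner.RunSubscriptionAsync` are hypothetical stand-ins for the service and the command body. The handler is attached before waiting and detached in a `finally` block, so it is removed whether the wait ends by duration or by cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a service that raises change notifications.
public sealed class NotifyingService
{
    public event EventHandler<string>? DataChanged;

    // Mirrors the fake's HasDataChangedSubscribers probe.
    public bool HasDataChangedSubscribers => DataChanged != null;

    public void Raise(string value) => DataChanged?.Invoke(this, value);
}

public static class SubscriptionRunner
{
    // Hypothetical command body: attach, wait for the duration (or cancellation), then detach.
    public static async Task<int> RunSubscriptionAsync(
        NotifyingService service, TimeSpan duration, CancellationToken ct)
    {
        var received = 0;
        EventHandler<string> handler = (_, _) => Interlocked.Increment(ref received);
        service.DataChanged += handler;
        try
        {
            await Task.Delay(duration, ct);
        }
        catch (OperationCanceledException)
        {
            // Cancellation (e.g. Ctrl+C) is a normal way to end a long-running command.
        }
        finally
        {
            // Detach before returning so late notifications never reach a disposed console.
            service.DataChanged -= handler;
        }
        return received;
    }
}
```

Placing the `-=` in `finally` rather than after the `await` is what makes the cancellation path (the AlarmsCommand test) behave the same as the duration path (the SubscribeCommand test).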
@@ -55,6 +55,14 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
new("ns=2;s=Node2", "Node2", "Variable", false)
};
/// <summary>
/// Optional per-parent-node browse results. When a key matches the requested parent's
/// <see cref="NodeId.ToString" />, this dictionary takes precedence over <see cref="BrowseResults" />.
/// Tests exercising recursive walks (Client.CLI-010) use it to model a real subtree whose
/// child node ids do not collide on descent.
/// </summary>
public Dictionary<string, IReadOnlyList<BrowseResult>> BrowseResultsByParent { get; } = new();
public IReadOnlyList<DataValue> HistoryReadResult { get; set; } = new List<DataValue>
{
new(new Variant(10.0), StatusCodes.Good, DateTime.UtcNow.AddHours(-1), DateTime.UtcNow),
@@ -68,6 +76,7 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
public Exception? ReadException { get; set; }
public Exception? WriteException { get; set; }
public Exception? ConditionRefreshException { get; set; }
public Exception? SubscribeException { get; set; }
/// <inheritdoc />
public bool IsConnected => ConnectCalled && !DisconnectCalled;
@@ -84,6 +93,12 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
/// <inheritdoc />
public event EventHandler<ConnectionStateChangedEventArgs>? ConnectionStateChanged;
/// <summary>True when at least one handler is attached to <see cref="DataChanged" />.</summary>
public bool HasDataChangedSubscribers => DataChanged != null;
/// <summary>True when at least one handler is attached to <see cref="AlarmEvent" />.</summary>
public bool HasAlarmEventSubscribers => AlarmEvent != null;
/// <inheritdoc />
public Task<ConnectionInfo> ConnectAsync(ConnectionSettings settings, CancellationToken ct = default)
{
@@ -120,6 +135,9 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
public Task<IReadOnlyList<BrowseResult>> BrowseAsync(NodeId? parentNodeId = null, CancellationToken ct = default)
{
BrowseNodeIds.Add(parentNodeId);
var key = parentNodeId?.ToString();
if (key != null && BrowseResultsByParent.TryGetValue(key, out var keyed))
return Task.FromResult(keyed);
return Task.FromResult(BrowseResults);
}
@@ -127,6 +145,7 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
public Task SubscribeAsync(NodeId nodeId, int intervalMs = 1000, CancellationToken ct = default)
{
SubscribeCalls.Add((nodeId, intervalMs));
if (SubscribeException != null) throw SubscribeException;
return Task.CompletedTask;
}
@@ -105,7 +105,7 @@ public class HistoryReadCommandTests
}
[Fact]
public async Task Execute_InvalidAggregate_ThrowsArgumentException()
public async Task Execute_InvalidAggregate_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
@@ -117,7 +117,10 @@ public class HistoryReadCommandTests
};
using var console = TestConsoleHelper.CreateConsole();
await Should.ThrowAsync<ArgumentException>(async () => await command.ExecuteAsync(console));
// Operator-input errors now surface as CliFx CommandException (was ArgumentException)
// per Client.CLI-006 so malformed input prints a clean CLI error instead of a stack trace.
await Should.ThrowAsync<CliFx.Exceptions.CommandException>(
async () => await command.ExecuteAsync(console));
}
[Fact]
@@ -0,0 +1,98 @@
using CliFx.Exceptions;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Tests.Fakes;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Tests;
/// <summary>
/// Regression tests for Client.CLI-006: predictable operator-input errors must surface as
/// CliFx CommandException with a clean message, not raw FormatException/ArgumentException
/// with a stack trace.
/// </summary>
public class InputValidationErrorsTests
{
[Fact]
public async Task HistoryReadCommand_InvalidStartTime_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new HistoryReadCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
StartTime = "not-a-date"
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--start");
}
[Fact]
public async Task HistoryReadCommand_InvalidEndTime_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new HistoryReadCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
EndTime = "garbage"
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("--end");
}
[Fact]
public async Task HistoryReadCommand_InvalidAggregate_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new HistoryReadCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=N",
Aggregate = "NotARealAggregate"
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("aggregate", Case.Insensitive);
}
[Fact]
public async Task ReadCommand_InvalidNodeId_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new ReadCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "not-a-node-id"
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("node", Case.Insensitive);
}
[Fact]
public async Task SubscribeCommand_InvalidNodeId_ThrowsCommandException()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "not-a-node-id"
};
using var console = TestConsoleHelper.CreateConsole();
var ex = await Should.ThrowAsync<CommandException>(async () => await command.ExecuteAsync(console));
ex.Message.ShouldContain("node", Case.Insensitive);
}
}
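The Client.CLI-006 rule can be sketched as a parse helper that turns a malformed value into a CLI error naming the offending option. This is a sketch under stated assumptions:

- `CliInputException` is a hypothetical stand-in for CliFx's `CommandException`, so the example compiles without CliFx.
- `OptionParsing.ParseTimeOption` is illustrative and is not the command's real parsing code.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for CliFx.Exceptions.CommandException.
public sealed class CliInputException : Exception
{
    public CliInputException(string message) : base(message) { }
}

public static class OptionParsing
{
    // Parses an optional time option. A malformed value becomes a clean CLI error that
    // names the option, instead of an unhandled FormatException with a stack trace.
    public static DateTime? ParseTimeOption(string? raw, string optionName)
    {
        if (string.IsNullOrWhiteSpace(raw))
            return null;

        if (DateTime.TryParse(raw, CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out var parsed))
            return parsed;

        throw new CliInputException(
            $"Invalid value '{raw}' for {optionName}: expected a date/time such as 2026-01-01T12:00:00Z.");
    }
}
```

Using `TryParse` instead of catching `FormatException` keeps the failure path explicit. Putting the option name in the message is what assertions like `ShouldContain("--start")` rely on.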
@@ -0,0 +1,55 @@
using Serilog;
using Serilog.Core;
using Serilog.Events;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Tests.Fakes;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Tests;
/// <summary>
/// Regression test for Client.CLI-007: ConfigureLogging must dispose the previously assigned
/// Log.Logger before replacing it, so repeated CLI invocations do not leak sinks.
/// </summary>
public class LoggerLifecycleTests
{
[Fact]
public async Task ConfigureLogging_DisposesPreviousLogger_BeforeReassigning()
{
// Install a tracker logger before the command runs.
var trackerSink = new DisposeTrackingSink();
var trackerLogger = new LoggerConfiguration()
.WriteTo.Sink(trackerSink)
.CreateLogger();
Log.Logger = trackerLogger;
try
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new ConnectCommand(factory)
{
Url = "opc.tcp://localhost:4840"
};
using var console = TestConsoleHelper.CreateConsole();
await command.ExecuteAsync(console);
// The command's ConfigureLogging should have disposed the tracker logger we installed.
trackerSink.Disposed.ShouldBeTrue(
"Previous Log.Logger should be disposed via Log.CloseAndFlush() before ConfigureLogging assigns a new one.");
}
finally
{
Log.CloseAndFlush();
}
}
private sealed class DisposeTrackingSink : ILogEventSink, IDisposable
{
public bool Disposed { get; private set; }
public void Emit(LogEvent logEvent) { }
public void Dispose() => Disposed = true;
}
}
@@ -0,0 +1,258 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Tests.Fakes;
using BrowseResult = ZB.MOM.WW.OtOpcUa.Client.Shared.Models.BrowseResult;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Tests;
/// <summary>
/// Regression tests for SubscribeCommand summary bucketing, --duration, --quiet, --summary-file,
/// and recursive collection. Covers Client.CLI-002, -009, and -010.
/// </summary>
public class SubscribeCommandSummaryTests
{
[Fact]
public async Task Summary_NodeWithNoUpdate_IsCountedAsNeverNotAsNeverWentBad()
{
// Client.CLI-002: A node that received no updates at all is "never received an update",
// NOT a "suspect that never went bad".
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=SilentNode",
DurationSeconds = 1
};
using var console = TestConsoleHelper.CreateConsole();
await command.ExecuteAsync(console);
var output = TestConsoleHelper.GetOutput(console);
output.ShouldContain("No update received at all: 1");
output.ShouldContain("NEVER went bad (suspect): 0");
// The "suspect" detail header should not appear when the suspect list is empty.
output.ShouldNotContain("--- Nodes that NEVER received a bad-quality update (suspect) ---");
// The "never received update" detail header should appear.
output.ShouldContain("--- Nodes that never received an update at all ---");
}
[Fact]
public async Task Summary_NodeReceivedOnlyGoodValues_IsCountedAsNeverWentBad()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=GoodNode",
// Use --duration so the command auto-exits.
DurationSeconds = 2
};
using var console = TestConsoleHelper.CreateConsole();
var runTask = Task.Run(async () => await command.ExecuteAsync(console));
// Wait until the subscription is registered by the fake.
await WaitForAsync(() => fakeService.SubscribeCalls.Count == 1);
fakeService.RaiseDataChanged(
"ns=2;s=GoodNode",
new DataValue(new Variant(42), StatusCodes.Good, DateTime.UtcNow, DateTime.UtcNow));
await runTask;
var output = TestConsoleHelper.GetOutput(console);
output.ShouldContain("NEVER went bad (suspect): 1");
output.ShouldContain("Last status GOOD: 1");
output.ShouldContain("No update received at all: 0");
}
[Fact]
public async Task Summary_NodeReceivedBadValue_IsCountedAsEverWentBad()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=BadNode",
DurationSeconds = 2
};
using var console = TestConsoleHelper.CreateConsole();
var runTask = Task.Run(async () => await command.ExecuteAsync(console));
await WaitForAsync(() => fakeService.SubscribeCalls.Count == 1);
fakeService.RaiseDataChanged(
"ns=2;s=BadNode",
new DataValue(Variant.Null, StatusCodes.BadDeviceFailure, DateTime.UtcNow, DateTime.UtcNow));
await runTask;
var output = TestConsoleHelper.GetOutput(console);
output.ShouldContain("Ever went BAD during window: 1");
output.ShouldContain("NEVER went bad (suspect): 0");
}
[Fact]
public async Task Duration_Positive_AutoExits()
{
// --duration > 0 should make the command exit on its own without needing cancellation.
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=AutoExit",
DurationSeconds = 1
};
using var console = TestConsoleHelper.CreateConsole();
var sw = System.Diagnostics.Stopwatch.StartNew();
await command.ExecuteAsync(console);
sw.Stop();
sw.Elapsed.ShouldBeGreaterThanOrEqualTo(TimeSpan.FromMilliseconds(900));
sw.Elapsed.ShouldBeLessThan(TimeSpan.FromSeconds(10));
var output = TestConsoleHelper.GetOutput(console);
output.ShouldContain("==================== SUMMARY ====================");
}
[Fact]
public async Task Quiet_SuppressesPerUpdateOutputButPrintsSummary()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=Quiet",
Quiet = true,
DurationSeconds = 2
};
using var console = TestConsoleHelper.CreateConsole();
var runTask = Task.Run(async () => await command.ExecuteAsync(console));
await WaitForAsync(() => fakeService.SubscribeCalls.Count == 1);
fakeService.RaiseDataChanged(
"ns=2;s=Quiet",
new DataValue(new Variant(1.0), StatusCodes.Good, DateTime.UtcNow, DateTime.UtcNow));
await runTask;
var output = TestConsoleHelper.GetOutput(console);
// No per-update "value = ..." line should appear because --quiet was set.
output.ShouldNotContain(" = 1 (");
// Summary section is still printed.
output.ShouldContain("==================== SUMMARY ====================");
}
[Fact]
public async Task SummaryFile_WritesSummaryToDisk()
{
var fakeService = new FakeOpcUaClientService();
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var tempFile = Path.Combine(Path.GetTempPath(), $"otopcua-cli-summary-{Guid.NewGuid():N}.txt");
try
{
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=SummaryFileNode",
DurationSeconds = 1,
SummaryFile = tempFile
};
using var console = TestConsoleHelper.CreateConsole();
await command.ExecuteAsync(console);
File.Exists(tempFile).ShouldBeTrue();
var contents = await File.ReadAllTextAsync(tempFile);
contents.ShouldContain("SUMMARY");
contents.ShouldContain("Total subscribed: 1");
}
finally
{
if (File.Exists(tempFile)) File.Delete(tempFile);
}
}
[Fact]
public async Task Recursive_BrowsesSubtreeAndSubscribesEveryVariable()
{
// Root has a Folder child and a top-level Variable; the Folder contains a second Variable.
// Every node id is distinct, so the command's ToDictionary(t => t.nodeId.ToString()) call
// does not throw on a duplicate key.
var fakeService = new FakeOpcUaClientService();
fakeService.BrowseResultsByParent["ns=2;s=Root"] = new List<BrowseResult>
{
new("ns=2;s=Folder", "Folder", "Object", true),
new("ns=2;s=Tag1", "Tag1", "Variable", false)
};
fakeService.BrowseResultsByParent["ns=2;s=Folder"] = new List<BrowseResult>
{
new("ns=2;s=Tag2", "Tag2", "Variable", false)
};
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=Root",
Recursive = true,
MaxDepth = 3,
DurationSeconds = 1
};
using var console = TestConsoleHelper.CreateConsole();
await command.ExecuteAsync(console);
// Both Variables (depth 1 + depth 2) should have been subscribed.
fakeService.SubscribeCalls.Count.ShouldBeGreaterThanOrEqualTo(2);
fakeService.SubscribeCalls.ShouldContain(c => c.NodeId.Identifier.ToString() == "Tag1");
fakeService.SubscribeCalls.ShouldContain(c => c.NodeId.Identifier.ToString() == "Tag2");
var output = TestConsoleHelper.GetOutput(console);
output.ShouldContain("Browsing subtree of ns=2;s=Root");
}
[Fact]
public async Task SubscribeFailure_PrintsFailedMessage_DoesNotCrash()
{
var fakeService = new FakeOpcUaClientService
{
SubscribeException = new InvalidOperationException("forced failure")
};
var factory = new FakeOpcUaClientServiceFactory(fakeService);
var command = new SubscribeCommand(factory)
{
Url = "opc.tcp://localhost:4840",
NodeId = "ns=2;s=FailNode",
DurationSeconds = 1
};
using var console = TestConsoleHelper.CreateConsole();
await command.ExecuteAsync(console);
var output = TestConsoleHelper.GetOutput(console);
output.ShouldContain("FAILED to subscribe");
output.ShouldContain("forced failure");
// The summary block is still printed.
output.ShouldContain("==================== SUMMARY ====================");
}
private static async Task WaitForAsync(Func<bool> predicate, int timeoutMs = 5000)
{
var deadline = DateTime.UtcNow.AddMilliseconds(timeoutMs);
while (!predicate() && DateTime.UtcNow < deadline)
await Task.Delay(25);
if (!predicate())
throw new TimeoutException("Condition not met within timeout.");
}
}
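The summary buckets these tests assert on can be sketched as a small classification pass over per-node histories. This is a minimal sketch under stated assumptions:

- Each update is reduced to a Good/Bad flag. Real OPC UA status codes also have an Uncertain class, which this ignores.
- `NodeHistory`, `SummaryCounts` and `SummaryBuckets` are hypothetical names. They mirror the "No update received at all", "Ever went BAD during window", "NEVER went bad (suspect)" and "Last status GOOD" lines.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-node tracking: one entry per subscribed node.
public sealed class NodeHistory
{
    public List<bool> Updates { get; } = new(); // true = Good status, false = Bad status
}

public sealed record SummaryCounts(int NoUpdate, int EverWentBad, int NeverWentBadSuspect, int LastStatusGood);

public static class SummaryBuckets
{
    public static SummaryCounts Compute(IEnumerable<NodeHistory> nodes)
    {
        int noUpdate = 0, everBad = 0, suspect = 0, lastGood = 0;
        foreach (var node in nodes)
        {
            if (node.Updates.Count == 0)
            {
                // Silence is its own bucket: a node with no updates is not a "suspect that never went bad".
                noUpdate++;
                continue;
            }

            if (node.Updates.Contains(false)) everBad++;
            else suspect++;

            if (node.Updates[node.Updates.Count - 1]) lastGood++;
        }
        return new SummaryCounts(noUpdate, everBad, suspect, lastGood);
    }
}
```

Checking `Count == 0` before anything else is the Client.CLI-002 fix in miniature. Without it, an empty history would fall into the "never went bad" bucket, because it contains no Bad entries.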
@@ -0,0 +1,163 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Adapters;
namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Tests.Adapters;
/// <summary>
/// Regression tests for the pure best-endpoint selection logic extracted from
/// <see cref="DefaultEndpointDiscovery"/> (Client.Shared-011). The selector picks the best
/// endpoint matching the requested security mode (preferring Basic256Sha256) and rewrites
/// the discovered endpoint URL hostname to match the operator-supplied URL so internal DNS
/// hostnames in discovery responses do not leak into the session.
/// </summary>
public class EndpointSelectorTests
{
private static EndpointDescription Ep(MessageSecurityMode mode, string policy, string url)
{
return new EndpointDescription
{
EndpointUrl = url,
SecurityMode = mode,
SecurityPolicyUri = policy,
};
}
/// <summary>
/// Verifies that the selector returns the only endpoint matching the requested
/// security mode even when other endpoints with different modes are present.
/// </summary>
[Fact]
public void SelectBest_PicksMatchingSecurityMode()
{
var endpoints = new[]
{
Ep(MessageSecurityMode.None, SecurityPolicies.None, "opc.tcp://server:4840"),
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic256Sha256, "opc.tcp://server:4840"),
Ep(MessageSecurityMode.SignAndEncrypt, SecurityPolicies.Basic256Sha256, "opc.tcp://server:4840"),
};
var best = EndpointSelector.SelectBest(endpoints, "opc.tcp://server:4840", MessageSecurityMode.Sign);
best.SecurityMode.ShouldBe(MessageSecurityMode.Sign);
best.SecurityPolicyUri.ShouldBe(SecurityPolicies.Basic256Sha256);
}
/// <summary>
/// Verifies that when multiple endpoints match the requested mode, Basic256Sha256 wins
/// over weaker policies — even when Basic256Sha256 is not the first encountered.
/// </summary>
[Fact]
public void SelectBest_PrefersBasic256Sha256WhenMultipleMatch()
{
var endpoints = new[]
{
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic128Rsa15, "opc.tcp://server:4840"),
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic256Sha256, "opc.tcp://server:4840"),
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic256, "opc.tcp://server:4840"),
};
var best = EndpointSelector.SelectBest(endpoints, "opc.tcp://server:4840", MessageSecurityMode.Sign);
best.SecurityPolicyUri.ShouldBe(SecurityPolicies.Basic256Sha256);
}
/// <summary>
/// Verifies that the selector falls back to the first matching endpoint when no
/// Basic256Sha256 endpoint is advertised for the requested security mode.
/// </summary>
[Fact]
public void SelectBest_FallsBackToFirstMatchWhenNoBasic256Sha256()
{
var endpoints = new[]
{
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic128Rsa15, "opc.tcp://server:4840"),
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic256, "opc.tcp://server:4840"),
};
var best = EndpointSelector.SelectBest(endpoints, "opc.tcp://server:4840", MessageSecurityMode.Sign);
best.SecurityPolicyUri.ShouldBe(SecurityPolicies.Basic128Rsa15);
}
/// <summary>
/// Verifies that when no endpoint matches the requested security mode, the selector throws an
/// InvalidOperationException whose message lists the available mode/policy combinations to aid diagnosis.
/// </summary>
[Fact]
public void SelectBest_NoMatchingMode_ThrowsWithDiagnostic()
{
var endpoints = new[]
{
Ep(MessageSecurityMode.None, SecurityPolicies.None, "opc.tcp://server:4840"),
};
var ex = Should.Throw<InvalidOperationException>(() =>
EndpointSelector.SelectBest(endpoints, "opc.tcp://server:4840", MessageSecurityMode.SignAndEncrypt));
ex.Message.ShouldContain("SignAndEncrypt");
ex.Message.ShouldContain("None"); // available endpoint listed in the message
}
/// <summary>
/// Verifies that the selector rewrites the discovery-returned hostname to the
/// operator-supplied hostname so internal DNS names in the response do not leak
/// into the resulting session.
/// </summary>
[Fact]
public void SelectBest_RewritesHostToMatchRequestedUrl()
{
var endpoints = new[]
{
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic256Sha256,
"opc.tcp://internal-host:4840/UA/Server"),
};
var best = EndpointSelector.SelectBest(endpoints, "opc.tcp://external-host:4840",
MessageSecurityMode.Sign);
new Uri(best.EndpointUrl).Host.ShouldBe("external-host");
}
/// <summary>
/// Verifies that when the discovery host already matches the requested host the
/// endpoint URL is left untouched.
/// </summary>
[Fact]
public void SelectBest_HostsMatch_LeavesUrlUnchanged()
{
var endpoints = new[]
{
Ep(MessageSecurityMode.Sign, SecurityPolicies.Basic256Sha256,
"opc.tcp://server:4840/UA/Server"),
};
var best = EndpointSelector.SelectBest(endpoints, "opc.tcp://server:4840",
MessageSecurityMode.Sign);
best.EndpointUrl.ShouldBe("opc.tcp://server:4840/UA/Server");
}
/// <summary>
/// Verifies that a null endpoints argument throws ArgumentNullException rather than
/// producing a confusing downstream NullReferenceException.
/// </summary>
[Fact]
public void SelectBest_NullEndpoints_Throws()
{
Should.Throw<ArgumentNullException>(() =>
EndpointSelector.SelectBest(null!, "opc.tcp://server:4840", MessageSecurityMode.None));
}
/// <summary>
/// Verifies that an empty endpointUrl produces ArgumentException so the caller gets
/// a clear contract violation rather than a downstream UriFormatException.
/// </summary>
[Fact]
public void SelectBest_EmptyEndpointUrl_Throws()
{
Should.Throw<ArgumentException>(() =>
EndpointSelector.SelectBest(Array.Empty<EndpointDescription>(), "", MessageSecurityMode.None));
}
}
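The selection rules exercised by `EndpointSelectorTests` can be sketched without the OPC UA stack. This is a hedged illustration, not the real `EndpointSelector`:

- `Mode` and `Endpoint` are hypothetical minimal mirrors of `MessageSecurityMode` and `EndpointDescription`.
- Only the Basic256Sha256 policy URI is taken from the OPC UA specification.

The rules are:

1. Filter by security mode.
2. Prefer Basic256Sha256, otherwise take the first match.
3. Throw with the available combinations when nothing matches.
4. Rewrite the discovered host to the operator-supplied host.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal mirrors of MessageSecurityMode and EndpointDescription.
public enum Mode { None, Sign, SignAndEncrypt }

public sealed record Endpoint(string Url, Mode SecurityMode, string PolicyUri);

public static class EndpointSelectorSketch
{
    public const string Basic256Sha256 = "http://opcfoundation.org/UA/SecurityPolicy#Basic256Sha256";

    public static Endpoint SelectBest(IReadOnlyList<Endpoint> endpoints, string requestedUrl, Mode mode)
    {
        if (endpoints is null) throw new ArgumentNullException(nameof(endpoints));
        if (string.IsNullOrWhiteSpace(requestedUrl))
            throw new ArgumentException("Endpoint URL must not be empty.", nameof(requestedUrl));

        var matches = endpoints.Where(e => e.SecurityMode == mode).ToList();
        if (matches.Count == 0)
        {
            var available = string.Join(", ", endpoints.Select(e => $"{e.SecurityMode}/{e.PolicyUri}"));
            throw new InvalidOperationException($"No endpoint with security mode {mode}. Available: {available}");
        }

        // Prefer Basic256Sha256; otherwise fall back to the first match in server order.
        var best = matches.FirstOrDefault(e => e.PolicyUri == Basic256Sha256) ?? matches[0];
        return best with { Url = RewriteHost(best.Url, requestedUrl) };
    }

    // Replaces the discovered host with the operator-supplied host, keeping scheme, port and path.
    private static string RewriteHost(string discoveredUrl, string requestedUrl)
    {
        var discovered = new Uri(discoveredUrl);
        var requested = new Uri(requestedUrl);
        if (string.Equals(discovered.Host, requested.Host, StringComparison.OrdinalIgnoreCase))
            return discoveredUrl;

        var port = discovered.Port > 0 ? $":{discovered.Port}" : string.Empty;
        return $"{discovered.Scheme}://{requested.Host}{port}{discovered.PathAndQuery}";
    }
}
```

Returning the original string when the hosts already match avoids any `Uri` normalization of the URL, which is what the "leaves URL unchanged" case expects.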
@@ -168,10 +168,25 @@ internal sealed class FakeSessionAdapter : ISessionAdapter
}
}
/// <summary>
/// Number of times <see cref="CallMethodAsync"/> was invoked so tests can assert
/// acknowledgment workflows reached the session adapter.
/// </summary>
public int CallMethodCount { get; private set; }
/// <summary>
/// When set, <see cref="CallMethodAsync"/> throws this exception — used to simulate
/// a bad method call status surfacing as a <see cref="ServiceResultException"/>.
/// </summary>
public Exception? CallMethodException { get; set; }
/// <inheritdoc />
public Task<IList<object>?> CallMethodAsync(NodeId objectId, NodeId methodId, object[] inputArguments,
CancellationToken ct = default)
{
CallMethodCount++;
if (CallMethodException != null)
throw CallMethodException;
return Task.FromResult<IList<object>?>(null);
}
@@ -18,8 +18,10 @@ public class ConnectionSettingsTests
settings.SecurityMode.ShouldBe(SecurityMode.None);
settings.SessionTimeoutSeconds.ShouldBe(60);
settings.AutoAcceptCertificates.ShouldBeTrue();
settings.CertificateStorePath.ShouldContain("OtOpcUaClient");
settings.CertificateStorePath.ShouldContain("pki");
// CertificateStorePath defaults to empty so constructing settings does not
// touch disk; DefaultApplicationConfigurationFactory resolves the canonical
// PKI path lazily on first connect (Client.Shared-010).
settings.CertificateStorePath.ShouldBe(string.Empty);
}
[Fact]
@@ -996,6 +996,143 @@ public class OpcUaClientServiceTests : IDisposable
eventCount.ShouldBe(0);
}
// --- AcknowledgeAlarm tests (Client.Shared-009) ---
/// <summary>
/// Verifies that a successful acknowledge call returns <see cref="StatusCodes.Good"/>
/// and reaches the session adapter's CallMethodAsync (Client.Shared-009).
/// </summary>
[Fact]
public async Task AcknowledgeAlarmAsync_OnSuccess_ReturnsGood()
{
var session = new FakeSessionAdapter();
_sessionFactory.EnqueueSession(session);
await _service.ConnectAsync(ValidSettings());
var result = await _service.AcknowledgeAlarmAsync("ns=2;s=Cond", new byte[] { 1, 2 }, "acked");
result.ShouldBe(StatusCodes.Good);
session.CallMethodCount.ShouldBe(1);
}
/// <summary>
/// Regression for Client.Shared-009: a bad call result must surface as the returned
/// <see cref="StatusCode"/> rather than escape as an uncaught
/// <see cref="ServiceResultException"/>, so callers using
/// <c>if (StatusCode.IsBad(result))</c> actually see the failure.
/// </summary>
[Fact]
public async Task AcknowledgeAlarmAsync_OnServiceResultException_ReturnsBadStatusCode()
{
var session = new FakeSessionAdapter
{
CallMethodException = new ServiceResultException(
StatusCodes.BadConditionAlreadyEnabled, "already acked")
};
_sessionFactory.EnqueueSession(session);
await _service.ConnectAsync(ValidSettings());
var result = await _service.AcknowledgeAlarmAsync("ns=2;s=Cond", new byte[] { 1, 2 }, "acked");
StatusCode.IsBad(result).ShouldBeTrue();
result.Code.ShouldBe(StatusCodes.BadConditionAlreadyEnabled);
}
/// <summary>
/// Exercises the case where the caller already passes the condition node (an id ending in
/// ".Condition") rather than the source node, and verifies the acknowledge call still
/// reaches the session adapter, matching the documented contract.
/// </summary>
[Fact]
public async Task AcknowledgeAlarmAsync_LeavesConditionSuffixAlone()
{
var session = new FakeSessionAdapter();
_sessionFactory.EnqueueSession(session);
await _service.ConnectAsync(ValidSettings());
await _service.AcknowledgeAlarmAsync("ns=2;s=Cond.Condition", new byte[] { 1, 2 }, "acked");
// The acknowledge call reaches the adapter exactly once.
session.CallMethodCount.ShouldBe(1);
}
// --- Alarm fallback path (Client.Shared-011) ---
/// <summary>
/// Regression for Client.Shared-011: when standard AckedState/Id and ActiveState/Id
/// fields are missing (null Value) but the SourceNode (ConditionId) field at index 12
/// is populated, the client launches the Task.Run fallback that reads
/// <c>InAlarm</c>/<c>Acked</c> from the condition node's Galaxy attributes. Verify
/// the alarm event is delivered with the values from the supplemental reads.
/// </summary>
[Fact]
public async Task OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent()
{
var fakeSub = new FakeSubscriptionAdapter();
var session = new FakeSessionAdapter
{
NextSubscription = fakeSub,
ReadResponseFunc = nodeId =>
{
var key = nodeId.ToString();
if (key.EndsWith(".InAlarm"))
return new DataValue(new Variant(true), StatusCodes.Good);
if (key.EndsWith(".Acked"))
return new DataValue(new Variant(false), StatusCodes.Good);
if (key.EndsWith(".TimeAlarmOn"))
return new DataValue(new Variant(new DateTime(2026, 1, 1, 12, 0, 0)), StatusCodes.Good);
if (key.EndsWith(".DescAttrName"))
return new DataValue(new Variant("Fallback message"), StatusCodes.Good);
return new DataValue(StatusCodes.BadNodeIdUnknown);
}
};
_sessionFactory.EnqueueSession(session);
await _service.ConnectAsync(ValidSettings());
AlarmEventArgs? received = null;
var raised = new TaskCompletionSource();
_service.AlarmEvent += (_, e) =>
{
received = e;
raised.TrySetResult();
};
await _service.SubscribeAlarmsAsync();
var handle = fakeSub.ActiveHandles.First();
// AckedState/Id (8) and ActiveState/Id (9) are present but Variant.Value is null,
// which triggers the supplemental Galaxy-attribute fallback; SourceNode (12) is set.
var fields = new EventFieldList
{
EventFields =
[
new Variant(new byte[] { 1, 2, 3 }), // 0: EventId
new Variant(ObjectTypeIds.AlarmConditionType), // 1: EventType
new Variant("Source1"), // 2: SourceName
new Variant(DateTime.MinValue), // 3: Time
new Variant(new LocalizedText("Initial")), // 4: Message
new Variant((ushort)400), // 5: Severity
new Variant("CondName"), // 6: ConditionName
new Variant(true), // 7: Retain
Variant.Null, // 8: AckedState/Id — missing
Variant.Null, // 9: ActiveState/Id — missing
new Variant(true), // 10: EnabledState/Id
new Variant(false), // 11: SuppressedOrShelved
new Variant("ns=2;s=ConditionId") // 12: SourceNode
]
};
fakeSub.SimulateEvent(handle, fields);
// The fallback runs on a background Task.Run continuation — wait briefly for it.
await Task.WhenAny(raised.Task, Task.Delay(500));
received.ShouldNotBeNull();
received!.ActiveState.ShouldBeTrue(); // from InAlarm read
received.AckedState.ShouldBeFalse(); // from Acked read
received.ConditionNodeId.ShouldBe("ns=2;s=ConditionId");
received.Message.ShouldBe("Fallback message"); // from DescAttrName read
}
// --- Failover tests ---
/// <summary>
@@ -138,4 +138,21 @@ public class AlarmsViewModelTests
{
_vm.Interval.ShouldBe(1000);
}
/// <summary>
/// Regression test for Client.UI-006 — when SubscribeAlarmsAsync throws, the failure must be
/// surfaced to the operator via the view model's StatusMessage rather than silently swallowed.
/// </summary>
[Fact]
public async Task Subscribe_OnFailure_SurfacesStatusMessage()
{
_vm.IsConnected = true;
_service.SubscribeAlarmsException = new Exception("Server returned BadSubscriptionIdInvalid");
await _vm.SubscribeCommand.ExecuteAsync(null);
_vm.IsSubscribed.ShouldBeFalse();
_vm.StatusMessage.ShouldNotBeNullOrWhiteSpace();
_vm.StatusMessage.ShouldContain("BadSubscriptionIdInvalid");
}
}
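The Client.UI-006 behavior can be sketched as a subscribe command that reports failures through `StatusMessage` instead of swallowing them. This is a framework-free sketch under stated assumptions:

- `AlarmsViewModelSketch` is hypothetical and is not the real view model.
- The service call is injected as a `Func<Task>`.
- The status text wording is illustrative.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical, framework-free reduction of the alarms view model's subscribe command.
public sealed class AlarmsViewModelSketch
{
    private readonly Func<Task> _subscribeAlarmsAsync;

    public AlarmsViewModelSketch(Func<Task> subscribeAlarmsAsync) => _subscribeAlarmsAsync = subscribeAlarmsAsync;

    public bool IsSubscribed { get; private set; }
    public string? StatusMessage { get; private set; }

    public async Task SubscribeAsync()
    {
        try
        {
            // Invoking inside the try also catches a delegate that throws synchronously,
            // as the fake's SubscribeAlarmsAsync does when SubscribeAlarmsException is set.
            await _subscribeAlarmsAsync();
            IsSubscribed = true;
            StatusMessage = "Subscribed to alarms.";
        }
        catch (Exception ex)
        {
            // Surface the server's diagnostic to the operator and stay unsubscribed.
            IsSubscribed = false;
            StatusMessage = $"Alarm subscription failed: {ex.Message}";
        }
    }
}
```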
@@ -175,12 +175,23 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
return Task.FromResult(BrowseResults);
}
/// <summary>
/// Gets or sets the exception thrown to simulate subscribe failures in the UI.
/// </summary>
public Exception? SubscribeException { get; set; }
/// <summary>
/// Gets or sets the exception thrown to simulate alarm-subscribe failures in the UI.
/// </summary>
public Exception? SubscribeAlarmsException { get; set; }
/// <inheritdoc />
public Task SubscribeAsync(NodeId nodeId, int intervalMs = 1000, CancellationToken ct = default)
{
SubscribeCallCount++;
LastSubscribeNodeId = nodeId;
LastSubscribeIntervalMs = intervalMs;
if (SubscribeException != null) throw SubscribeException;
return Task.CompletedTask;
}
@@ -196,6 +207,7 @@ public sealed class FakeOpcUaClientService : IOpcUaClientService
public Task SubscribeAlarmsAsync(NodeId? sourceNodeId = null, int intervalMs = 1000, CancellationToken ct = default)
{
SubscribeAlarmsCallCount++;
if (SubscribeAlarmsException != null) throw SubscribeAlarmsException;
return Task.CompletedTask;
}
@@ -515,6 +515,25 @@ public class MainWindowViewModelTests
_settingsService.LastSaved!.SubscribedNodes.ShouldContain("ns=2;s=TestSub");
}
/// <summary>
/// Regression test for Client.UI-006: when GetRedundancyInfoAsync throws (the server
/// does not implement the redundancy facet), the connection must still succeed and the
/// view model must leave RedundancyInfo null without crashing or hiding the diagnostic.
/// StatusMessage is expected to still report "Connected" because redundancy is optional.
/// </summary>
[Fact]
public async Task ConnectCommand_RedundancyFailure_DoesNotBreakConnection()
{
_service.RedundancyException = new Exception("BadServiceUnsupported");
await _vm.ConnectCommand.ExecuteAsync(null);
_vm.ConnectionState.ShouldBe(ConnectionState.Connected);
_vm.RedundancyInfo.ShouldBeNull();
// Connection succeeded; status reflects connection, not the optional redundancy probe failure
_vm.StatusMessage.ShouldContain("Connected");
}
/// <summary>
/// Verifies that saved subscriptions are restored after reconnecting the shell.
/// </summary>
