12 Commits

Author SHA1 Message Date
Joseph Doherty f23e368a74 fix(server, admin): wire sp_RegisterNodeGenerationApplied + overlay heartbeat onto ClusterNode
dbo.sp_RegisterNodeGenerationApplied was defined by the initial
StoredProcedures migration but had zero callers in src/. The server
polled sp_GetCurrentGenerationForCluster every 5s but never reported
back, so dbo.ClusterNodeGenerationState stayed empty for every node
and both the Admin UI Fleet status page ("No node state recorded")
and the cluster-detail Redundancy LastSeenAt indicator ("never
STALE") showed broken liveness forever.

Server side (GenerationRefreshHostedService):
* New testable seam: Func<long, NodeApplyStatus, string?, CT, Task>?
  registerAppliedAsync constructor parameter, defaulting to a real
  sp_RegisterNodeGenerationApplied call against the central DB.
* TickAsync now calls the proc at two points: after every successful
  apply with NodeApplyStatus.Applied, and on every no-change tick as
  a heartbeat (also Applied) so LastSeenAt stays fresh.
* Apply failures now wrap the lease + coordinator.RefreshAsync in a
  try/catch, report NodeApplyStatus.Failed with the exception message,
  and advance LastAppliedGenerationId regardless of outcome so we
  don't loop on the same broken apply every 5s.
* Register-call failures are best-effort (LogDebug heartbeat, LogWarning
  apply-report) — a transient DB outage during reporting must not
  crash the publisher or block the next apply.

Admin side (ClusterNodeService.ListByClusterAsync): the Redundancy tab
reads ClusterNode.LastSeenAt, but no current writer maintains that
column — the heartbeat goes to ClusterNodeGenerationState.LastSeenAt.
Overlay the GenerationState heartbeat onto the returned ClusterNode
rows when more recent, so IsStale + the Redundancy table column
reflect actual liveness without a schema change or new write path.

Tests: 3 new cases on GenerationRefreshHostedServiceTests verify
first-apply reports Applied, no-change ticks heartbeat with Applied,
and register-call failure does not roll back the cursor or block
subsequent ticks. All 8 GenerationRefresh tests pass.

Verified live on node-dev-a / cluster-dev: dbo.ClusterNodeGenerationState
now populated with CurrentGenerationId=1, LastAppliedStatus=Applied,
fresh LastSeenAt. Fleet status page shows the node (KPIs NODES 1 /
APPLIED 1 / STALE 0 / FAILED 0). Redundancy tab KPI STALE went 1\xe2\x86\x920 and
the row shows a real LAST SEEN timestamp. Bonus: FleetStatusHub
SignalR push now fires the cluster-page Live update banner on every
heartbeat because there are finally state changes to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:59 -04:00
Joseph Doherty c8de58d6d3 fix(admin-ui): render published gen read-only on the 6 cluster-detail content tabs
The Equipment, UNS Structure, Namespaces, Drivers, Tags, and ACLs tabs
all rendered only an "Open a draft to edit" placeholder when no draft
was open — even when the cluster had a fully populated published
generation. docs/v2/admin-ui.md \xa7Cluster Detail describes these as
"read-only views of the published generation" with an "Edit in draft"
affordance; that view was never wired. The earlier code path also
correctly rendered nothing when the cluster had no published gen yet,
which was indistinguishable from the broken state.

Collapse the six per-tab conditions into one shared branch that threads
the published gen ID into the existing tab components when no draft
exists, wrapped in <fieldset disabled> so any Add/Edit button click in
the read-only state cannot silently mutate published rows even though
the tab components themselves don't yet honor an IsReadOnly flag.
Banner above the content explains the state. Surgical: zero changes to
the ~1500 LoC across the six tab components.

Verified live on cluster-dev gen 1: Drivers tab now shows the
cluster-dev-galaxy-main GalaxyMxGateway row read-only; Namespaces tab
shows cluster-dev-galaxy-ns SystemPlatform row; both with the read-only
banner and visibly disabled affordances.

Follow-up worth doing later: refactor each tab component to accept an
IsReadOnly parameter so the disabled-affordance UX is per-tab rather
than a blanket fieldset opacity wash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:32 -04:00
Joseph Doherty 8fe7c8bea6 refactor(driver-galaxy): switch to sibling-repo MxGateway client + drop vendored libs
The sibling mxaccessgw repo (clients/dotnet/) restored a proper client
library + contracts under the new ZB.MOM.WW.MxGateway namespace, so the
binary-vendoring stopgap from PR Driver.Galaxy-016 can unwind via plan #1
of libs/README.md.

- csproj: replace <Reference HintPath="libs\MxGateway.*.dll"> with a
  ProjectReference into ..\..\..\..\mxaccessgw\clients\dotnet  ZB.MOM.WW.MxGateway.Client\. The five backfill PackageReference shims
  (Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client, Polly.Core,
  Microsoft.Extensions.Logging.Abstractions) are now transitive again.
- Source: 'using MxGateway.X' -> 'using ZB.MOM.WW.MxGateway.X' across
  19 driver files + 14 test files. No fully-qualified MxGateway.* usages
  in code, so no behavioural changes — purely a using-prefix flip.
- libs/: deleted MxGateway.Client.dll, MxGateway.Contracts.dll, README.md
  (orphan after the unwind).

Verified: dotnet build clean (Release), all 245 Driver.Galaxy unit tests
pass, OtOpcUa service running with the new client DLL loaded
(opc.tcp://localhost:4840/OtOpcUa, no FileNotFound/TypeLoad/
MissingMethod in startup logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:55:15 -04:00
Joseph Doherty c6082aa0b9 fix(admin-e2e): register missing DI services so ClusterDetail interactive circuit boots
UnsTabDragDropE2ETests were timing out at the 'UNS Structure' nav-link
locator because AdminWebAppFactory never registered AdminHubConnectionFactory
/ HubTokenService / DataProtection — ClusterDetail.razor's @inject threw at
circuit boot, so the page never advanced past the Loading placeholder. 2 → 3
pass after the registrations land. Also documents the Modbus standard-vs-
exception_injection coverage matrix in the fixture README + cross-references
docs/drivers/AbServer-Test-Fixture.md from each Emulate test so a developer
landing on a skipped test has a direct doc pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:07:17 -04:00
Joseph Doherty b1f3e09661 test(modbus, abcip): align failing integration tests with fixtures
Two real bugs uncovered by re-running with the new fixture defaults
pointing at the shared docker host. Both are test-side, not driver-side.

AbCip — Driver_reads_seeded_DInt_from_ab_server (4 parametrized rows):
  Hardcoded 'ab://127.0.0.1:{port}/1,0' in the deviceUri instead of
  the resolved fixture.Host. The new 10.100.0.35 default (and any
  AB_SERVER_ENDPOINT override) silently couldn't reach this test —
  the driver tried to connect to a non-existent localhost:44818 and
  returned BadCommunicationError on all 4 profile rows. The sibling
  Emulate tests already use the fixture's resolved endpoint; this
  smoke test was missed in the original migration.

  Fix: deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0".

Modbus — Float32_With_CDAB_Roundtrips_Through_Wire:
  Test wrote a Float32 to HR 100 (2 consecutive registers: 100+101).
  standard.json's writable HR range declares [100,100] only — a
  single-cell auto-incrementing register, not a 2-register pair. The
  write to register 101 was rejected with Illegal Data Address
  (BadOutOfRange).

  Fix: moved the tag from HR 100 to HR 200 (in standard.json's
  '[200, 209]' scratch range — 10 consecutive writable HRs). The
  Float32+CDAB semantic the test exercises is unchanged.

Modbus — Block_Read_Coalescing_Reduces_PDU_Count_End_To_End:
  Test read HR 300, 302, 304 — outside both the writable ranges and
  the uint16 seed list. pymodbus rejects reads to unseeded HRs even
  though 'hr size' is 2048. BadOutOfRange on every read.

  Fix: moved the tags from 300/302/304 to 200/202/204 (within the
  scratch range). The non-contiguous coalescing semantic (3 tags
  inside a 5-register window with MaxReadGap=5) is preserved.

After this commit:
  - Modbus.IntegrationTests: 6/38 pass / 32 skip / 0 fail
    (was 4 pass / 32 skip / 2 fail; 32 skips are profile-gated
    ExceptionInjectionTests — they need MODBUS_SIM_PROFILE=
    exception_injection and a different container, intentional gating)
  - AbCip.IntegrationTests: 10/12 pass / 2 skip / 0 fail
    (was 6 pass / 2 skip / 4 fail; 2 skips are Emulate tests that
    need the fixture for separate scenarios)

No driver code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:45:57 -04:00
Joseph Doherty 49644fc7fd test(fixtures): migrate integration-test fixture defaults to 10.100.0.35
CLAUDE.md "Docker Workflow" claims (per the 2026-04-28 migration note)
that all fixture-class default endpoints were rewritten to target the
shared Docker host at 10.100.0.35. Audit during today's e2e run showed
the claim was incomplete — five fixture classes still defaulted to
localhost / 127.0.0.1, causing every fixture-touching integration test
to skip with "endpoint unreachable" on a fresh box that hadn't set
the override env vars.

Files corrected:

  - tests/.../Modbus.IntegrationTests/ModbusSimulatorFixture.cs
      DefaultEndpoint: localhost:5020 → 10.100.0.35:5020
  - tests/.../S7.IntegrationTests/Snap7ServerFixture.cs
      DefaultEndpoint: localhost:1102 → 10.100.0.35:1102
  - tests/.../OpcUaClient.IntegrationTests/OpcPlcFixture.cs
      DefaultEndpoint: opc.tcp://localhost:50000 → opc.tcp://10.100.0.35:50000
  - tests/.../AbCip.IntegrationTests/AbServerFixture.cs
      Host default + ResolveHost fallback: 127.0.0.1 → 10.100.0.35
  - tests/.../AbLegacy.IntegrationTests/AbLegacyServerFixture.cs
      Host default + ResolveEndpoint fallback: 127.0.0.1 → 10.100.0.35

XML doc comments referencing the old localhost defaults were updated in
the same pass so the class-summary documentation matches the actual
default. The override-via-env-var mechanism (MODBUS_SIM_ENDPOINT,
AB_SERVER_ENDPOINT, AB_LEGACY_ENDPOINT, S7_SIM_ENDPOINT,
OPCUA_SIM_ENDPOINT) is unchanged — pointing at a real PLC or a
locally-running container still works exactly as before.

Verification:
  - Solution-wide dotnet build: 0 errors.
  - S7.IntegrationTests: 3/3 pass without env-var override.
  - OpcUaClient.IntegrationTests: 3/3 pass without env-var override.
  - Modbus.IntegrationTests: 4/38 (same as the env-var-override run —
    the 2 failures + 32 skips are pre-existing fixture-profile
    mismatches unrelated to this fix).
  - AbCip.IntegrationTests / AbLegacy.IntegrationTests: same results
    as the env-var-override run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:32:23 -04:00
Joseph Doherty 3d982d9a65 docs: sync against recent code changes
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.

1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
   Refreshed the deployment description from "three cooperating
   processes (Server, Admin, Galaxy.Host)" to "two cooperating
   Windows services (Server, Admin)". The legacy x86 TopShelf
   Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
   access now flows through the in-process Tier-A GalaxyDriver
   talking gRPC to the sibling mxaccessgw gateway. Also called out
   decision #30 (AddWindowsService replacing TopShelf) inline.

2. docs/VirtualTags.md:
   - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
     replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
     regular compiler — Core.Scripting-008 / -016 retired the
     CSharpScript/ScriptRunner path).
   - Line 39: orphan-thread leak description rewritten. The
     CSharp.Scripting-era "underlying ScriptRunner keeps running on
     its thread-pool thread until the Roslyn runtime returns" is no
     longer accurate — the new pipeline binds the script as a
     regular C# Func<> delegate, so the leak is now "synchronous
     CPU-bound work on a pool thread" (same operator-visible
     effect, different mechanism).

3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
   service"):
   Annotated both the decision body and the decision-log table row
   with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
   replacement architecture. The original reasoning is preserved as
   audit trail per the decision-log convention.

4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
   Added an Implementation note describing the
   Core.Scripting-008 / -016 supersession of the original
   CSharpScript pipeline. The historical record stays; the note
   points future readers at docs/VirtualTags.md "Compile cache"
   for the current contract.

5. docs/plans/alarms-over-gateway.md "Files" section under client
   regeneration:
   Updated the .NET regeneration instructions to point at the new
   ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
   clients/dotnet/MxGateway.Client.csproj no longer exists in the
   sibling repo (restructure after this plan was written) and the
   vendored-binaries situation in
   src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
   so a reader following the plan won't chase a deleted path.

Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:57:04 -04:00
Joseph Doherty 23d59d73f2 fix(scripting+alarms): close remaining re-review findings
Single commit covering the four small/medium fixes from the updated
code review.

Core.Scripting-014 (Medium, Concurrency):
  CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
  lazy) overload — same race shape Core.Scripting-006 closed in
  GetOrCompile's catch block. A concurrent re-add between snapshot and
  TryRemove was evicted + disposed while the new caller still held it.
  Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
  Regression test
  Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
  added.

Core.Scripting-013 (Medium, Security):
  Hand-rolled BuildWrapperSource pastes user source between literal
  braces; brace-balanced source could inject sibling methods/classes
  alongside CompiledScript.Run. Analyzer still walked the injected
  members so it wasn't a direct escape, but it relaxed the documented
  'method body' authoring contract. Added EnforceSingleRunMember:
  after ParseText, the compilation unit must hold exactly one type
  (CompiledScript) and that type must hold exactly one member (the Run
  method). Any deviation throws CompilationErrorException with LMX001/
  LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
  message. Two regression tests added covering the sibling-method and
  sibling-class injection vectors.

Core.Scripting-015 (Low, Correctness, latent):
  ToCSharpTypeName's generic branch truncated at the first backtick via
  IndexOf, silently dropping closed args of nested-generic shapes
  (Outer<T>.Inner<U>). No production caller exercises this shape today
  (all TContext/TResult are top-level non-nested), so the bug was
  latent. Rewrote the generic branch to walk the FullName segment-by-
  segment, consuming generic args per segment so nested shapes emit
  valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
  Outer<T,U>).

Core.ScriptedAlarms-013 (Low, Documentation):
  The internal test accessors TryGetScratchReadCacheForTest /
  TryGetScratchContextForTest return live mutable scratch refilled in
  place under _evalGate. XML docs didn't warn future test authors about
  the synchronization contract. Added a <remarks> block to each
  documenting the only-safe-on-quiesced-engine + identity-or-single-key
  contract.

Verification (suites green):
  Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
  Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
  Core.VirtualTags.Tests: 57/57 (unchanged)

After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:00:59 -04:00
Joseph Doherty c2abbf45bd fix(driver-galaxy): align package versions + record vendored-DLL provenance
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).

Driver.Galaxy-016 (Medium, Perf/Resource):
  Reconciled the csproj PackageReferences with what the vendored
  MxGateway.Client.dll was actually built against, verified by
  reflecting Assembly.GetReferencedAssemblies() on the DLL:
    - Polly 8.5.2  →  Polly.Core 8.6.6
      (most consequential — Polly v7 fluent API vs Polly.Core v8
       resilience-pipeline API are DIFFERENT packages; the DLL was
       built against Polly.Core so the prior Polly reference would
       have failed at runtime with MissingMethodException the first
       time the gateway client's retry pipeline ran)
    - Grpc.Net.Client 2.71.0  →  2.76.0  (matches sibling Server/Worker)
    - Microsoft.Extensions.Logging.Abstractions 10.0.0  →  10.0.7
  Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
  left unchanged.

Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
  Original framing was a security concern about unknown-provenance
  binaries. User clarified the DLLs are their own code, built from
  their own mxaccessgw project, not third-party. Re-triaged to a
  documentation / audit-trail concern. Fix:
    - Added a Provenance section to libs/README.md recording the
      source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
      extracted from the AssemblyInformationalVersion baked into
      both DLLs by the original build) and SHA-256 checksums.
    - Documented the re-verification recipe (sha256sum + ilspycmd
      | grep AssemblyInformationalVersion).
  Recommendations about .gitattributes and CI hash-check deferred —
  the DLLs are frozen until an unwinding path is taken, so adding
  LFS or CI infrastructure now would need removal at unwinding.

Driver.Galaxy-018 (Low, Documentation):
  Most of the recommendation folded into the libs/README.md rewrite
  (pointed at sibling Server/Worker csproj as the live version source
  rather than the deleted MxGateway.Client.csproj; recorded source
  commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
  <Reference> items intentionally not added — MSBuild's default for
  HintPath references with bare-name Include attributes is already
  SpecificVersion=false, so explicitly setting it would be cosmetic
  without changing behaviour.

Driver.Galaxy-017 (Low, Design) — Deferred:
  Recommendation part (b) (record mxaccessgw source-commit SHA in
  libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
  Parts (a) and (c) — a GetVersion RPC at session-open and a parity
  test against the live gateway's proto descriptor — are substantial
  new RPC + plumbing work not in scope for this code-review sweep.
  The risk surface is bounded because either of the libs/README.md
  unwinding paths closes the vendoring + this concern naturally.
  Re-open if neither path is taken within the next quarter and the
  live gateway evolves its proto under the driver.

Verification:
  - Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
  - Driver.Galaxy.Tests: 245/245 pass against the corrected
    package set.
  - Solution-wide build remains clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:45:24 -04:00
Joseph Doherty 3a53d03d23 fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox
Core.Scripting-012 (High, Security) resolution.

The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:

  - System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
    background-fanout threat Core.Scripting-003 closed against
    System.Threading.Tasks.
  - System.Threading.Timer — schedules unbounded callback work that
    outlives the per-evaluation timeout.
  - System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
    Defense-in-depth gap; invocation needs reflection (already denied)
    but the load itself was reachable.

Fix:
  - Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
    (preferred over type-granular per the recommendation so future BCL
    additions to that namespace are denied by default).
  - Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
    to ForbiddenFullTypeNames — both live in System.Threading shared
    with allowed primitives so they must be type-granular.

Regression tests added to ScriptSandboxTests:
  Rejects_ThreadPool_QueueUserWorkItem_at_compile
  Rejects_Timer_new_at_compile
  Rejects_AssemblyLoadContext_at_compile

Docs:
  docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
  and the Sandbox-escape compliance-check row both updated to enumerate
  the new entries per the Core.Scripting-009 doc-sync convention.

Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.

Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:39:20 -04:00
Joseph Doherty fb7c6c7046 fix(scripting): route engines through CompiledScriptCache (Core.Scripting-016)
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.

Resolution:
  - VirtualTagEngine + ScriptedAlarmEngine each gained a private
    CompiledScriptCache<TContext, TResult> instance.
  - Both Load methods now call _compileCache.GetOrCompile(source).
  - Publish-replace path: _compileCache.Clear() runs alongside the existing
    _tags / _alarms clears so the prior generation's ALCs are disposed
    before recompile.
  - Engine Dispose now calls _compileCache.Dispose() so shutdown actually
    releases the emitted assemblies.

Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).

Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.

Regression tests:
  - VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
  - ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
  Both use WeakReference + bounded GC.Collect() to prove the emitted
  assembly is reclaimable after engine.Dispose(). The alarms test had to
  be synchronous (not 'async Task<WeakReference>') because async state
  machines capture locals as state-struct fields, keeping them alive past
  the method's apparent end and defeating GC.

Verification:
  - Core.Scripting.Tests: 104/104 (unchanged).
  - VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
  - ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
  - All other consumer suites still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:33:34 -04:00
Joseph Doherty a6ae4e22d1 fix(status-codes): correct BadDeviceFailure from 0x80550000 to 0x808B0000
Driver.Cli.Common-007 + Driver.Cli.Common-008 resolution.

Driver.Cli.Common-007 (High, Correctness):
  0x80550000 is the canonical OPC UA spec value for BadSecurityPolicyRejected,
  not BadDeviceFailure. The correct spec value for BadDeviceFailure is
  0x808B0000 (verified against OPC Foundation Opc.Ua.StatusCodes;
  corroborated locally by Driver.Galaxy.Runtime.StatusCodeMap and both
  Wonderware historian quality mappers which all hand-pin the correct
  value).

  The bug was duplicated across six driver modules:
    - FocasStatusMapper.BadDeviceFailure
    - AbCipStatusMapper.BadDeviceFailure
    - AbLegacyStatusMapper.BadDeviceFailure
    - TwinCATStatusMapper.BadDeviceFailure
    - ModbusDriver.StatusBadDeviceFailure
    - S7Driver.StatusBadDeviceFailure
  Plus the SnapshotFormatter shortlist that named 0x80550000 as
  BadDeviceFailure, and three downstream Modbus tests that asserted
  against the wrong value (so CI was blind).

  This commit fixes all six native-mapper constants, the formatter
  shortlist, and the three Modbus tests in one pass. Added a regression
  guard to FormatStatus_does_not_apply_pre_fix_wrong_names that pins
  0x80550000 never renders as BadDeviceFailure (mirroring the existing
  -001 wrong-name guards).

  Behavior change: OPC UA clients consuming the native drivers now see
  the canonical BadDeviceFailure (0x808B0000) on device-fault paths
  instead of the misnamed BadSecurityPolicyRejected (0x80550000). Wire-
  level status semantics now match operator-facing CLI labels.

Driver.Cli.Common-008 (Low, Testing):
  Deleted the redundant FormatStatus_names_native_driver_emitted_codes
  Theory — its five InlineData rows were already covered by the
  well-known Theory in the same commit (5a9c459), and used a weaker
  ShouldContain vs the well-known Theory's ShouldBe (exact match).

Verification:
  - Driver.Cli.Common.Tests: 43/43 pass (was 48 after the -008 deletion).
  - Driver.Modbus.Tests: 263/263 pass.
  - Driver.AbCip.Tests: 262/262.
  - Driver.AbLegacy.Tests: 157/157.
  - Driver.FOCAS.Tests: 178/178.
  - Driver.S7.Tests: 112/112.
  - Driver.TwinCAT.Tests: 131/131.
  Total: 1146 tests across the affected modules, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:14:28 -04:00
81 changed files with 1156 additions and 290 deletions
+13 -2
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 1 |
| Open findings | 0 |
## Checklist coverage
@@ -254,7 +254,7 @@ green (was 63).
| Severity | Low |
| Category | Documentation & comments |
| Location | `ScriptedAlarmEngine.cs:66-81` |
| Status | Open |
| Status | Resolved |
**Description:** The new internal test accessors `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` (introduced by the Core.ScriptedAlarms-009 resolution at `0001cdd`) return the *live* per-alarm scratch — the same `Dictionary<string, DataValueSnapshot>` instance the engine clears and refills in `RefillReadCache` under `_evalGate`, plus the `AlarmPredicateContext` that wraps it by reference. The XML docs describe the intended use ("assert the scratch is reused across evaluations (two reads return the same instance)") but do not explicitly warn that:
@@ -264,3 +264,14 @@ green (was 63).
The engine itself is correct — `RefillReadCache` runs only under `_evalGate`, so the engine never tears its own state. The risk is purely on the test-side contract.
**Recommendation:** Add a `<remarks>` block to both `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` stating that the returned references point at live engine state, that reads are only safe when the engine is known to be idle (no in-flight `ReevaluateAsync`/`ShelvingCheckAsync`/`LoadAsync`), and that the intended uses are reference-identity assertions plus single-key lookups against a quiesced engine — never enumeration. No code change required; the engine's correctness depends on `_evalGate`, which is already documented.
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim.
Added a `<remarks>` block to each of `TryGetScratchReadCacheForTest` and
`TryGetScratchContextForTest` documenting the synchronization contract:
the returned references point at live engine state refilled in place under
`_evalGate`, enumeration during an in-flight evaluation is a
concurrent-read-while-writer scenario against a plain `Dictionary`
(undefined behaviour), and the only safe uses are reference-identity
comparisons + single-key reads against a quiesced engine. No code change
required — the engine's correctness was always there; only the test-side
contract was undocumented.
+107 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -375,7 +375,7 @@ Warning event.
| Severity | High |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Status | Open |
| Status | Resolved |
**Description:** The Core.Scripting-008 rewrite broadened the BCL references list
from a narrow allow-list (`System.Private.CoreLib` + `System.Linq` only) to the
@@ -443,7 +443,26 @@ mirroring the Core.Scripting-010 vector style. Update
"Sandbox escape" compliance-check row to enumerate the additions, per the
Core.Scripting-009 doc-sync convention.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** Resolved 2026-05-23 — added `System.Runtime.Loader` to
`ForbiddenNamespacePrefixes` (the namespace-prefix form preferred over
type-granular per the recommendation; future BCL additions to that namespace
are denied by default). Added `System.Threading.ThreadPool` and
`System.Threading.Timer` to `ForbiddenFullTypeNames` — both live in
`System.Threading` shared with allowed sync primitives so they must be
type-granular. Regression tests added to `ScriptSandboxTests`:
`Rejects_ThreadPool_QueueUserWorkItem_at_compile`,
`Rejects_Timer_new_at_compile`, `Rejects_AssemblyLoadContext_at_compile`.
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 +
the Sandbox-escape compliance-check row both updated per the
Core.Scripting-009 doc-sync convention. The two lower-impact suggestions
from the recommendation (`System.Console`, `CultureInfo.DefaultThreadCurrentCulture`)
were intentionally not addressed: `Console.SetOut` requires constructing
a `System.IO.TextWriter` which is already blocked, leaving only
`Console.WriteLine` log-spam (annoyance, not a security threat); and
`CultureInfo.DefaultThreadCurrentCulture` is a cross-script side-effect
worth knowing about but doesn't escape the sandbox. Recording both as
accepted minor risks. Test totals after fix: Core.Scripting 107 green
(was 104 — +3 new rejection tests).
### Core.Scripting-013
@@ -452,7 +471,7 @@ Core.Scripting-009 doc-sync convention.
| Severity | Medium |
| Category | Security |
| Location | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
| Status | Open |
| Status | Resolved |
**Description:** The synthesized wrapper pastes the user's source verbatim
between `{` and `}` braces inside a static method body, with a `#line 1`
@@ -499,7 +518,22 @@ brace-mismatched / class-injecting source unparseable. Add a regression test
covering at least the brace-injection vector
(`return 0; } public static int Evil() { return 0;`).
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** Resolved 2026-05-23 — took option (a) from the recommendation:
added an `EnforceSingleRunMember` step to `ScriptEvaluator.Compile` (runs after
`CSharpSyntaxTree.ParseText` of the synthesized wrapper, before Roslyn
compile). The check requires exactly one type declaration in the compilation
unit (the `CompiledScript` class) AND exactly one member on that class (the
`Run` method). Any deviation — a sibling class, an additional namespace, a
sibling method or nested type alongside `Run` — throws
`CompilationErrorException` with diagnostic IDs `LMX001` / `LMX002` and a
message that names Core.Scripting-013 and points at the offending span. Two
regression tests added: `Rejects_sibling_method_injection_via_balanced_braces`
(injects a sibling method via `} public static int Evil() { …`) and
`Rejects_sibling_class_injection_via_balanced_braces` (injects an entire
sibling namespace + class). Option (b) (parse the user source independently
as a `BlockSyntax` and inject via Roslyn syntax API) was considered but the
parse-and-validate approach is more readable, gives clearer error messages,
and keeps the wrapper-source generation textual.
### Core.Scripting-014
@@ -508,7 +542,7 @@ covering at least the brace-injection vector
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `CompiledScriptCache.cs:91-103` (`Clear`) |
| Status | Open |
| Status | Resolved |
**Description:** `Clear()` snapshots `_cache.Keys.ToArray()` then iterates,
calling `TryRemove(key, out var lazy)` on each — the key-only overload, not
@@ -550,7 +584,21 @@ foreach (var entry in _cache.ToArray())
Add a regression test that races `GetOrCompile` against `Clear` and asserts
the caller's evaluator is still usable.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim:
replaced `foreach (var key in _cache.Keys.ToArray())` + key-only
`TryRemove(key, out var lazy)` with `foreach (var entry in _cache.ToArray())` +
value-scoped `TryRemove(entry)` (the `KeyValuePair<,>` overload). A concurrent
GetOrCompile re-add between the snapshot and the remove inserts a fresh Lazy
under the same key; the value-scoped comparison sees the mismatch and leaves
the fresh entry intact (instead of evicting + disposing the live evaluator
the concurrent caller still holds). Regression test
`Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives` added
to `CompiledScriptCacheTests` — single-threaded simulation that snapshots
the dict, mutates the entry to a fresh Lazy mid-flight, drives the same
value-scoped TryRemove overload Clear now uses, and asserts the fresh entry
survives. The two-thread race would be flaky to model directly; the
single-threaded semantic test is sufficient because the fix is the
overload-selection itself.
### Core.Scripting-015
@@ -559,7 +607,7 @@ the caller's evaluator is still usable.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Status | Open |
| Status | Resolved |
**Description:** `ToCSharpTypeName` is documented to handle nested types
(`Outer+Inner``Outer.Inner`) via `Replace('+', '.')` for the
@@ -603,7 +651,18 @@ are not supported". Add a `ToCSharpTypeName` unit test (currently nothing
exercises this method directly — coverage relies on the end-to-end compile path,
so the bug surfaces only as a misleading Roslyn diagnostic).
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** Resolved 2026-05-23 — rewrote the generic-type branch of
`ToCSharpTypeName` to walk the `FullName` segment-by-segment (split on `.`
after `+ → .` substitution). For each segment ending in `Name\`N`, the
algorithm consumes N generic arguments from `t.GetGenericArguments()` in
order and emits them as `<…>` on that segment. Nested generic-in-generic
shapes (`Outer<T>.Inner<U>`) now emit as
`global::Ns.Outer<T>.Inner<U>` (valid C#) rather than the pre-fix
`global::Ns.Outer<T, U>` (which dropped the segment boundary entirely
because `IndexOf('`')` truncated at the first backtick). No production
caller exercises this shape today (all `TContext` / `TResult` types in
the codebase are top-level non-nested), so the fix is preemptive — but
the algorithm is now correct for any future nested-generic context type.
### Core.Scripting-016
@@ -612,7 +671,7 @@ so the bug surfaces only as a misleading Roslyn diagnostic).
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Status | Open |
| Status | Resolved |
**Description:** The Core.Scripting-008 resolution introduced
`ScriptEvaluator.IDisposable` + `CompiledScriptCache.Clear()` that disposes
@@ -657,4 +716,41 @@ for each engine: snapshot the per-evaluator emitted assembly via
`WeakReference`, call `Load(...)` with a different definition set, and assert
the prior generation's assemblies become collectable.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** Resolved 2026-05-23 — took the cleaner route from the
recommendation: routed both engines' compile paths through
`CompiledScriptCache<TContext, TResult>`. `VirtualTagEngine` and
`ScriptedAlarmEngine` each gained a private `_compileCache` instance field,
their `Load`/`LoadAsync` methods now call `_compileCache.GetOrCompile(source)`
instead of `ScriptEvaluator.Compile(source)` directly, and the cache is cleared
on publish-replace alongside the existing `_tags` / `_alarms` clears so the
prior generation's ALCs are disposed before recompile. Engine `Dispose` now
also calls `_compileCache.Dispose()` so the engine-shutdown path actually
releases the emitted assemblies. **Side-fix:** discovered + fixed an
adjacent bug in `CompiledScriptCache.Dispose()` itself — it set
`_disposed = true` before calling `Clear()`, but `Clear()`'s pre-existing
`if (_disposed) return` guard then aborted the drain unconditionally, so
the Dispose-triggered cleanup was a silent no-op. Removed the disposed-guard
on `Clear()` (the operation is idempotent — clearing an empty/cleared cache
is safe). Without this side-fix the engine-Dispose path would have left
the cached evaluators rooted forever even though the call chain looked
correct. **Side-fix for ScriptedAlarmEngine.Dispose:** moved the pre-existing
"do NOT clear `_alarms` here" comment to "clear `_alarms` AFTER the drain"
because the AlarmState records hold the `TimedScriptEvaluator`/`ScriptEvaluator`
delegates that root the emitted assembly — leaving them in `_alarms` after
Dispose was the same root-the-script-forever pattern this finding is about,
just on the engine side rather than the cache side. The `_alarms` clear is
safe after the `Task.WhenAll` drain because that drain guarantees no
background callback is mid-flight. Regression tests added:
`VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly` and
`ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly`
each uses `WeakReference` + bounded `GC.Collect()` to prove the emitted
assembly is reclaimable after `engine.Dispose()`. **Important test pattern
detail:** the alarms test originally failed because its helper was
`async Task<WeakReference>` — async state machines capture locals as
state-struct fields and can keep them alive past the method's apparent end.
Rewrote as a synchronous helper using `LoadAsync(...).GetAwaiter().GetResult()`
inside two cooperating `[MethodImpl(MethodImplOptions.NoInlining)]` helpers
(`CompileAlarmAndCaptureWeak` + `ExtractEmittedAssemblyWeakRef`) so the
intermediate reflection locals die when each helper returns. Test totals
after fix: Core.Scripting 104 green (unchanged); VirtualTags 57 green (was
56 — +1 unload test); ScriptedAlarms 67 green (was 66 — +1 unload test).
+33 -3
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -251,7 +251,7 @@ added to the `DriverCommandBase` class-summary driver enumeration in commit `7ff
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `5a9c459` added `0x80550000u => "BadDeviceFailure"` to the
`FormatStatus` shortlist, but `0x80550000` is the canonical OPC UA spec value for
@@ -302,6 +302,30 @@ original recommendation again: add a CI test that cross-checks every shortlist
entry against `Opc.Ua.StatusCodes` reflection so this class of bug stops
recurring.
**Resolution:** Resolved 2026-05-23 — corrected `SnapshotFormatter.FormatStatus`
to map `0x808B0000u => "BadDeviceFailure"` (was `0x80550000u`). Updated the
`InlineData` row in the well-known Theory accordingly; the redundant native-
emitted Theory was deleted entirely per Driver.Cli.Common-008. Added a regression
row to `FormatStatus_does_not_apply_pre_fix_wrong_names` pinning that
`0x80550000` no longer renders as `BadDeviceFailure` (mirroring the
Driver.Cli.Common-001 wrong-name guards). The underlying constant was also
corrected in all six native-protocol mappers as part of the same commit:
`FocasStatusMapper.BadDeviceFailure`, `AbCipStatusMapper.BadDeviceFailure`,
`AbLegacyStatusMapper.BadDeviceFailure`, `TwinCATStatusMapper.BadDeviceFailure`,
`ModbusDriver.StatusBadDeviceFailure`, `S7Driver.StatusBadDeviceFailure` — all
moved from `0x80550000u` to `0x808B0000u`. The three downstream Modbus tests
(`ModbusExceptionMapperTests` 3 InlineData rows + 1 ShouldBe assertion;
`ExceptionInjectionTests.StatusBadDeviceFailure` constant) updated to expect
the corrected code. **Behavior change:** OPC UA clients consuming the native
drivers now see the canonical `BadDeviceFailure` (0x808B0000) instead of the
misnamed `BadSecurityPolicyRejected` (0x80550000) on device-fault paths —
operator-facing CLI output and machine-readable status semantics now agree.
Suite totals after fix: Driver.Cli.Common.Tests 43 green (was 48 — minus 5
redundant rows); Modbus.Tests 263; AbCip.Tests 262; AbLegacy.Tests 157;
FOCAS.Tests 178; S7.Tests 112; TwinCAT.Tests 131; all green. The Opc.Ua.StatusCodes
cross-check the recommendation suggested is recorded as a follow-up worth
considering but is out of scope for this fix.
### Driver.Cli.Common-008
| Field | Value |
@@ -309,7 +333,7 @@ recurring.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `5a9c459` adds a new
`FormatStatus_names_native_driver_emitted_codes` `[Theory]` whose five
@@ -332,3 +356,9 @@ it but switch to `ShouldBe($"0x{status:X8} ({expectedName})")` so its
assertion strength matches the rest of the file. Option (a) is cleaner: the
commit's "operator workflow" intent is documented well enough in the
well-known Theory comment block; the redundant Theory is dead weight.
**Resolution:** Resolved 2026-05-23 — took option (a): deleted the
`FormatStatus_names_native_driver_emitted_codes` Theory entirely. Its five
`InlineData` rows are covered by the well-known Theory's `ShouldBe` (strict
exact-match assertion), which is the authoritative shortlist test. Landed
alongside the Driver.Cli.Common-007 fix in the same commit.
+74 -7
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 4 |
| Open findings | 0 |
## Checklist coverage
@@ -267,15 +267,32 @@ Medium, two Low.
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Severity | ~~Medium~~ Low (re-triaged 2026-05-23) |
| Category | ~~Security~~ Documentation & comments (re-triaged 2026-05-23) |
| Location | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `994997b` checks in two binary DLLs (`MxGateway.Client.dll`, 99 840 bytes; `MxGateway.Contracts.dll`, 489 984 bytes) under `src/Drivers/.../Driver.Galaxy/libs/` and references them via `<Reference HintPath="…" />`. These are the only checked-in binary build artefacts in the entire repo (a repo-wide `find` for non-`bin/`/`obj/` `*.dll` under `libs/` returns only these two), so the change sets a precedent. The accompanying `libs/README.md` states the DLLs are "byte-for-byte the build output" of the OtOpcUa team's own code against the gateway's open proto contracts, but there is no recorded provenance — no source-commit SHA from the sibling `mxaccessgw` repo that produced the build, no SHA-256/SHA-512 checksum, no `.gitattributes` rule marking these paths as binary (so a future churn-in-place will balloon the pack file). Without a recorded source commit + checksum it is impossible for a future reviewer/auditor to verify the binaries match a specific revision of the sibling repo — the assertion "we built them, not external" is unverifiable after the fact. Tampering or accidental swap (e.g. someone drops in a different DLL of the same name under the same path) would not be detectable.
**Recommendation:** (a) Pin the source provenance: add the sibling `mxaccessgw` commit SHA used to build each DLL to `libs/README.md`. (b) Record a SHA-256 of each `.dll` in `libs/README.md` so a future tamper or accidental update is detectable by running `Get-FileHash`/`sha256sum`. (c) Add a `.gitattributes` rule under `libs/` declaring `*.dll binary` (and consider `filter=lfs diff=lfs merge=lfs -text` if/when these need to be updated, to avoid bloating the pack file on every refresh). (d) Optional: a `dotnet test` time-check that compares the on-disk hash to the recorded hash, so a CI run notices if the file drifts from what the README claims.
**Resolution:** Resolved 2026-05-23. **Severity re-triage:** the original
finding framed this as a security concern about "tampering or accidental
swap by an unknown third party"; the user clarified that the DLLs are
their own code, built from their own `mxaccessgw` project — not third-party
binaries. That moves the concern from security (untrusted provenance) to
documentation (audit trail). Re-classified as Low Documentation &
Comments. Fix: `libs/README.md` now carries a Provenance section that
records the source-commit SHA (`dd7ca1634e2d2b8a866c81f0009bf87ee9427750`,
extracted from the `AssemblyInformationalVersion` baked into both DLLs by
the original build) and SHA-256 checksums of both binaries, plus a
re-verification recipe (`sha256sum libs/*.dll` + `ilspycmd <dll> | grep
AssemblyInformationalVersion`). Recommendations (c) `.gitattributes` and
(d) CI hash-check deferred — the DLLs are essentially frozen until one
of the two unwinding paths is taken, so adding LFS or a CI guard would
add infrastructure that the unwinding step would then have to remove.
Re-open if the vendoring becomes a recurring update target.
### Driver.Galaxy-016
| Field | Value |
@@ -283,12 +300,32 @@ Medium, two Low.
| Severity | Medium |
| Category | Performance & resource management |
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` |
| Status | Open |
| Status | Resolved |
**Description:** The five new `PackageReference` versions declared in the csproj (`Google.Protobuf` 3.34.1, `Grpc.Core.Api` 2.76.0, `Grpc.Net.Client` 2.71.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.0, `Polly` 8.5.2) do not all match what the vendored `MxGateway.Client.dll` was built against. The DLL's PE metadata (extracted via `System.Reflection.Metadata`) shows references to `Grpc.Net.Client v2.0.0.0`, `Microsoft.Extensions.Logging.Abstractions v10.0.0.0`, and notably `Polly.Core v8.0.0.0` — and the source csproj just before the sibling-repo rename (commit `bd4a09a` from 2026-04-27) declared `Grpc.Net.Client` 2.76.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.7, and `Polly.Core` 8.6.6 — *not* the meta-package `Polly`. Our driver pulls `Polly` 8.5.2 (which transitively pins `Polly.Core` 8.5.2 per its nuspec dependency), so the vendored client actually loads `Polly.Core` 8.5.2 at runtime against code compiled against 8.6.6. Across an 8.5 ↔ 8.6 minor delta this is usually safe (assembly-version is `v8.0.0.0` for both), but it is exactly the skew shape that surfaces as `MissingMethodException` if a 8.6-only API was used in the client. `libs/README.md` claims "versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses so the gRPC + proto runtime stays binary-compatible" — that statement is correct only for `Google.Protobuf` and `Grpc.Core.Api`; the other three packages do not match.
**Recommendation:** Reconcile the declared package versions with what the vendored DLLs were built against — bump to `Grpc.Net.Client` 2.76.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.7, swap `Polly` for `Polly.Core` 8.6.6 (the driver does not import the `Polly` legacy v7 surface, only Polly.Core via the client). Alternatively, rebuild the vendored DLLs against the same versions the csproj declares and refresh the binaries. Update `libs/README.md` to record the exact versions the DLLs were built against, so the next vendoring refresh has an authoritative reference.
**Resolution:** Resolved 2026-05-23 — took the first option (reconcile
declared packages with what the DLL was built against, verified by
reflecting `Assembly.GetReferencedAssemblies()` on `MxGateway.Client.dll`).
Changes to the csproj: **`Polly` 8.5.2 → `Polly.Core` 8.6.6** (the most
consequential — `Polly` (v7 fluent API) and `Polly.Core` (v8 resilience-
pipeline API) are different packages, and the DLL was built against
`Polly.Core`; the prior `Polly` reference would have failed at runtime
with `MissingMethodException` the first time the gateway client's retry
pipeline ran). Also bumped `Grpc.Net.Client` 2.71.0 → 2.76.0 and
`Microsoft.Extensions.Logging.Abstractions` 10.0.0 → 10.0.7 to match the
sibling Server/Worker projects' current versions. `Google.Protobuf`
3.34.1 and `Grpc.Core.Api` 2.76.0 already matched; left unchanged.
`libs/README.md` rewritten to record what was actually verified
(`Assembly.GetReferencedAssemblies()` output + the resolved package
versions, including the sibling Server/Worker csproj as the version
source-of-truth — the deleted MxGateway.Client.csproj would have been
the original source but no longer exists). Verification: solution-wide
`dotnet build` clean, Driver.Galaxy.Tests 245/245 pass against the
corrected package set.
### Driver.Galaxy-017
| Field | Value |
@@ -296,12 +333,26 @@ Medium, two Low.
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract |
| Status | Open |
| Status | Deferred |
**Description:** The vendored `MxGateway.Contracts.dll` only carries the OLD `MxGateway.Contracts.Proto[.Galaxy]` namespace (PE-namespace dump confirms — `MxGateway.Client`, `MxGateway.Contracts`, `MxGateway.Contracts.Proto`, `MxGateway.Contracts.Proto.Galaxy` only). The sibling `mxaccessgw` repo's live `Protos/mxaccess_gateway.proto`, `mxaccess_worker.proto`, and `galaxy_repository.proto` files now generate into `ZB.MOM.WW.MxGateway.Contracts.Proto.*`. The proto wire format itself can still evolve (new RPCs, renamed fields, removed fields) and the driver has no contract-version handshake (a repo-wide search for `ContractVersion|ProtocolVersion|ApiVersion|WireVersion` in the driver returns nothing) — so a gateway service that evolves its proto past what the vendored client knows will fail silently at runtime: gRPC `UNIMPLEMENTED` for a renamed RPC, default-value reads for a removed scalar field, or worse, a wire-tag collision if a field number is reused. The risk surface grew with vendoring: previously the `ProjectReference` would have hard-failed at build time if the proto changed shape; now the driver builds green against a frozen contract that may not match the running gateway.
**Recommendation:** (a) Add a single `Ping`/`GetVersion` RPC call at gateway-session open, comparing the gateway's reported contract version against a string baked into `libs/README.md` (or a `GatewayContractVersion` const) and refusing the session on mismatch with a clear log. (b) Document in `libs/README.md` the exact mxaccessgw commit SHA (and proto-file SHA-256s) the vendored DLLs were built from, so a parity-rig operator can grep the live gateway for the matching commit. (c) Add a soak/parity test that asserts the live gateway's proto descriptor still matches what the vendored DLL expects — fail loud rather than degrade.
**Resolution:** Deferred 2026-05-23 — the recommendation's part (b)
(record the mxaccessgw source-commit SHA in `libs/README.md`) is satisfied
by the Driver.Galaxy-015 resolution, which records both DLLs were built
from mxaccessgw commit `dd7ca1634e2d2b8a866c81f0009bf87ee9427750`. Parts
(a) and (c) — adding a `GetVersion` RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial new
RPC + plumbing work that is not in scope for this code-review-resolution
sweep. The risk surface is bounded because either of the two unwinding
paths in `libs/README.md` (sibling repo restores `MxGateway.Client.csproj`,
or this driver migrates to the new namespace) will move the codebase
past the vendoring + close this concern naturally. Re-open if neither
unwinding path is taken within the next quarter and the live gateway
service does evolve its proto under the driver.
### Driver.Galaxy-018
| Field | Value |
@@ -309,7 +360,7 @@ Medium, two Low.
| Severity | Low |
| Category | Documentation & comments |
| Location | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` |
| Status | Open |
| Status | Resolved |
**Description:** Several small documentation issues in the vendoring artefacts:
1. `libs/README.md` says "Versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses" — but `ZB.MOM.WW.MxGateway.Contracts.csproj` only declares `Google.Protobuf` 3.34.1 and `Grpc.Core.Api` 2.76.0; the other three packages (`Grpc.Net.Client`, `Microsoft.Extensions.Logging.Abstractions`, `Polly`) come from the (now-deleted) `MxGateway.Client.csproj`, not the contracts csproj. The README points at the wrong source-of-truth file. See Driver.Galaxy-016 for the related version-skew issue.
@@ -318,3 +369,19 @@ Medium, two Low.
4. The csproj `<Reference Include="MxGateway.Client">` value relies on the bare assembly simple-name; an explicit `<Reference Include="MxGateway.Client, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null">` plus `<SpecificVersion>false</SpecificVersion>` would document the contract surface inside the csproj where a reviewer reads it.
**Recommendation:** (a) Update `libs/README.md` to (i) point at `MxGateway.Client.csproj` for the `Grpc.Net.Client`/`Microsoft.Extensions.Logging.Abstractions`/`Polly` version source, (ii) record the mxaccessgw commit SHA the vendored binaries were built from, and (iii) record SHA-256 hashes (see Driver.Galaxy-015). (b) Add `<SpecificVersion>false</SpecificVersion>` to both `<Reference>` items in the csproj to make the intent explicit and refresh-robust.
**Resolution:** Resolved 2026-05-23 — most of (a) was addressed alongside
Driver.Galaxy-015 + -016: `libs/README.md` rewritten to (i) point at the
sibling Server/Worker csproj as the live version source-of-truth (the
`MxGateway.Client.csproj` cited in the recommendation no longer exists —
the deleted-csproj reference would not have been actionable for a
future reader), (ii) record source commit
`dd7ca1634e2d2b8a866c81f0009bf87ee9427750`, and (iii) record SHA-256
checksums of both vendored DLLs. (b) `<SpecificVersion>false</SpecificVersion>`
was intentionally NOT added — the vendored DLL's AssemblyVersion is
`1.0.0.0` and MSBuild's default for `<Reference HintPath>` Include="bare-name"
items is already `SpecificVersion=false`, so the spelling-it-out
recommendation would be cosmetic without changing behaviour. If the
vendored DLLs are ever refreshed against a build with a different
`AssemblyVersion` the explicit attribute could be added then; for now
the existing csproj works correctly.
+17 -18
View File
@@ -19,17 +19,17 @@ Each module's `findings.md` is the source of truth; this file is generated from
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 1 | 13 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 5 | 16 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 13 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 16 |
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 15 |
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 2 | 8 |
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 8 |
| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 5 |
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 4 | 18 |
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 18 |
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 10 |
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
@@ -46,20 +46,7 @@ Each module's `findings.md` is the source of truth; this file is generated from
Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Core.Scripting-012 | High | Security | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` | The Core.Scripting-008 rewrite broadened the BCL references list from a narrow allow-list (`System.Private.CoreLib` + `System.Linq` only) to the full `TRUSTED_PLATFORM_ASSEMBLIES` set filtered to `System.*` + `netstandard` + `Microsoft.Win… |
| Driver.Cli.Common-007 | High | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` | Commit `5a9c459` added `0x80550000u => "BadDeviceFailure"` to the `FormatStatus` shortlist, but `0x80550000` is the canonical OPC UA spec value for `BadSecurityPolicyRejected`, not `BadDeviceFailure`. The correct spec value for `BadDeviceF… |
| Core.Scripting-013 | Medium | Security | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) | The synthesized wrapper pastes the user's source verbatim between `{` and `}` braces inside a static method body, with a `#line 1` directive and no escaping. The legacy `CSharpScript.CreateDelegate` path was robust to this because Roslyn's… |
| Core.Scripting-014 | Medium | Concurrency & thread safety | `CompiledScriptCache.cs:91-103` (`Clear`) | `Clear()` snapshots `_cache.Keys.ToArray()` then iterates, calling `TryRemove(key, out var lazy)` on each — the key-only overload, not the value-scoped one used in `GetOrCompile`'s catch block. Between the snapshot and a given `TryRemove`,… |
| Core.Scripting-016 | Medium | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` | The Core.Scripting-008 resolution introduced `ScriptEvaluator.IDisposable` + `CompiledScriptCache.Clear()` that disposes each materialised evaluator before dropping its dictionary entry, so per-publish ALC accretion is no longer process-li… |
| Driver.Galaxy-015 | Medium | Security | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` | Commit `994997b` checks in two binary DLLs (`MxGateway.Client.dll`, 99 840 bytes; `MxGateway.Contracts.dll`, 489 984 bytes) under `src/Drivers/.../Driver.Galaxy/libs/` and references them via `<Reference HintPath="…" />`. These are the onl… |
| Driver.Galaxy-016 | Medium | Performance & resource management | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` | The five new `PackageReference` versions declared in the csproj (`Google.Protobuf` 3.34.1, `Grpc.Core.Api` 2.76.0, `Grpc.Net.Client` 2.71.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.0, `Polly` 8.5.2) do not all match what the vendo… |
| Core.ScriptedAlarms-013 | Low | Documentation & comments | `ScriptedAlarmEngine.cs:66-81` | The new internal test accessors `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` (introduced by the Core.ScriptedAlarms-009 resolution at `0001cdd`) return the *live* per-alarm scratch — the same `Dictionary<string, DataVa… |
| Core.Scripting-015 | Low | Correctness & logic bugs | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) | `ToCSharpTypeName` is documented to handle nested types (`Outer+Inner``Outer.Inner`) via `Replace('+', '.')` for the non-generic path (line 269) but the generic path (line 263-266) constructs the name from `def.FullName!` then takes a s… |
| Driver.Cli.Common-008 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` | Commit `5a9c459` adds a new `FormatStatus_names_native_driver_emitted_codes` `[Theory]` whose five `[InlineData]` rows are identical to five rows added to the existing `FormatStatus_names_well_known_status_codes` `[Theory]` in the same com… |
| Driver.Galaxy-017 | Low | Design-document adherence | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract | The vendored `MxGateway.Contracts.dll` only carries the OLD `MxGateway.Contracts.Proto[.Galaxy]` namespace (PE-namespace dump confirms — `MxGateway.Client`, `MxGateway.Contracts`, `MxGateway.Contracts.Proto`, `MxGateway.Contracts.Proto.Gal… |
| Driver.Galaxy-018 | Low | Documentation & comments | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` | Several small documentation issues in the vendoring artefacts: 1. `libs/README.md` says "Versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses" — but `ZB.MOM.WW.MxGateway.Contracts.csproj` only declares `Google… |
_No pending findings._
## Closed findings
@@ -88,6 +75,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.AlarmHistorian-006 | High | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` |
| Core.ScriptedAlarms-001 | High | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` |
| Core.Scripting-002 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:70` |
| Core.Scripting-012 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Core.VirtualTags-001 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` |
| Driver.AbCip-001 | High | Resolved | Correctness & logic bugs | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` |
| Driver.AbCip-002 | High | Resolved | Correctness & logic bugs | `AbCipStatusMapper.cs:65-78` |
@@ -96,6 +84,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.AbLegacy-001 | High | Resolved | Correctness & logic bugs | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` |
| Driver.AbLegacy-006 | High | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` |
| Driver.Cli.Common-001 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` |
| Driver.Cli.Common-007 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` |
| Driver.FOCAS-001 | High | Resolved | Correctness & logic bugs | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
| Driver.FOCAS-002 | High | Resolved | Correctness & logic bugs | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
| Driver.Galaxy-002 | High | Resolved | Correctness & logic bugs | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` |
@@ -162,6 +151,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.Scripting-004 | Medium | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:73` |
| Core.Scripting-007 | Medium | Resolved | Error handling & resilience | `TimedScriptEvaluator.cs:60` |
| Core.Scripting-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
| Core.Scripting-013 | Medium | Resolved | Security | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
| Core.Scripting-014 | Medium | Resolved | Concurrency & thread safety | `CompiledScriptCache.cs:91-103` (`Clear`) |
| Core.Scripting-016 | Medium | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Core.VirtualTags-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
| Core.VirtualTags-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
| Core.VirtualTags-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
@@ -199,6 +191,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.Galaxy-009 | Medium | Resolved | Error handling & resilience | `GalaxyDriver.cs:354-371` |
| Driver.Galaxy-011 | Medium | Resolved | Performance & resource management | `GalaxyDriver.cs:411` |
| Driver.Galaxy-014 | Medium | Resolved | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
| Driver.Galaxy-016 | Medium | Resolved | Performance & resource management | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` |
| Driver.Historian.Wonderware-002 | Medium | Resolved | Correctness and logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
| Driver.Historian.Wonderware-003 | Medium | Resolved | Correctness and logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
| Driver.Historian.Wonderware-006 | Medium | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:120-128` |
@@ -297,11 +290,13 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.ScriptedAlarms-009 | Low | Resolved | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Core.ScriptedAlarms-010 | Low | Resolved | Design-document adherence | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` |
| Core.ScriptedAlarms-011 | Low | Resolved | Code organization & conventions | `Part9StateMachine.cs:275` |
| Core.ScriptedAlarms-013 | Low | Resolved | Documentation & comments | `ScriptedAlarmEngine.cs:66-81` |
| Core.Scripting-005 | Low | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:97` |
| Core.Scripting-006 | Low | Resolved | Concurrency & thread safety | `CompiledScriptCache.cs:55` |
| Core.Scripting-008 | Low | Resolved | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Core.Scripting-009 | Low | Resolved | Design-document adherence | `ForbiddenTypeAnalyzer.cs:45` |
| Core.Scripting-011 | Low | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
| Core.Scripting-015 | Low | Resolved | Correctness & logic bugs | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Core.VirtualTags-004 | Low | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` |
| Core.VirtualTags-006 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` |
| Core.VirtualTags-007 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` |
@@ -331,6 +326,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.AbLegacy.Cli-007 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` |
| Driver.Cli.Common-004 | Low | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
| Driver.Cli.Common-006 | Low | Resolved | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
| Driver.Cli.Common-008 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` |
| Driver.FOCAS-007 | Low | Resolved | Error handling & resilience | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` |
| Driver.FOCAS-008 | Low | Resolved | Performance & resource management | `FocasDriver.cs:201`, `FocasDriver.cs:253` |
| Driver.FOCAS-009 | Low | Resolved | Design-document adherence | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` |
@@ -345,6 +341,8 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.Galaxy-010 | Low | Resolved | Security | `GalaxyDriver.cs:311-341` |
| Driver.Galaxy-012 | Low | Resolved | Performance & resource management | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` |
| Driver.Galaxy-013 | Low | Resolved | Design-document adherence | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` |
| Driver.Galaxy-017 | Low | Deferred | Design-document adherence | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract |
| Driver.Galaxy-018 | Low | Resolved | Documentation & comments | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` |
| Driver.Historian.Wonderware-004 | Low | Resolved | Correctness and logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` |
| Driver.Historian.Wonderware-005 | Low | Resolved | Concurrency and thread safety | `Backend/HistorianDataSource.cs:124`, `:126-127` |
| Driver.Historian.Wonderware-007 | Low | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:70-75` |
@@ -402,3 +400,4 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-012 | Low | Resolved | Performance & resource management | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
| Server-014 | Low | Resolved | Code organization & conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
| Server-015 | Low | Resolved | Documentation & comments | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
| Driver.Galaxy-015 | ~~Medium~~ Low (re-triaged 2026-05-23) | Resolved | ~~Security~~ Documentation & comments (re-triaged 2026-05-23) | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` |
+2 -2
View File
@@ -6,7 +6,7 @@ The runtime is split across two projects: `Core.Scripting` holds the Roslyn sand
## Roslyn script sandbox (`Core.Scripting`)
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp` (regular compiler, not the scripting variant — the original `CSharpScript` pipeline was retired by the Core.Scripting-008 / -016 rewrite, see "Compile cache" below). Each script's source is pasted as the body of a synthesized `CompiledScript.Run(ScriptGlobals<TContext>)` method against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
### Compile pipeline (`ScriptEvaluator<TContext, TResult>`)
@@ -36,7 +36,7 @@ Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), whic
### Per-evaluation timeout (`TimedScriptEvaluator<TContext, TResult>`)
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying `ScriptRunner` keeps running on its thread-pool thread until the Roslyn runtime returns (documented trade-off; out-of-process evaluation is a v3 concern).
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying compiled-script delegate keeps running on its `Task.Run` thread-pool thread until it returns of its own accord (the CT is checked only at evaluator entry; once the script body is running, only the script returning or throwing will release the thread). The post-rewrite delegate is a regular C# `Func<>` bound to the synthesized `CompiledScript.Run` method, so this is a vanilla "synchronous CPU-bound work on a pool thread" leak rather than anything Roslyn-specific. Documented trade-off; out-of-process evaluation is a v3 concern.
### Script logger plumbing
+13 -3
View File
@@ -583,10 +583,20 @@ language binding.
**Depends on:** A.1 merged (proto change live).
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\src\` for .NET — note the sibling
repo restructured after this plan was written; `clients/dotnet/MxGateway.Client.csproj`
no longer exists, the proto contracts now live in
`src/ZB.MOM.WW.MxGateway.Contracts/` under the new namespace
`ZB.MOM.WW.MxGateway.Contracts.Proto[.Galaxy]`; the OtOpcUa driver currently
consumes vendored binaries from the pre-restructure build — see
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/README.md`):
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just
rebuild `MxGateway.Client.csproj` after pulling A.1.
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; rebuild
`src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
after pulling A.1. (If unwinding the driver's vendored binaries onto the
new contracts namespace as part of the alarm work, namespace-rename + a
reimplementation of the missing `MxGatewayClient` / `MxGatewaySession`
wrappers is also in scope.)
2. **Python** — run `clients\python\generate-proto.ps1`; commit the
regenerated `_pb2.py` + `_pb2_grpc.py` files under
`clients\python\src\`.
+3 -4
View File
@@ -1,6 +1,6 @@
# High-Level Requirements
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as three cooperating processes (Server, Admin, Galaxy.Host). The Galaxy integration is one of seven shipped drivers. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 are new and cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
> **Revision** — Refreshed 2026-05-23 for the OtOpcUa v2 multi-driver platform. The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as two cooperating processes (Server, Admin). The Galaxy integration is one of seven shipped drivers and is now an in-process Tier-A driver that talks gRPC to a separately installed `mxaccessgw` gateway (sibling repo) — PR 7.2 (2026-04-30) retired the legacy out-of-process `Galaxy.Host` Windows service. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
## HLR-001: OPC UA Server
@@ -28,11 +28,10 @@ Drivers whose backend has a native change signal (e.g. Galaxy's `time_of_last_de
## HLR-007: Service Hosting
The system shall be deployed as three cooperating Windows services:
The system shall be deployed as two cooperating Windows services (the legacy `OtOpcUa.Galaxy.Host` x86 host was retired in PR 7.2 — Galaxy access now flows through the separately installed `mxaccessgw` gateway, which lives in a sibling repository and is not part of the OtOpcUa deployment):
- **OtOpcUa.Server** — .NET 10 x64, `Microsoft.Extensions.Hosting` + `AddWindowsService`, hosts all non-Galaxy drivers in-process and the OPC UA endpoint.
- **OtOpcUa.Server** — .NET 10 AnyCPU, `Microsoft.Extensions.Hosting` + `AddWindowsService` (decision #30 replaced the original TopShelf choice), hosts every driver in-process — including the new Tier-A `GalaxyDriver` that speaks gRPC to `mxaccessgw` and the OPC UA endpoint.
- **OtOpcUa.Admin** — .NET 10 x64 Blazor Server web app, hosts the admin UI, SignalR hubs for live updates, `/metrics` Prometheus endpoint, and audit log writers.
- **OtOpcUa.Galaxy.Host** — .NET Framework 4.8 x86 (TopShelf), hosts MXAccess COM + Galaxy Repository SQL + Historian plugin. Talks to `Driver.Galaxy.Proxy` inside `OtOpcUa.Server` via a named pipe (MessagePack over length-prefixed frames, per-process shared secret, SID-restricted ACL).
## HLR-008: Logging
@@ -29,7 +29,7 @@ Tie-in capability — **historian alarm sink**:
| 3 | Evaluation trigger = **change-driven + timer-driven**; operator chooses per-tag | Change-driven is cheap at steady state; timer is the escape hatch for polling derivations that don't have a discrete "input changed" signal. |
| 4 | Script shape = **Shape A — one script per virtual tag/alarm**; `return` produces the value (or `bool` for alarm condition) | Minimal surface; no predicate/action split. Alarm side-effects (severity, message) configured out-of-band, not in the script. |
| 5 | Alarm fidelity = **full OPC UA Part 9** | Uniform with Galaxy + ALMD on the wire; client-side tooling (HMIs, historians, event pipelines) gets one shape. |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. Authoritative deny-list (`ForbiddenTypeAnalyzer`): namespace-prefix deny `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks` (Task / Parallel fan-out — Core.Scripting-003), `System.Runtime.InteropServices`, `Microsoft.Win32`; type-granular deny `System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread` (these live directly in the allow-listed `System` / `System.Threading` namespaces, so a prefix rule cannot reach them without blocking primitives — Core.Scripting-001 / -009). |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. Authoritative deny-list (`ForbiddenTypeAnalyzer`): namespace-prefix deny `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks` (Task / Parallel fan-out — Core.Scripting-003), `System.Runtime.InteropServices`, `System.Runtime.Loader` (AssemblyLoadContext et al. — Core.Scripting-012), `Microsoft.Win32`; type-granular deny `System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`, `System.Threading.ThreadPool` (Core.Scripting-012), `System.Threading.Timer` (Core.Scripting-012) (these live directly in the allow-listed `System` / `System.Threading` namespaces, so a prefix rule cannot reach them without blocking primitives — Core.Scripting-001 / -009 / -012). |
| 7 | Dependency declaration = **AST inference**; operator doesn't maintain a separate dependency list | `CSharpSyntaxWalker` extracts `ctx.GetTag("path")` string-literal calls at compile time; dynamic paths rejected at publish. |
| 8 | Config storage = **config DB with generation-sealed cache** (same as driver instances) | Virtual tags + alarms publish atomically in the same generation as the driver instance config they may depend on. |
| 9 | Script return value shape (`ctx.GetTag`) = **`DataValue { Value, StatusCode, Timestamp }`** | Scripts branch on quality naturally without separate `ctx.GetQuality(...)` calls. |
@@ -89,7 +89,7 @@ Tie-in capability — **historian alarm sink**:
### Stream A — `Core.Scripting` (Roslyn engine + sandbox + AST inference + logger) — **2 weeks**
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers.
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers. _(Implementation note 2026-05-23 — superseded by Core.Scripting-008 / -016: the `CSharpScript`/`ScriptRunner` path was replaced with a hand-rolled `CSharpCompilation.Create``Emit(MemoryStream)` → collectible `ScriptAssemblyLoadContext.LoadFromStream` pipeline so per-publish ALC accretion is reclaimable, and engines route compiles through `CompiledScriptCache` rather than calling `ScriptEvaluator.Compile` directly. The reference list was correspondingly widened from the narrow allow-list above to the full BCL `TRUSTED_PLATFORM_ASSEMBLIES` set (filtered to `System.*` + `netstandard` + `Microsoft.Win32.Registry`) because the new pipeline can't compile against the old narrow set; `ForbiddenTypeAnalyzer` is now the sole security gate, consistent with how Core.Scripting-001 / -002 established the analyzer must be the real boundary because type forwarding makes any references-list-only restriction porous. See `docs/VirtualTags.md` "Compile cache" for the current implementation contract.)_
2. **A.2** `DependencyExtractor : CSharpSyntaxWalker`. Visits every `InvocationExpressionSyntax` targeting `ctx.GetTag` / `ctx.SetVirtualTag`; accepts only a `LiteralExpressionSyntax` argument. Non-literal arguments (concat, variable, method call) → publish-time rejection with an actionable error pointing the operator at the exact span. Outputs `IReadOnlySet<string> Inputs` + `IReadOnlySet<string> Outputs`.
3. **A.3** Compile cache. `(source_hash) → compiled Script<T>`. Recompile only when source changes. Warm on `SealedBootstrap`.
4. **A.4** Per-evaluation timeout wrapper (default 250ms; configurable per tag). Timeout = tag quality `BadInternalError` + structured warning log. Keeps a single runaway script from starving the engine.
@@ -162,7 +162,7 @@ Tie-in capability — **historian alarm sink**:
## Compliance Checks (run at exit gate)
- [ ] **Sandbox escape**: attempts to reference any deny-listed namespace prefix (`System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `Microsoft.Win32`) or any of the type-granular forbidden types (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`) fail at script compile with an actionable error. Vectors include direct calls, `typeof(T)`, generic type arguments, casts, `is`/`as` patterns, `default(T)`, array element types, and explicitly-typed local declarations.
- [ ] **Sandbox escape**: attempts to reference any deny-listed namespace prefix (`System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `System.Runtime.Loader`, `Microsoft.Win32`) or any of the type-granular forbidden types (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`, `System.Threading.ThreadPool`, `System.Threading.Timer`) fail at script compile with an actionable error. Vectors include direct calls, `typeof(T)`, generic type arguments, casts, `is`/`as` patterns, `default(T)`, array element types, and explicitly-typed local declarations.
- [ ] **Dependency inference**: `ctx.GetTag(myStringVar)` (non-literal path) is rejected at publish with a span-pointed error; `ctx.GetTag("Line1/Speed")` is accepted + appears in the inferred input set.
- [ ] **Change cascade**: tag A → virtual tag B → virtual tag C. When A changes, B recomputes, then C recomputes. Single change event triggers the full cascade in topological order within one evaluation pass.
- [ ] **Cycle rejection**: publish a config where virtual tag B depends on A and A depends on B. Publish fails pre-commit with a clear cycle message.
+2 -2
View File
@@ -193,7 +193,7 @@ ConfigurationService
- Compact binary format, faster than JSON, good fit for high-frequency data change callbacks
- Simpler than gRPC on .NET 4.8 (which needs legacy `Grpc.Core` native library)
**Decided: Galaxy Host is a separate Windows service.**
**Decided: Galaxy Host is a separate Windows service.** _(Reversed by PR 7.2, 2026-04-30 — see PR 7.2's commit `ae7106d` and the project_galaxy_via_mxgateway memory entry. The legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the `OtOpcUaGalaxyHost` Windows service were retired; Galaxy access now flows through the in-process Tier-A `GalaxyDriver` talking gRPC to a separately installed `mxaccessgw` gateway sibling repo. The reasoning below was correct for the original LMX/x86-COM architecture; the gateway sibling repo now owns those constraints externally.)_
- Independent lifecycle from the OtOpcUa Server
- Can be restarted without affecting the main server or other drivers
- Galaxy.Proxy detects connection loss, sets Bad quality on Galaxy nodes, reconnects when Host comes back
@@ -801,7 +801,7 @@ aggregate runner (#253); server-side factory + seed SQL per driver (#210#213)
| 26 | Admin deploys on same server (co-hosted) | Simplifies deployment; can also run on separate management host | 2026-04-16 |
| 27 | Admin scaffold early, driver-specific screens deferred | Core CRUD for instances/drivers first; per-driver config UI added with each driver | 2026-04-16 |
| 28 | Named pipes for Galaxy IPC | Fast, no port conflicts, native to both .NET 4.8 and .NET 10 | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 (**reversed PR 7.2, 2026-04-30** — Galaxy is now an in-process Tier-A driver talking gRPC to the sibling `mxaccessgw` gateway; see the decision body above) |
| 30 | Drop TopShelf, use Microsoft.Extensions.Hosting | Built-in Windows Service support in .NET 10, no third-party dependency | 2026-04-16 |
| 31 | Mono-repo for all drivers | Simpler dependency management, single CI pipeline, shared abstractions | 2026-04-16 |
| 32 | MessagePack serialization for Galaxy IPC | Binary, fast, works on .NET 4.8+ and .NET 10 via MessagePack-CSharp NuGet | 2026-04-16 |
@@ -63,12 +63,39 @@ public sealed class ScriptedAlarmEngine : IDisposable
private readonly ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId =
new(StringComparer.Ordinal);
/// <summary>
/// Compile cache for every alarm predicate. Routes <see cref="LoadAsync"/>'s
/// <see cref="ScriptEvaluator{TContext, TResult}.Compile"/> calls through the
/// cache so the collectible <see cref="System.Runtime.Loader.AssemblyLoadContext"/>
/// each compile produces is actually disposed on the publish-replace path
/// (Core.Scripting-016): the cache's <see cref="CompiledScriptCache{TContext, TResult}.Clear"/>
/// disposes every materialised evaluator before dropping its dictionary entry,
/// so a config-publish releases the prior generation's ALCs and the per-publish
/// accretion the Core.Scripting-008 fix targeted is actually freed in production.
/// Pre-fix the engine called <c>ScriptEvaluator.Compile</c> directly, which left
/// the ALCs rooted until the process exited — defeating -008 on the real path.
/// </summary>
private readonly CompiledScriptCache<AlarmPredicateContext, bool> _compileCache = new();
/// <summary>
/// Test-only diagnostic: returns the per-alarm scratch read-cache dictionary
/// if one has been allocated, else null. Used by Core.ScriptedAlarms-009
/// regression tests to assert the scratch is reused across evaluations
/// (two reads return the same instance).
/// </summary>
/// <remarks>
/// <b>Synchronization:</b> the returned <see cref="IReadOnlyDictionary{TKey, TValue}"/>
/// is the engine's live mutable read-cache. It is refilled in place by
/// <c>RefillReadCache</c> on every predicate evaluation, under <c>_evalGate</c>.
/// Test callers MUST NOT iterate this dictionary while the engine is
/// actively evaluating (i.e. while an upstream change is mid-flight); the
/// refill clears the dict before repopulating and a concurrent iterator
/// would observe torn / partial state. Safe uses are: reference-identity
/// comparisons (e.g. asserting the same instance is reused across calls),
/// and single-key reads against an engine that has quiesced after a
/// deterministic upstream push. Anything more involved should snapshot a
/// copy under the gate. (Core.ScriptedAlarms-013.)
/// </remarks>
internal IReadOnlyDictionary<string, DataValueSnapshot>? TryGetScratchReadCacheForTest(string alarmId)
=> _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.ReadCache : null;
@@ -77,6 +104,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
/// if one has been allocated, else null. Companion to
/// <see cref="TryGetScratchReadCacheForTest"/>.
/// </summary>
/// <remarks>
/// <b>Synchronization:</b> the returned context wraps the same live
/// read-cache as <see cref="TryGetScratchReadCacheForTest"/> — the same
/// "don't iterate during an in-flight evaluation" caveat applies. Safe
/// for reference-identity assertions on a quiesced engine.
/// (Core.ScriptedAlarms-013.)
/// </remarks>
internal AlarmPredicateContext? TryGetScratchContextForTest(string alarmId)
=> _scratchByAlarmId.TryGetValue(alarmId, out var s) ? s.Context : null;
private readonly ConcurrentDictionary<string, DataValueSnapshot> _valueCache
@@ -143,6 +177,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
// have changed (different Inputs, different Logger), so any reuse would be
// unsafe. (Core.ScriptedAlarms-009)
_scratchByAlarmId.Clear();
// Dispose every compiled-predicate ALC from the prior generation BEFORE we
// recompile this one. Skipping this is what made Core.Scripting-008 a
// no-op in production. (Core.Scripting-016)
_compileCache.Clear();
var compileFailures = new List<string>();
foreach (var def in definitions)
@@ -157,7 +195,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
continue;
}
var evaluator = ScriptEvaluator<AlarmPredicateContext, bool>.Compile(def.PredicateScriptSource);
// Route through CompiledScriptCache so the emitted assembly's
// collectible ALC participates in publish-replace cleanup.
// (Core.Scripting-016)
var evaluator = _compileCache.GetOrCompile(def.PredicateScriptSource);
var timed = new TimedScriptEvaluator<AlarmPredicateContext, bool>(evaluator, _scriptTimeout);
var logger = _loggerFactory.Create(def.AlarmId);
@@ -645,12 +686,24 @@ public sealed class ScriptedAlarmEngine : IDisposable
}
}
// Do NOT clear _alarms here: Timer.Dispose() does not wait for in-flight callbacks,
// so a ShelvingCheckAsync or ReevaluateAsync can still be running inside _evalGate.
// Those paths now re-check _disposed after acquiring the gate and bail out safely.
// Clearing _alarms outside the gate would race concurrent reads and is unnecessary
// (the whole object is being discarded). (Core.ScriptedAlarms-005)
// Safe to clear here: the Task.WhenAll drain above guaranteed no
// ReevaluateAsync / ShelvingCheckAsync is mid-flight, and _disposed=true
// prevents new background work from being queued (OnUpstreamChange bails on
// line 334). Pre-Core.Scripting-016 the comment said "Do NOT clear _alarms",
// but that was when the engine called ScriptEvaluator.Compile directly and
// held the script ALCs through _alarms→AlarmState→TimedScriptEvaluator
// forever — leaving them rooted defeated the -008 collectible-ALC unload.
// Clearing now drops the delegate references so the cache's Dispose call
// below can actually unload the emitted assemblies. (Core.ScriptedAlarms-005
// re-evaluated under -016.)
_alarms.Clear();
_alarmsReferencing.Clear();
_scratchByAlarmId.Clear();
// Dispose every compiled-predicate ALC so the engine's shutdown actually
// releases the emitted assemblies. The drain above ensures no evaluator is
// mid-call; CompiledScriptCache.Dispose internally guards against use-after-
// dispose. (Core.Scripting-016)
_compileCache.Dispose();
}
private sealed record AlarmState(
@@ -88,17 +88,30 @@ public sealed class CompiledScriptCache<TContext, TResult> : IDisposable
/// <see cref="System.Runtime.Loader.AssemblyLoadContext"/> unloads and the
/// emitted script assembly becomes eligible for GC (Core.Scripting-008).
/// </summary>
/// <remarks>
/// Safe to call after <see cref="Dispose"/> — the operation is idempotent.
/// <see cref="Dispose"/> sets <c>_disposed = true</c> before invoking this
/// method (so callers see the post-Dispose guard on <see cref="GetOrCompile"/>),
/// but this method itself MUST run to completion so the Dispose-triggered
/// drain actually unloads every materialised evaluator's ALC. (Core.Scripting-016
/// uncovered this — a previous Clear-aborts-when-disposed guard silently
/// skipped the entire drain on Dispose, leaving emitted assemblies rooted.)
/// </remarks>
public void Clear()
{
if (_disposed) return;
// Snapshot the entries, swap them out, then dispose. We use TryRemove rather
// than _cache.Clear() so a concurrent GetOrCompile re-add after our snapshot
// is not silently lost — a new compile starts a fresh cache entry, the old
// evaluator is still disposed.
foreach (var key in _cache.Keys.ToArray())
// Snapshot (key, value) pairs and remove with the value-scoped
// TryRemove(KeyValuePair<,>) overload — same shape as the
// Core.Scripting-006 fix in GetOrCompile's catch block. A concurrent
// GetOrCompile re-add that hashes to the same key between our snapshot
// and the TryRemove inserts a *different* Lazy reference; the value-
// scoped removal sees the mismatch and leaves the fresh entry intact
// (instead of evicting + disposing it while the concurrent caller
// still holds it). The fresh evaluator and its ALC stay live for the
// concurrent caller. (Core.Scripting-014.)
foreach (var entry in _cache.ToArray())
{
if (_cache.TryRemove(key, out var lazy))
DisposeLazyIfMaterialised(lazy);
if (_cache.TryRemove(entry))
DisposeLazyIfMaterialised(entry.Value);
}
}
@@ -72,6 +72,11 @@ public static class ForbiddenTypeAnalyzer
// a Task fan-out outlives the evaluation timeout entirely
// (Core.Scripting-003).
"System.Runtime.InteropServices",
"System.Runtime.Loader", // AssemblyLoadContext + AssemblyDependencyResolver —
// arbitrary DLL load into the host process
// (Core.Scripting-012). Namespace-prefix rather than
// type-granular so future BCL additions to this
// namespace are denied by default.
"Microsoft.Win32", // registry
];
@@ -113,6 +118,24 @@ public static class ForbiddenTypeAnalyzer
// target it without blocking those legitimate types. Denied type-granularly here.
// (Core.Scripting-010.)
"System.Threading.Thread",
// Core.Scripting-012 — broadening the references list to the BCL trusted-platform-
// assemblies set (Core.Scripting-008 follow-up) re-exposed two background-work
// vectors the original deny-list missed. Both live in System.Threading (shared
// with allowed sync primitives like CancellationToken / SemaphoreSlim), so they
// must be denied type-granularly:
//
// System.Threading.ThreadPool — QueueUserWorkItem / UnsafeQueueUserWorkItem
// re-introduce the background-fanout threat
// Core.Scripting-003 closed against
// System.Threading.Tasks.
// System.Threading.Timer — Timer(callback, …) schedules unbounded work
// that outlives the per-evaluation timeout.
//
// System.Runtime.Loader.AssemblyLoadContext is also covered, but at the namespace-
// prefix level above (System.Runtime.Loader) so future BCL additions to that
// namespace are denied by default.
"System.Threading.ThreadPool",
"System.Threading.Timer",
];
/// <summary>
@@ -76,6 +76,17 @@ public sealed class ScriptEvaluator<TContext, TResult> : IDisposable
var wrapperSource = BuildWrapperSource(scriptSource, sandbox.Imports);
var syntaxTree = CSharpSyntaxTree.ParseText(wrapperSource);
// Step 1a — defend against wrapper-source injection (Core.Scripting-013).
// A script body of `return 0; } public static int Evil() { return 0;` would
// close the synthesized `Run` method early, declare a sibling `Evil` method
// inside the synthesized `CompiledScript` class, and leave the wrapper's
// trailing `}` balanced. ForbiddenTypeAnalyzer still walks the injected
// members so this isn't a direct sandbox escape, but it relaxes the
// documented "method body" authoring contract and widens the analyzer's
// surface. Reject by requiring that the parsed `CompiledScript` class
// contains exactly one member declaration (the `Run` method).
EnforceSingleRunMember(syntaxTree);
// Step 2 — Roslyn compile against the sandbox allow-list. Anything not in the
// references set is unresolved and produces a compiler error.
var assemblyName = "ZB.MOM.WW.OtOpcUa.Core.Scripting.Compiled." +
@@ -193,6 +204,70 @@ public sealed class ScriptEvaluator<TContext, TResult> : IDisposable
_alc.Unload();
}
/// <summary>
/// Reject scripts whose source contains brace-balanced injections that would
/// declare sibling members alongside the synthesized <c>CompiledScript.Run</c>
/// method. The expected shape is a single <c>CompiledScript</c> class with
/// exactly one member — the <c>Run</c> method. Anything else (a sibling
/// method, nested class, additional class in the namespace, free-floating
/// top-level statement) means the user source closed the synthesized braces
/// early and injected its own declarations. (Core.Scripting-013.)
/// </summary>
private static void EnforceSingleRunMember(SyntaxTree syntaxTree)
{
var root = syntaxTree.GetCompilationUnitRoot();
// The compilation unit must hold exactly one type declaration — our
// CompiledScript. Anything else means the user closed the synthesized
// namespace or class early and injected another type declaration.
var typeMembers = root.DescendantNodes()
.OfType<Microsoft.CodeAnalysis.CSharp.Syntax.BaseTypeDeclarationSyntax>()
.ToArray();
if (typeMembers.Length != 1 || typeMembers[0].Identifier.ValueText != "CompiledScript")
{
throw new CompilationErrorException(new[]
{
Diagnostic.Create(
new DiagnosticDescriptor(
id: "LMX001",
title: "Script wrapper injection",
messageFormat: "Script source must be a statement body. Declarations of " +
"additional types alongside the wrapper's CompiledScript class " +
"are not allowed; check for unbalanced braces or stray " +
"`class` / `namespace` keywords in the source. (Core.Scripting-013)",
category: "Sandbox",
defaultSeverity: DiagnosticSeverity.Error,
isEnabledByDefault: true),
typeMembers.Length > 1 ? typeMembers[1].Identifier.GetLocation() : Location.None),
});
}
// The CompiledScript class itself must contain exactly one member — the Run
// method. A second member means the user closed Run early and started a sibling.
var classMembers = ((Microsoft.CodeAnalysis.CSharp.Syntax.ClassDeclarationSyntax)typeMembers[0]).Members;
if (classMembers.Count != 1 ||
classMembers[0] is not Microsoft.CodeAnalysis.CSharp.Syntax.MethodDeclarationSyntax m ||
m.Identifier.ValueText != "Run")
{
throw new CompilationErrorException(new[]
{
Diagnostic.Create(
new DiagnosticDescriptor(
id: "LMX002",
title: "Script wrapper injection",
messageFormat: "Script source must be a statement body. Declarations of " +
"sibling members (methods, properties, nested types) alongside " +
"the wrapper's Run method are not allowed; check for unbalanced " +
"braces or a stray `}` followed by a `public`/`private`/`static` " +
"declaration in the source. (Core.Scripting-013)",
category: "Sandbox",
defaultSeverity: DiagnosticSeverity.Error,
isEnabledByDefault: true),
classMembers.Count > 1 ? classMembers[1].GetLocation() : Location.None),
});
}
}
/// <summary>
/// Synthesize the source we hand to Roslyn. The user's script body is pasted
/// verbatim inside <c>CompiledScript.Run</c>; the <c>using</c> directives mirror
@@ -259,11 +334,41 @@ public sealed class ScriptEvaluator<TContext, TResult> : IDisposable
if (t.IsGenericType)
{
var def = t.GetGenericTypeDefinition();
var rawName = def.FullName!.Replace('+', '.');
var nameNoArity = rawName.Substring(0, rawName.IndexOf('`'));
var args = string.Join(", ", t.GetGenericArguments().Select(ToCSharpTypeName));
return "global::" + nameNoArity + "<" + args + ">";
// Walk the FullName by '.' segments (after '+ → .'). For each segment
// ending with `Name\`N`, consume N generic arguments and emit them as
// `<…>` on that segment. Nested generic-in-generic (Outer<T>.Inner<U>)
// emits as `global::Ns.Outer<T>.Inner<U>` — valid C#. The pre-fix code
// used `IndexOf('`')` to find the FIRST backtick and truncated the
// entire name there, silently dropping the rest of the nested-generic
// closed args. (Core.Scripting-015.)
var rawName = t.GetGenericTypeDefinition().FullName!.Replace('+', '.');
var allArgs = t.GetGenericArguments();
var segments = rawName.Split('.');
var argIndex = 0;
var sb = new StringBuilder("global::");
for (int i = 0; i < segments.Length; i++)
{
if (i > 0) sb.Append('.');
var seg = segments[i];
var backtick = seg.IndexOf('`');
if (backtick >= 0)
{
var arity = int.Parse(seg.AsSpan(backtick + 1));
sb.Append(seg, 0, backtick);
sb.Append('<');
for (int j = 0; j < arity; j++)
{
if (j > 0) sb.Append(", ");
sb.Append(ToCSharpTypeName(allArgs[argIndex++]));
}
sb.Append('>');
}
else
{
sb.Append(seg);
}
}
return sb.ToString();
}
return "global::" + t.FullName!.Replace('+', '.');
@@ -37,6 +37,21 @@ public sealed class VirtualTagEngine : IDisposable
private readonly DependencyGraph _graph = new();
private readonly Dictionary<string, VirtualTagState> _tags = new(StringComparer.Ordinal);
/// <summary>
/// Compile cache for every virtual-tag script. Routes <see cref="Load"/>'s
/// <see cref="ScriptEvaluator{TContext, TResult}.Compile"/> calls through the
/// cache so the collectible <see cref="System.Runtime.Loader.AssemblyLoadContext"/>
/// each compile produces is actually disposed on the publish-replace path
/// (Core.Scripting-016): the cache's <see cref="CompiledScriptCache{TContext, TResult}.Clear"/>
/// disposes every materialised evaluator before dropping its dictionary entry,
/// so a config-publish releases the prior generation's ALCs and the per-publish
/// accretion the Core.Scripting-008 fix targeted is actually freed in production.
/// Pre-fix the engine called <c>ScriptEvaluator.Compile</c> directly, which left
/// the ALCs rooted until the process exited — defeating -008 on the real path.
/// </summary>
private readonly CompiledScriptCache<VirtualTagContext, object?> _compileCache = new();
private readonly ConcurrentDictionary<string, DataValueSnapshot> _valueCache = new(StringComparer.Ordinal);
private readonly ConcurrentDictionary<string, List<Action<string, DataValueSnapshot>>> _observers
= new(StringComparer.Ordinal);
@@ -74,6 +89,10 @@ public sealed class VirtualTagEngine : IDisposable
UnsubscribeFromUpstream();
_tags.Clear();
_graph.Clear();
// Dispose every compiled-script ALC from the prior generation BEFORE we
// recompile this one. Skipping this is what made Core.Scripting-008 a
// no-op in production (Core.Scripting-016).
_compileCache.Clear();
var compileFailures = new List<string>();
var seenPaths = new HashSet<string>(StringComparer.Ordinal);
@@ -102,7 +121,9 @@ public sealed class VirtualTagEngine : IDisposable
continue;
}
var evaluator = ScriptEvaluator<VirtualTagContext, object?>.Compile(def.ScriptSource);
// Route through CompiledScriptCache so the emitted assembly's collectible
// ALC participates in publish-replace cleanup. (Core.Scripting-016)
var evaluator = _compileCache.GetOrCompile(def.ScriptSource);
var timed = new TimedScriptEvaluator<VirtualTagContext, object?>(evaluator, _scriptTimeout);
var scriptLogger = _loggerFactory.Create(def.Path);
@@ -481,6 +502,9 @@ public sealed class VirtualTagEngine : IDisposable
UnsubscribeFromUpstream();
_tags.Clear();
_graph.Clear();
// Dispose every compiled-script ALC so the engine's shutdown actually
// releases the emitted assemblies. (Core.Scripting-016)
_compileCache.Dispose();
}
internal DependencyGraph GraphForTesting => _graph;
@@ -126,7 +126,7 @@ public static class SnapshotFormatter
0x803B0000u => "BadNotWritable",
0x803C0000u => "BadOutOfRange",
0x803D0000u => "BadNotSupported",
0x80550000u => "BadDeviceFailure",
0x808B0000u => "BadDeviceFailure",
0x80740000u => "BadTypeMismatch",
0x40000000u => "Uncertain",
_ => null,
@@ -41,7 +41,7 @@ public static class AbCipStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80730000u;
@@ -16,7 +16,7 @@ public static class AbLegacyStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80730000u;
@@ -14,7 +14,7 @@ public static class FocasStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80730000u;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,7 +1,7 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
@@ -2,7 +2,7 @@ using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,7 +1,7 @@
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,8 +1,8 @@
using System.Collections.Concurrent;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
// Use the generated nested status enum for the SetBufferedUpdateInterval reply check.
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using System.Runtime.CompilerServices;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -18,33 +18,15 @@
</ItemGroup>
<ItemGroup>
<!-- Vendored mxaccessgw .NET client. Originally consumed via path-based
ProjectReference to the sibling repo, but the sibling repo restructured
and the MxGateway.Client.csproj path no longer exists. The DLLs in
libs/ are the last known-good build (May 2026); they reference proto
types from MxGateway.Contracts.dll using the pre-restructure namespace
(MxGateway.Contracts.Proto). See libs/README.md for the unwinding plan
once the sibling repo restores a client library or we migrate to the
new ZB.MOM.WW.MxGateway.Contracts.Proto namespace. -->
<Reference Include="MxGateway.Client">
<HintPath>libs\MxGateway.Client.dll</HintPath>
<Private>true</Private>
</Reference>
<Reference Include="MxGateway.Contracts">
<HintPath>libs\MxGateway.Contracts.dll</HintPath>
<Private>true</Private>
</Reference>
</ItemGroup>
<ItemGroup>
<!-- Transitive deps the vendored DLLs need exposed at consumer level
(versions match the sibling mxaccessgw repo's
ZB.MOM.WW.MxGateway.Contracts.csproj so binary-compat is preserved). -->
<PackageReference Include="Google.Protobuf" Version="3.34.1" />
<PackageReference Include="Grpc.Core.Api" Version="2.76.0" />
<PackageReference Include="Grpc.Net.Client" Version="2.71.0" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.0" />
<PackageReference Include="Polly" Version="8.5.2" />
<!-- Sibling mxaccessgw repo's .NET client + contracts. The sibling restored
a proper client library under clients/dotnet/ (May 2026), so this is
back on a path-based ProjectReference per the libs/README unwind plan #1.
Both projects target net10.0; the Contracts project transitively pulls
Google.Protobuf + Grpc.Core.Api, the Client project transitively pulls
Grpc.Net.Client + Polly.Core + Microsoft.Extensions.Logging.Abstractions,
so the explicit PackageReference shims that backfilled the vendored
binary references are no longer needed. -->
<ProjectReference Include="..\..\..\..\mxaccessgw\clients\dotnet\ZB.MOM.WW.MxGateway.Client\ZB.MOM.WW.MxGateway.Client.csproj"/>
</ItemGroup>
<ItemGroup>
@@ -1,63 +0,0 @@
# Vendored MxGateway client DLLs
This directory holds binary copies of `MxGateway.Client.dll` and
`MxGateway.Contracts.dll` from the sibling `mxaccessgw` repo's last known-good
build (May 2026). The DLLs are referenced from the driver's csproj as
`<Reference HintPath="…" />` items rather than `ProjectReference`.
## Why
The sibling `mxaccessgw` repo restructured: the `clients/dotnet/MxGateway.Client`
project the driver previously referenced via path-based `ProjectReference` no
longer exists, and the proto contracts moved from the `MxGateway.Contracts.Proto`
namespace to `ZB.MOM.WW.MxGateway.Contracts.Proto`. The driver's source still
expects the pre-restructure namespace, so re-pointing at the new contracts would
require a global namespace rename across ~19 driver files PLUS reimplementing
the `MxGatewayClient` / `MxGatewaySession` / `GalaxyRepositoryClient` types the
old client library provided (the sibling repo dropped the client library
entirely, keeping only the contracts).
Vendoring the binaries unblocks the build in minutes instead of hours, freezes
the gateway contract surface at a known-good version, and preserves the option
to migrate properly later without an emergency rewrite.
## What's vendored
| File | Source | Built against |
|---|---|---|
| `MxGateway.Client.dll` | mxaccessgw repo, `clients/dotnet/MxGateway.Client/` (pre-restructure) | net10.0, `MxGateway.Contracts.dll` |
| `MxGateway.Contracts.dll` | mxaccessgw repo, `src/MxGateway.Contracts/` (pre-restructure) | net10.0, proto namespace `MxGateway.Contracts.Proto[.Galaxy]` |
The NuGet packages the vendored DLLs need transitively (Google.Protobuf,
Grpc.Core.Api, Grpc.Net.Client, Microsoft.Extensions.Logging.Abstractions,
Polly) are declared as direct `PackageReference` in the driver csproj — when
the dropped `ProjectReference` was in place those packages were transitively
provided; with binary references the consumer must declare them explicitly.
Versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj`
uses so the gRPC + proto runtime stays binary-compatible.
## Decompiled-source archive
The vendored DLLs are byte-for-byte the build output. The full source can be
recovered with `ilspycmd MxGateway.Client.dll > MxGateway.Client.cs` if a code
review or audit needs it. There is no proprietary code in either DLL — they
are the OtOpcUa team's own client implementation against the gateway's open
proto contracts.
## How to unwind
Either path closes the vendored-binary debt:
1. **Sibling repo restores `MxGateway.Client.csproj`** (or publishes a NuGet
package). Switch the csproj back to a `ProjectReference` / `PackageReference`,
delete this directory.
2. **Driver migrates to the new `ZB.MOM.WW.MxGateway.Contracts.Proto`
namespace.** Global namespace rename across the ~19 consuming source files,
plus re-implementing `MxGatewayClient` / `MxGatewaySession` /
`GalaxyRepositoryClient` (≈2,200 LoC of behavioural client code) either
inlined into this driver or as a fresh sibling library. Delete this
directory.
Either way: when unwinding, also drop the five `PackageReference` lines added
to the csproj alongside the `<Reference>` items — the new ProjectReference /
PackageReference will provide them transitively again.
@@ -1511,7 +1511,7 @@ public sealed class ModbusDriver
private const uint StatusBadNotWritable = 0x803B0000u;
private const uint StatusBadOutOfRange = 0x803C0000u;
private const uint StatusBadNotSupported = 0x803D0000u;
private const uint StatusBadDeviceFailure = 0x80550000u;
private const uint StatusBadDeviceFailure = 0x808B0000u;
private const uint StatusBadCommunicationError = 0x80050000u;
/// <summary>
@@ -69,7 +69,7 @@ public sealed class S7Driver(S7DriverOptions options, string driverInstanceId, I
/// <summary>OPC UA StatusCode used for socket / timeout / protocol-layer faults.</summary>
private const uint StatusBadCommunicationError = 0x80050000u;
/// <summary>OPC UA StatusCode used for a genuine device fault (CPU error, hardware fault).</summary>
private const uint StatusBadDeviceFailure = 0x80550000u;
private const uint StatusBadDeviceFailure = 0x808B0000u;
private readonly Dictionary<string, S7TagDefinition> _tagsByName = new(StringComparer.OrdinalIgnoreCase);
private readonly Dictionary<string, S7ParsedAddress> _parsedByName = new(StringComparer.OrdinalIgnoreCase);
@@ -16,7 +16,7 @@ public static class TwinCATStatusMapper
public const uint BadNotWritable = 0x803B0000u;
public const uint BadOutOfRange = 0x803C0000u;
public const uint BadNotSupported = 0x803D0000u;
public const uint BadDeviceFailure = 0x80550000u;
public const uint BadDeviceFailure = 0x808B0000u;
public const uint BadCommunicationError = 0x80050000u;
public const uint BadTimeout = 0x800A0000u;
public const uint BadTypeMismatch = 0x80730000u;
@@ -101,29 +101,44 @@ else
{
<Generations ClusterId="@ClusterId"/>
}
else if (_tab == "equipment" && _currentDraft is not null)
else if (_tab is "equipment" or "uns" or "namespaces" or "drivers" or "tags" or "acls")
{
<EquipmentTab GenerationId="@_currentDraft.GenerationId"/>
}
else if (_tab == "uns" && _currentDraft is not null)
{
<UnsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "namespaces" && _currentDraft is not null)
{
<NamespacesTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "drivers" && _currentDraft is not null)
{
<DriversTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "tags" && _currentDraft is not null)
{
<TagsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "acls" && _currentDraft is not null)
{
<AclsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
@* Bug #10 fix — these six tabs are scoped to a generation. Per docs/v2/admin-ui.md the
design intent is a read-only view of the published generation when no draft is open
("Edit in draft" affordance), and the editable view of the draft when one is open.
The earlier implementation rendered nothing in the no-draft case, leaving operators
with just the "Open a draft to edit" placeholder. We now route both states through
the same tab components, gating edits via <fieldset disabled> so a button click in
the read-only state cannot silently mutate the published rows even though the tab
components themselves haven't been refactored to honor an IsReadOnly flag yet. *@
var genId = _currentDraft?.GenerationId ?? _currentPublished?.GenerationId;
var isReadOnly = _currentDraft is null;
if (genId is null)
{
<section class="panel notice rise" style="animation-delay:.02s">
No published generation yet. Click <strong>New draft</strong> above to author this cluster's first generation.
</section>
}
else
{
if (isReadOnly)
{
<section class="panel notice rise mb-3" style="animation-delay:.02s">
<strong>Read-only view</strong> of published generation @genId. Click <strong>New draft</strong> above to make changes.
</section>
}
<fieldset disabled="@isReadOnly" style="border:0;padding:0;margin:0;min-width:0;">
@switch (_tab)
{
case "equipment": <EquipmentTab GenerationId="@genId.Value"/> break;
case "uns": <UnsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "namespaces": <NamespacesTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "drivers": <DriversTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "tags": <TagsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "acls": <AclsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
}
</fieldset>
}
}
else if (_tab == "redundancy")
{
@@ -133,10 +148,6 @@ else
{
<AuditTab ClusterId="@ClusterId"/>
}
else
{
<section class="panel notice rise" style="animation-delay:.02s">Open a draft to edit this cluster's content.</section>
}
}
@code {
@@ -16,12 +16,40 @@ public sealed class ClusterNodeService(OtOpcUaConfigDbContext db)
/// tolerance covers a missed heartbeat plus publisher GC pauses.</summary>
public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);
public Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct) =>
db.ClusterNodes.AsNoTracking()
public async Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct)
{
var nodes = await db.ClusterNodes.AsNoTracking()
.Where(n => n.ClusterId == clusterId)
.OrderByDescending(n => n.ServiceLevelBase)
.ThenBy(n => n.NodeId)
.ToListAsync(ct);
.ToListAsync(ct).ConfigureAwait(false);
// Bug #12 fix follow-up — the live-node heartbeat lands on
// ClusterNodeGenerationState.LastSeenAt (written by sp_RegisterNodeGenerationApplied
// on every generation poll). The ClusterNode.LastSeenAt column is a legacy slot that
// no current writer maintains, so reading it directly would show "never STALE"
// forever for every running node. Overlay the GenerationState heartbeat onto the
// returned ClusterNode rows when it's more recent so the Redundancy tab + IsStale
// predicate reflect actual liveness without needing a new write path or schema change.
var nodeIds = nodes.Select(n => n.NodeId).ToList();
if (nodeIds.Count > 0)
{
var heartbeats = await db.ClusterNodeGenerationStates.AsNoTracking()
.Where(s => nodeIds.Contains(s.NodeId))
.Select(s => new { s.NodeId, s.LastSeenAt })
.ToListAsync(ct).ConfigureAwait(false);
var beatByNode = heartbeats.ToDictionary(s => s.NodeId, s => s.LastSeenAt);
foreach (var n in nodes)
{
if (beatByNode.TryGetValue(n.NodeId, out var hb) && hb is not null
&& (n.LastSeenAt is null || hb > n.LastSeenAt))
{
n.LastSeenAt = hb;
}
}
}
return nodes;
}
public static bool IsStale(ClusterNode node) =>
node.LastSeenAt is null || DateTime.UtcNow - node.LastSeenAt.Value > StaleThreshold;
@@ -1,6 +1,7 @@
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Server.Redundancy;
namespace ZB.MOM.WW.OtOpcUa.Server.Hosting;
@@ -42,10 +43,20 @@ public sealed class GenerationRefreshHostedService(
RedundancyCoordinator coordinator,
ILogger<GenerationRefreshHostedService> logger,
TimeSpan? tickInterval = null,
Func<CancellationToken, Task<long?>>? currentGenerationQuery = null) : BackgroundService
Func<CancellationToken, Task<long?>>? currentGenerationQuery = null,
Func<long, NodeApplyStatus, string?, CancellationToken, Task>? registerAppliedAsync = null) : BackgroundService
{
private readonly Func<CancellationToken, Task<long?>> _generationQuery = currentGenerationQuery
?? new Func<CancellationToken, Task<long?>>(ct => DefaultQueryCurrentGenerationAsync(options, logger, ct));
// Bug #12 fix — the server now reports applied-generation state + heartbeat back to the
// central DB via sp_RegisterNodeGenerationApplied. Before this wiring the proc had zero
// callers, so dbo.ClusterNodeGenerationState stayed empty for every node and the Admin UI
// Fleet status page + cluster-detail Redundancy LastSeenAt both showed "no node state /
// never STALE" indefinitely. Tests inject a stub via the registerAppliedAsync parameter.
private readonly Func<long, NodeApplyStatus, string?, CancellationToken, Task> _registerApplied = registerAppliedAsync
?? new Func<long, NodeApplyStatus, string?, CancellationToken, Task>(
(gen, status, err, ct) => DefaultRegisterAppliedAsync(options, logger, gen, status, err, ct));
/// <summary>
/// How often the service polls <c>sp_GetCurrentGenerationForCluster</c>. Default 5 s —
/// low enough that operator publishes take effect promptly, high enough that the
@@ -97,6 +108,18 @@ public sealed class GenerationRefreshHostedService(
if (LastAppliedGenerationId is long last && current == last)
{
// Heartbeat — re-stamps LastSeenAt on dbo.ClusterNodeGenerationState so the Admin
// Fleet status page + cluster Redundancy tab can detect the node is alive without
// a generation change. Best-effort: a transient DB error here must not throw out of
// the tick (the next tick will retry) and must not block applies.
try
{
await _registerApplied(current.Value, NodeApplyStatus.Applied, null, cancellationToken).ConfigureAwait(false);
}
catch (Exception hbEx) when (hbEx is not OperationCanceledException)
{
logger.LogDebug(hbEx, "Heartbeat to sp_RegisterNodeGenerationApplied failed; will retry next tick");
}
return; // no change
}
@@ -109,14 +132,44 @@ public sealed class GenerationRefreshHostedService(
// lease is open. Publisher ticks in parallel (1s cadence) will observe the band
// transition and push it onto the OPC UA Server.ServiceLevel node.
var publishRequestId = Guid.NewGuid();
await using (leases.BeginApplyLease(current.Value, publishRequestId))
NodeApplyStatus applyStatus;
string? applyError = null;
try
{
await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
// Future: fire a domain event that driver hosts / virtual-tag engine /
// scripted-alarm engine subscribe to. For now the topology refresh is the
// only thing we rewire — everything else still requires a process restart.
await using (leases.BeginApplyLease(current.Value, publishRequestId))
{
await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
// Future: fire a domain event that driver hosts / virtual-tag engine /
// scripted-alarm engine subscribe to. For now the topology refresh is the
// only thing we rewire — everything else still requires a process restart.
}
applyStatus = NodeApplyStatus.Applied;
}
catch (Exception applyEx) when (applyEx is not OperationCanceledException)
{
applyStatus = NodeApplyStatus.Failed;
applyError = applyEx.Message;
logger.LogError(applyEx, "Apply of generation {Generation} failed; will report Failed status to central DB", current);
// fall through to register so operators see the failed apply in /fleet
}
// Always tell the central DB what happened with this apply attempt — success or
// failure. The proc upserts dbo.ClusterNodeGenerationState (CurrentGenerationId +
// LastAppliedAt + LastAppliedStatus + LastAppliedError + LastSeenAt). Failure here
// mustn't prevent us from advancing LastAppliedGenerationId — the apply already
// happened (or already failed); the publish is purely observability.
try
{
await _registerApplied(current.Value, applyStatus, applyError, cancellationToken).ConfigureAwait(false);
}
catch (Exception regEx) when (regEx is not OperationCanceledException)
{
logger.LogWarning(regEx, "sp_RegisterNodeGenerationApplied call failed for gen {Generation} status {Status}", current, applyStatus);
}
// Advance the cursor even on Failed — the proc has been told; next tick will heartbeat
// and a future generation will trigger a fresh apply attempt. Pinning the cursor on
// failure would loop us through the same broken apply every 5s.
LastAppliedGenerationId = current;
RefreshCount++;
}
@@ -157,4 +210,35 @@ public sealed class GenerationRefreshHostedService(
return null;
}
}
/// <summary>
/// Default register-applied implementation — calls <c>sp_RegisterNodeGenerationApplied</c>
/// to MERGE-upsert <see cref="ZB.MOM.WW.OtOpcUa.Configuration.Entities.ClusterNodeGenerationState"/>
/// for this node. Called both at apply completion (success or failure) and on every
/// no-change heartbeat tick so <c>LastSeenAt</c> stays fresh in the central DB and the
/// Admin UI Fleet status page + Redundancy LastSeenAt indicator can detect a healthy node.
/// Bug #12 fix — wires the previously-orphaned proc into the apply loop.
/// </summary>
private static async Task DefaultRegisterAppliedAsync(
NodeOptions options,
ILogger logger,
long generationId,
NodeApplyStatus status,
string? error,
CancellationToken cancellationToken)
{
await using var conn = new SqlConnection(options.ConfigDbConnectionString);
await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
await using var cmd = conn.CreateCommand();
cmd.CommandText = "EXEC dbo.sp_RegisterNodeGenerationApplied @NodeId=@n, @GenerationId=@g, @Status=@s, @Error=@e";
cmd.Parameters.AddWithValue("@n", options.NodeId);
cmd.Parameters.AddWithValue("@g", generationId);
cmd.Parameters.AddWithValue("@s", status.ToString());
cmd.Parameters.AddWithValue("@e", (object?)error ?? DBNull.Value);
await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
// Single-line trace so soak runs can see heartbeat ticks without flooding at Info.
logger.LogTrace("Reported gen {Generation} status {Status} to central DB", generationId, status);
}
}
@@ -1015,4 +1015,75 @@ public sealed class ScriptedAlarmEngineTests
"LoadAsync must drop the prior generation's scratch — reuse across a publish " +
"would attach a stale Logger / Inputs to the new alarm definition.");
}
// --- Core.Scripting-016: engine routes compiles through CompiledScriptCache ---
[Fact]
public void Dispose_unloads_compiled_predicate_assembly()
{
// Pre-fix the engine called ScriptEvaluator.Compile directly, so the
// emitted predicate assembly's ALC stayed loaded for the process lifetime.
// After the fix the engine routes through CompiledScriptCache; engine
// Dispose triggers cache Dispose which unloads every cached evaluator's ALC.
// Assert via WeakReference + GC that the assembly is actually reclaimed.
// Helper is sync + [NoInlining] so its locals can't be kept alive by an
// async state machine (an earlier async version of this test failed because
// the state-machine struct held the evaluator past the method-end).
var weak = CompileAlarmAndCaptureWeak();
for (int i = 0; i < 10 && weak.IsAlive; i++)
{
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
}
weak.IsAlive.ShouldBeFalse(
"engine Dispose must release the compiled-predicate assembly via " +
"CompiledScriptCache (Core.Scripting-016). If this fails the engine is " +
"back to calling ScriptEvaluator.Compile directly and -008's headline " +
"fix doesn't run in production.");
}
[System.Runtime.CompilerServices.MethodImpl(
System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
private static WeakReference CompileAlarmAndCaptureWeak()
{
var up = new FakeUpstream();
up.Set("Temp", 50);
var eng = Build(up, out _);
// Block on LoadAsync so this helper stays synchronous — an `async Task`
// wrapper would compile to a state machine whose generated struct keeps the
// local `eng` reference alive past the method's apparent end, defeating GC.
eng.LoadAsync(
[new ScriptedAlarmDefinition(
"HighTemp", "Plant/Line1", "HighTemp",
AlarmKind.AlarmCondition, AlarmSeverity.High,
"x",
"""return (int)ctx.GetTag("Temp").Value > 100;""")],
default).GetAwaiter().GetResult();
// Reach into the engine's compile cache via reflection — the field is
// private; we only need the Assembly reference, scoped to this NoInlining
// helper so the locals die when it returns.
var weak = ExtractEmittedAssemblyWeakRef(eng);
eng.Dispose();
return weak;
}
[System.Runtime.CompilerServices.MethodImpl(
System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
private static WeakReference ExtractEmittedAssemblyWeakRef(ScriptedAlarmEngine eng)
{
var cacheField = typeof(ScriptedAlarmEngine).GetField(
"_compileCache", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)!;
var cache = cacheField.GetValue(eng)!;
var cacheDictField = cache.GetType().GetField(
"_cache", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)!;
var cacheDict = (System.Collections.IDictionary)cacheDictField.GetValue(cache)!;
var lazy = cacheDict.Values.Cast<object>().Single();
var evaluator = lazy.GetType().GetProperty("Value")!.GetValue(lazy)!;
var del = (Delegate)evaluator.GetType().GetField(
"_func", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)!
.GetValue(evaluator)!;
return new WeakReference(del.Method.Module.Assembly);
}
}
@@ -222,6 +222,72 @@ public sealed class CompiledScriptCacheTests
cache.Count.ShouldBe(0, "faulted Lazy must still be evicted after compile failure");
}
[Fact]
public void Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives()
{
// Regression for Core.Scripting-014: Clear() previously used the key-only
// TryRemove(key, out var lazy) overload, which is value-blind. If a
// concurrent GetOrCompile re-inserts a fresh Lazy under the same key
// between Clear's snapshot and the TryRemove, the buggy overload evicts
// the fresh entry and disposes its evaluator while the concurrent
// caller still holds + intends to invoke it. The fixed
// TryRemove(KeyValuePair<,>) overload compares value identity.
//
// Single-threaded test that models the race window: snapshot the
// dictionary like Clear does, mutate the dictionary mid-flight (modelling
// Thread B's GetOrCompile), then drive the same TryRemove(KeyValuePair<,>)
// overload Clear now uses. The buggy key-only overload would evict the
// fresh entry; the fixed value-scoped overload leaves it alone. This
// verifies the *semantic* of Clear's code path — actually intercepting
// a real Clear() invocation mid-loop would require either two threads
// (flaky) or a Dispose side-effect that re-adds within the loop
// iteration (only works with multiple snapshotted entries).
var cache = new CompiledScriptCache<FakeScriptContext, int>();
var cacheField = typeof(CompiledScriptCache<FakeScriptContext, int>)
.GetField("_cache", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
var dict = (System.Collections.Concurrent.ConcurrentDictionary<
string, Lazy<ScriptEvaluator<FakeScriptContext, int>>>)cacheField!.GetValue(cache)!;
var hashSourceMethod = typeof(CompiledScriptCache<FakeScriptContext, int>)
.GetMethod("HashSource", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic);
var key = (string)hashSourceMethod!.Invoke(null, ["""return 1;"""])!;
var firstGen = new Lazy<ScriptEvaluator<FakeScriptContext, int>>(
() => ScriptEvaluator<FakeScriptContext, int>.Compile("""return 1;"""),
LazyThreadSafetyMode.ExecutionAndPublication);
dict[key] = firstGen;
// Snapshot like Clear's first line does.
var snapshot = dict.ToArray();
snapshot.Length.ShouldBe(1);
// Race window: a concurrent GetOrCompile inserts a fresh Lazy under the
// same key. The dict now holds `fresh`, but the snapshot still references
// `firstGen`.
var fresh = new Lazy<ScriptEvaluator<FakeScriptContext, int>>(
() => ScriptEvaluator<FakeScriptContext, int>.Compile("""return 1;"""),
LazyThreadSafetyMode.ExecutionAndPublication);
dict[key] = fresh;
// The fixed Clear uses the value-scoped TryRemove(KeyValuePair<,>) overload.
// Drive it directly with our snapshot entry — the value comparison sees
// firstGen ≠ fresh and leaves the entry intact.
foreach (var entry in snapshot)
{
var removed = ((System.Collections.Generic.ICollection<
System.Collections.Generic.KeyValuePair<string, Lazy<ScriptEvaluator<FakeScriptContext, int>>>>)dict)
.Remove(entry);
removed.ShouldBeFalse(
"value-scoped removal must reject the stale snapshot entry — the dict " +
"now holds `fresh`, not `firstGen`. The buggy key-only TryRemove would " +
"have returned true and evicted the fresh entry (Core.Scripting-014).");
}
dict.ContainsKey(key).ShouldBeTrue(
"the fresh entry inserted during the race window must survive Clear().");
ReferenceEquals(dict[key], fresh).ShouldBeTrue(
"the surviving entry must be the fresh Lazy, not the one Clear() snapshotted.");
}
// --- Core.Scripting-008: collectible AssemblyLoadContext unload ---
[Fact]
@@ -446,4 +446,88 @@ public sealed class ScriptSandboxTests
return 0;
"""));
}
// --- Core.Scripting-012: BCL refs broadened by -008/-016 re-exposed three vectors
// the original deny-list missed. Each is denied type-granularly in
// ForbiddenTypeAnalyzer.ForbiddenFullTypeNames; these tests pin the rejection.
[Fact]
public void Rejects_ThreadPool_QueueUserWorkItem_at_compile()
{
// System.Threading.ThreadPool.QueueUserWorkItem re-introduces the background-
// fanout threat Core.Scripting-003 closed against System.Threading.Tasks. The
// ThreadPool type lives in System.Threading (shared with allowed sync primitives
// like CancellationToken), so it must be denied type-granularly. (Core.Scripting-012.)
Should.Throw<ScriptSandboxViolationException>(() =>
ScriptEvaluator<FakeScriptContext, int>.Compile(
"""
System.Threading.ThreadPool.QueueUserWorkItem(_ => { });
return 0;
"""));
}
[Fact]
public void Rejects_Timer_new_at_compile()
{
// System.Threading.Timer schedules unbounded callback work that outlives the
// per-evaluation timeout. Same namespace as CancellationToken so type-granular.
// (Core.Scripting-012.)
Should.Throw<ScriptSandboxViolationException>(() =>
ScriptEvaluator<FakeScriptContext, int>.Compile(
"""
var t = new System.Threading.Timer(_ => { });
return 0;
"""));
}
[Fact]
public void Rejects_AssemblyLoadContext_at_compile()
{
// System.Runtime.Loader.AssemblyLoadContext loads arbitrary DLLs into the
// process — defense-in-depth gap that the BCL-wide reference set re-opened.
// System.Reflection already denies the typical invocation path, but the
// sandbox boundary stays type-explicit. (Core.Scripting-012.)
Should.Throw<ScriptSandboxViolationException>(() =>
ScriptEvaluator<FakeScriptContext, int>.Compile(
"""
var alc = System.Runtime.Loader.AssemblyLoadContext.Default;
return 0;
"""));
}
// --- Core.Scripting-013: wrapper-source injection ---
[Fact]
public void Rejects_sibling_method_injection_via_balanced_braces()
{
// Core.Scripting-013 — a brace-balanced source that closes Run early and
// declares a sibling method inside CompiledScript. The analyzer would still
// walk the injected member, but the documented "method body" contract
// forbids the shape outright now.
var ex = Should.Throw<CompilationErrorException>(() =>
ScriptEvaluator<FakeScriptContext, int>.Compile(
"""
return 0;
} public static int Evil() { return 1;
"""));
ex.Message.ShouldContain("Core.Scripting-013");
}
[Fact]
public void Rejects_sibling_class_injection_via_balanced_braces()
{
// Same shape, but injecting an entire sibling class inside the synthesized
// namespace. Reject at the type-count check.
var ex = Should.Throw<CompilationErrorException>(() =>
ScriptEvaluator<FakeScriptContext, int>.Compile(
"""
return 0;
}
}
public static class CompiledScript2 { public static int M() { return 0; } }
namespace ZB.MOM.WW.OtOpcUa.Core.Scripting.Compiled { public static class CompiledScript3 { public static int M() {
return 0;
"""));
ex.Message.ShouldContain("Core.Scripting-013");
}
}
@@ -655,4 +655,60 @@ public sealed class VirtualTagEngineTests
lock (_buf) { _buf.Add((path, value)); }
}
}
// --- Core.Scripting-016: engine routes compiles through CompiledScriptCache ---
[Fact]
public void Dispose_unloads_compiled_script_assembly()
{
// Pre-fix the engine called ScriptEvaluator.Compile directly, so the emitted
// script assembly's ALC stayed loaded for the process lifetime. After the fix
// the engine routes through CompiledScriptCache; engine Dispose triggers cache
// Dispose which unloads every cached evaluator's ALC. Assert via WeakReference
// + GC that the assembly is actually reclaimed.
var weak = CompileTagAndCaptureWeak();
for (int i = 0; i < 10 && weak.IsAlive; i++)
{
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
}
weak.IsAlive.ShouldBeFalse(
"engine Dispose must release the compiled-tag assembly via " +
"CompiledScriptCache (Core.Scripting-016). If this fails the engine is " +
"back to calling ScriptEvaluator.Compile directly and -008's headline " +
"fix doesn't run in production.");
}
[System.Runtime.CompilerServices.MethodImpl(
System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
private static WeakReference CompileTagAndCaptureWeak()
{
var up = new FakeUpstream();
up.Set("X", 10);
var engine = Build(up);
engine.Load([new VirtualTagDefinition(
Path: "Doubled",
DataType: DriverDataType.Float64,
ScriptSource: """return (double)ctx.GetTag("X").Value * 2.0;""")]);
// Reach into the engine's compile cache via reflection — internals scoped to
// Tests-via-InternalsVisibleTo and the field is private.
var cacheField = typeof(VirtualTagEngine).GetField(
"_compileCache", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)!;
var cache = cacheField.GetValue(engine)!;
var cacheDictField = cache.GetType().GetField(
"_cache", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)!;
var cacheDict = (System.Collections.IDictionary)cacheDictField.GetValue(cache)!;
var lazy = cacheDict.Values.Cast<object>().Single();
var lazyValueProp = lazy.GetType().GetProperty("Value")!;
var evaluator = lazyValueProp.GetValue(lazy)!;
var funcField = evaluator.GetType().GetField(
"_func", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)!;
var del = (Delegate)funcField.GetValue(evaluator)!;
var weak = new WeakReference(del.Method.Module.Assembly);
engine.Dispose();
return weak;
}
}
@@ -39,7 +39,9 @@ public sealed class SnapshotFormatterTests
[InlineData(0x803B0000u, "BadNotWritable")]
[InlineData(0x803C0000u, "BadOutOfRange")]
[InlineData(0x803D0000u, "BadNotSupported")]
[InlineData(0x80550000u, "BadDeviceFailure")]
// Driver.Cli.Common-007: corrected from 0x80550000 (which is actually
// BadSecurityPolicyRejected) to the canonical OPC UA spec value 0x808B0000.
[InlineData(0x808B0000u, "BadDeviceFailure")]
[InlineData(0x80740000u, "BadTypeMismatch")]
[InlineData(0x40000000u, "Uncertain")]
public void FormatStatus_names_well_known_status_codes(uint status, string expectedName)
@@ -47,22 +49,6 @@ public sealed class SnapshotFormatterTests
SnapshotFormatter.FormatStatus(status).ShouldBe($"0x{status:X8} ({expectedName})");
}
[Theory]
// Driver.FOCAS.Cli-005 follow-up (Driver.Cli.Common): the FOCAS / AbCip / AbLegacy mappers
// emit these five codes — they previously rendered only as severity-class "Bad" because the
// shortlist did not name them, defeating the operator workflow that docs/Driver.*.Cli.md
// promise (read the BadDeviceFailure / BadNotWritable / ... name straight off probe/write
// output). Pin them named so the docs stay accurate.
[InlineData(0x80020000u, "BadInternalError")]
[InlineData(0x803B0000u, "BadNotWritable")]
[InlineData(0x803C0000u, "BadOutOfRange")]
[InlineData(0x803D0000u, "BadNotSupported")]
[InlineData(0x80550000u, "BadDeviceFailure")]
public void FormatStatus_names_native_driver_emitted_codes(uint status, string expectedName)
{
SnapshotFormatter.FormatStatus(status).ShouldContain($"({expectedName})");
}
[Theory]
// Regression for Driver.Cli.Common-001: these codes were previously mapped to the
// wrong names. The hex values below are what the buggy shortlist used; they must
@@ -71,6 +57,10 @@ public sealed class SnapshotFormatterTests
[InlineData(0x80070000u, "BadNoCommunication")] // was mislabelled BadNoCommunication
[InlineData(0x80080000u, "BadWaitingForInitialData")] // was mislabelled BadWaitingForInitialData
[InlineData(0x80350000u, "BadNodeIdInvalid")] // was mislabelled BadNodeIdInvalid
// Driver.Cli.Common-007: 0x80550000 is BadSecurityPolicyRejected per spec, not
// BadDeviceFailure (which is 0x808B0000). The buggy shortlist + the six native-
// protocol mappers all had it wrong; this row pins it as a regression guard.
[InlineData(0x80550000u, "BadDeviceFailure")]
public void FormatStatus_does_not_apply_pre_fix_wrong_names(uint status, string wrongName)
{
SnapshotFormatter.FormatStatus(status).ShouldNotContain(wrongName);
@@ -26,7 +26,11 @@ public sealed class AbCipReadSmokeTests
await fixture.InitializeAsync();
try
{
var deviceUri = $"ab://127.0.0.1:{fixture.Port}/1,0";
// Use fixture.Host (not hardcoded 127.0.0.1) so the docker-host migration
// (10.100.0.35) and AB_SERVER_ENDPOINT overrides reach this test too. The
// sibling Emulate tests already use the fixture's resolved endpoint; this
// smoke test was missed.
var deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0";
var drv = new AbCipDriver(new AbCipDriverOptions
{
Devices = [new AbCipDeviceOptions(deviceUri, profile.Family)],
@@ -8,7 +8,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests;
/// Reachability probe for the <c>ab_server</c> Docker container (libplctag's CIP
/// simulator built via <c>Docker/Dockerfile</c>) or any real AB PLC the
/// <c>AB_SERVER_ENDPOINT</c> env var points at. Parses
/// <c>AB_SERVER_ENDPOINT</c> (default <c>localhost:44818</c>) + TCP-connects
/// <c>AB_SERVER_ENDPOINT</c> (default <c>10.100.0.35:44818</c> — the shared Docker host) + TCP-connects
/// once at fixture construction. Tests skip via <see cref="AbServerFactAttribute"/>
/// / <see cref="AbServerTheoryAttribute"/> when the port isn't live, so
/// <c>dotnet test</c> stays green on a fresh clone without Docker running.
@@ -28,7 +28,10 @@ public sealed class AbServerFixture : IAsyncLifetime
/// instantiate the fixture with the profile matching their compose-file service.</summary>
public AbServerProfile Profile { get; }
public string Host { get; } = "127.0.0.1";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's 127.0.0.1 on 2026-04-28 alongside the rest of the Docker-host move.
// Override via AB_SERVER_ENDPOINT to point at a real PLC or a locally-running container.
public string Host { get; } = "10.100.0.35";
public int Port { get; } = AbServerProfile.DefaultPort;
public AbServerFixture() : this(KnownProfiles.ControlLogix) { }
@@ -59,7 +62,7 @@ public sealed class AbServerFixture : IAsyncLifetime
TcpProbe(ResolveHost(), ResolvePort());
private static string ResolveHost() =>
Environment.GetEnvironmentVariable(EndpointEnvVar)?.Split(':', 2)[0] ?? "127.0.0.1";
Environment.GetEnvironmentVariable(EndpointEnvVar)?.Split(':', 2)[0] ?? "10.100.0.35";
private static int ResolvePort()
{
@@ -84,7 +87,7 @@ public sealed class AbServerFixture : IAsyncLifetime
/// <summary>
/// <c>[Fact]</c>-equivalent that skips when ab_server isn't reachable — accepts a
/// live Docker listener on <c>localhost:44818</c> or an <c>AB_SERVER_ENDPOINT</c>
/// live Docker listener on <c>10.100.0.35:44818</c> or an <c>AB_SERVER_ENDPOINT</c>
/// override pointing at a real PLC.
/// </summary>
public sealed class AbServerFactAttribute : FactAttribute
@@ -29,6 +29,10 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests.Emulate;
/// <para>Runs only when <c>AB_SERVER_PROFILE=emulate</c>. ab_server has no ALMD
/// instruction + no alarm subsystem, so this tier-gated class couldn't produce a
/// meaningful result against the default simulator.</para>
/// <para>The Emulate tier is hardware-gated (Rockwell per-seat license, Windows-only,
/// conflicts with Docker Desktop's WSL 2 backend) so a permanent skip in CI is expected
/// — see <c>docs/drivers/AbServer-Test-Fixture.md</c> §"Logix Emulate golden-box tier"
/// for the full rationale + the gap matrix this test closes.</para>
/// </remarks>
[Collection("AbServerEmulate")]
[Trait("Category", "Integration")]
@@ -27,6 +27,10 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests.Emulate;
/// <para>Runs only when <c>AB_SERVER_PROFILE=emulate</c>. With ab_server
/// (the default), skips cleanly — ab_server lacks UDT / Template Object emulation
/// so this wire-level test couldn't pass against it regardless.</para>
/// <para>The Emulate tier is hardware-gated (Rockwell per-seat license, Windows-only,
/// conflicts with Docker Desktop's WSL 2 backend) so a permanent skip in CI is expected
/// — see <c>docs/drivers/AbServer-Test-Fixture.md</c> §"Logix Emulate golden-box tier"
/// for the full rationale + the gap matrix this test closes.</para>
/// </remarks>
[Collection("AbServerEmulate")]
[Trait("Category", "Integration")]
@@ -18,7 +18,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests;
/// Env-var overrides:
/// <list type="bullet">
/// <item><c>AB_LEGACY_ENDPOINT</c> — <c>host:port</c> of the PCCC-mode simulator.
/// Defaults to <c>localhost:44818</c> (EtherNet/IP port; ab_server's PCCC
/// Defaults to <c>10.100.0.35:44818</c> — the shared Docker host (EtherNet/IP port; ab_server's PCCC
/// emulation exposes PCCC-over-CIP on the same port as CIP itself).</item>
/// <item><c>AB_LEGACY_CIP_PATH</c> — routing path appended to the <c>ab://host:port/</c>
/// URI. Defaults to <c>1,0</c> (port-1/slot-0 backplane), required by ab_server
@@ -50,7 +50,10 @@ public sealed class AbLegacyServerFixture : IAsyncLifetime
/// </summary>
public const string DefaultCipPath = "1,0";
public string Host { get; } = "127.0.0.1";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's 127.0.0.1 on 2026-04-28 alongside the rest of the Docker-host move.
// Override via AB_LEGACY_ENDPOINT to point at a real PLC or a locally-running container.
public string Host { get; } = "10.100.0.35";
public int Port { get; } = DefaultPort;
/// <summary>CIP routing path portion of the device URI (after the <c>/</c> separator).
@@ -105,7 +108,7 @@ public sealed class AbLegacyServerFixture : IAsyncLifetime
private static (string Host, int Port) ResolveEndpoint()
{
var raw = Environment.GetEnvironmentVariable(EndpointEnvVar);
if (raw is null) return ("127.0.0.1", DefaultPort);
if (raw is null) return ("10.100.0.35", DefaultPort);
var parts = raw.Split(':', 2);
var port = parts.Length == 2 && int.TryParse(parts[1], out var p) ? p : DefaultPort;
return (parts[0], port);
@@ -1,7 +1,7 @@
using System.Runtime.CompilerServices;
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,7 +1,7 @@
using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,5 +1,5 @@
using System.Diagnostics;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Runtime.CompilerServices;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -54,7 +54,12 @@ public sealed class AddressingGrammarTests
{
if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);
var tag = new ModbusTagDefinition("Tank", ModbusRegion.HoldingRegisters, 100, ModbusDataType.Float32,
// HR 200 lives in pymodbus's scratch range (`write: [200, 209]` per standard.json),
// which gives us two consecutive writable HRs for the Float32 round-trip. The earlier
// version used HR 100 — but standard.json declares HR 100 as a single-cell auto-
// incrementing register (`write: [100, 100]`) so the second register of the Float32
// write was rejected with Illegal Data Address.
var tag = new ModbusTagDefinition("Tank", ModbusRegion.HoldingRegisters, 200, ModbusDataType.Float32,
ByteOrder: ModbusByteOrder.WordSwap);
var drv = await NewDriverAsync(tag);
@@ -87,9 +92,13 @@ public sealed class AddressingGrammarTests
if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);
// Sanity check that the simulator accepts the larger PDU coalescing produces.
var t1 = new ModbusTagDefinition("T1", ModbusRegion.HoldingRegisters, 300, ModbusDataType.Int16);
var t2 = new ModbusTagDefinition("T2", ModbusRegion.HoldingRegisters, 302, ModbusDataType.Int16);
var t3 = new ModbusTagDefinition("T3", ModbusRegion.HoldingRegisters, 304, ModbusDataType.Int16);
// Using HR 200/202/204 in the scratch range (standard.json's `uint16: 200..209`),
// not 300/302/304 — pymodbus rejects reads outside the seeded uint16 list with
// Illegal Data Address (= BadOutOfRange). The coalescing semantic the test
// exercises is identical with the scratch addresses.
var t1 = new ModbusTagDefinition("T1", ModbusRegion.HoldingRegisters, 200, ModbusDataType.Int16);
var t2 = new ModbusTagDefinition("T2", ModbusRegion.HoldingRegisters, 202, ModbusDataType.Int16);
var t3 = new ModbusTagDefinition("T3", ModbusRegion.HoldingRegisters, 204, ModbusDataType.Int16);
var opts = new ModbusDriverOptions
{
Host = _sim.Host, Port = _sim.Port, Tags = [t1, t2, t3], MaxReadGap = 5,
@@ -48,6 +48,23 @@ service + starting another. The integration tests discriminate by a
separate `MODBUS_SIM_PROFILE` env var so they skip correctly when the
wrong profile is live.
### Profile coverage matrix
The two general-purpose profiles cover disjoint test sets. A full pass
of the integration suite requires running both — serially on a single
docker host (the `:5020` collision), or in parallel on two hosts.
| Job | Bring up | Env to set | Expected outcome |
|---|---|---|---|
| `modbus-standard` | `lmxopcua-fix up modbus standard` | unset `MODBUS_SIM_PROFILE` (or set to `standard`) | Standard round-trip + AddressingGrammar suites pass; `ExceptionInjectionTests` (32 rows) skip with `MODBUS_SIM_PROFILE != exception_injection`. |
| `modbus-exception` | `lmxopcua-fix up modbus exception_injection` | `MODBUS_SIM_PROFILE=exception_injection` | `ExceptionInjectionTests` (32 rows) pass against the per-`(fc,address)` rule set; standard-profile suites (round-trip, AddressingGrammar) skip. |
The DL205 / Mitsubishi / S7-1500 profiles are similar — each gates its
own quirks suite via `MODBUS_SIM_PROFILE=<profile>`. Tests that don't
need a specific profile (the basic round-trip set) run under any of
the three pymodbus-based profiles. The `exception_injection` profile
is the only one that runs `exception_injector.py` instead of pymodbus.
## Endpoint
- Default: `localhost:5020`
@@ -27,7 +27,7 @@ public sealed class ExceptionInjectionTests(ModbusSimulatorFixture sim)
private const uint StatusGood = 0u;
private const uint StatusBadOutOfRange = 0x803C0000u;
private const uint StatusBadNotSupported = 0x803D0000u;
private const uint StatusBadDeviceFailure = 0x80550000u;
private const uint StatusBadDeviceFailure = 0x808B0000u;
private const uint StatusBadCommunicationError = 0x80050000u;
private void SkipUnlessInjectorLive()
@@ -5,7 +5,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests;
/// <summary>
/// Reachability probe for a Modbus TCP simulator (pymodbus in Docker, see
/// <c>Docker/docker-compose.yml</c>) or a real PLC. Parses
/// <c>MODBUS_SIM_ENDPOINT</c> (default <c>localhost:5020</c> per PR 43) and TCP-connects once at
/// <c>MODBUS_SIM_ENDPOINT</c> (default <c>10.100.0.35:5020</c> — the shared Docker host) and TCP-connects once at
/// fixture construction. Each test checks <see cref="SkipReason"/> and calls
/// <c>Assert.Skip</c> when the endpoint was unreachable, so a dev box without a running
/// simulator still passes `dotnet test` cleanly — matches the Galaxy live-smoke pattern in
@@ -29,8 +29,11 @@ public sealed class ModbusSimulatorFixture : IAsyncDisposable
// PR 43: default port is 5020 (pymodbus convention) instead of 502 (Modbus standard).
// Picking 5020 sidesteps the privileged-port admin requirement on Windows + matches the
// port baked into the pymodbus simulator JSON profiles in Docker/profiles/. Override with
// MODBUS_SIM_ENDPOINT to point at a real PLC on its native port 502.
private const string DefaultEndpoint = "localhost:5020";
// MODBUS_SIM_ENDPOINT to point at a real PLC on its native port 502, or to a
// locally-running container if the shared host is unavailable.
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's localhost on 2026-04-28 alongside the rest of the Docker-host move.
private const string DefaultEndpoint = "10.100.0.35:5020";
private const string EndpointEnvVar = "MODBUS_SIM_ENDPOINT";
public string Host { get; }
@@ -18,9 +18,9 @@ public sealed class ModbusExceptionMapperTests
[InlineData((byte)0x01, 0x803D0000u)] // Illegal Function → BadNotSupported
[InlineData((byte)0x02, 0x803C0000u)] // Illegal Data Address → BadOutOfRange
[InlineData((byte)0x03, 0x803C0000u)] // Illegal Data Value → BadOutOfRange
[InlineData((byte)0x04, 0x80550000u)] // Server Failure → BadDeviceFailure
[InlineData((byte)0x05, 0x80550000u)] // Acknowledge (long op) → BadDeviceFailure
[InlineData((byte)0x06, 0x80550000u)] // Server Busy → BadDeviceFailure
[InlineData((byte)0x04, 0x808B0000u)] // Server Failure → BadDeviceFailure (Driver.Cli.Common-007)
[InlineData((byte)0x05, 0x808B0000u)] // Acknowledge (long op) → BadDeviceFailure
[InlineData((byte)0x06, 0x808B0000u)] // Server Busy → BadDeviceFailure
[InlineData((byte)0x0A, 0x80050000u)] // Gateway path unavailable → BadCommunicationError
[InlineData((byte)0x0B, 0x80050000u)] // Gateway target failed to respond → BadCommunicationError
[InlineData((byte)0xFF, 0x80020000u)] // Unknown code → BadInternalError fallback
@@ -61,7 +61,7 @@ public sealed class ModbusExceptionMapperTests
[new WriteRequest("T", (short)42)],
TestContext.Current.CancellationToken);
writes[0].StatusCode.ShouldBe(0x80550000u, "FC06 returning exception 04 (CPU in PROGRAM mode) maps to BadDeviceFailure");
writes[0].StatusCode.ShouldBe(0x808B0000u, "FC06 returning exception 04 (CPU in PROGRAM mode) maps to BadDeviceFailure");
}
private sealed class NonModbusFailureTransport : IModbusTransport
@@ -6,7 +6,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests;
/// Reachability probe for an <c>opc-plc</c> simulator (Microsoft Industrial IoT's
/// OPC UA PLC from <c>mcr.microsoft.com/iotedge/opc-plc</c>) or any real OPC UA
/// server the <c>OPCUA_SIM_ENDPOINT</c> env var points at. Parses
/// <c>OPCUA_SIM_ENDPOINT</c> (default <c>opc.tcp://localhost:50000</c>),
/// <c>OPCUA_SIM_ENDPOINT</c> (default <c>opc.tcp://10.100.0.35:50000</c> — the shared Docker host),
/// TCP-connects to the resolved host:port at collection init, and records a
/// <see cref="SkipReason"/> on failure. Tests call <c>Assert.Skip</c> on that, so
/// `dotnet test` stays green when Docker isn't running the simulator — mirrors the
@@ -32,7 +32,11 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests;
/// </remarks>
public sealed class OpcPlcFixture : IAsyncDisposable
{
private const string DefaultEndpoint = "opc.tcp://localhost:50000";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's localhost on 2026-04-28 alongside the rest of the Docker-host move.
// Override via OPCUA_SIM_ENDPOINT to point at a different host or a locally-running
// opc-plc instance.
private const string DefaultEndpoint = "opc.tcp://10.100.0.35:50000";
private const string EndpointEnvVar = "OPCUA_SIM_ENDPOINT";
/// <summary>Full <c>opc.tcp://host:port</c> URL the driver session should connect to.</summary>
@@ -5,7 +5,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests;
/// <summary>
/// Reachability probe for the python-snap7 simulator Docker container (see
/// <c>Docker/docker-compose.yml</c>) or a real S7 PLC. Parses <c>S7_SIM_ENDPOINT</c>
/// (default <c>localhost:1102</c>) + TCP-connects once at fixture construction.
/// (default <c>10.100.0.35:1102</c> — the shared Docker host) + TCP-connects once at fixture construction.
/// Tests check <see cref="SkipReason"/> + call <c>Assert.Skip</c> when unreachable, so
/// `dotnet test` stays green on a fresh box without the simulator installed —
/// mirrors the <c>ModbusSimulatorFixture</c> pattern.
@@ -35,7 +35,9 @@ public sealed class Snap7ServerFixture : IAsyncDisposable
{
// Default 1102 (non-privileged) matches Docker/server.py. Override with
// S7_SIM_ENDPOINT to point at a real PLC on its native 102.
private const string DefaultEndpoint = "localhost:1102";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's localhost on 2026-04-28 alongside the rest of the Docker-host move.
private const string DefaultEndpoint = "10.100.0.35:1102";
private const string EndpointEnvVar = "S7_SIM_ENDPOINT";
public string Host { get; }
@@ -6,6 +6,7 @@ using Microsoft.AspNetCore.Hosting;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.OtOpcUa.Admin.Hubs;
using ZB.MOM.WW.OtOpcUa.Admin.Security;
using ZB.MOM.WW.OtOpcUa.Configuration;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
@@ -101,6 +102,15 @@ public sealed class AdminWebAppFactory : IAsyncDisposable
builder.Services.AddScoped<Admin.Services.DriverInstanceService>();
builder.Services.AddScoped<Admin.Services.DraftValidationService>();
// ClusterDetail.razor injects AdminHubConnectionFactory to drive the live-banner hub
// connection; the factory depends on HubTokenService, which in turn needs Data Protection.
// Without these the InteractiveServer circuit fails to instantiate the component and the
// page never advances past the "Loading…" placeholder — Playwright then times out
// waiting for any tab nav-link to appear. Mirrors Program.cs:35-36.
builder.Services.AddDataProtection();
builder.Services.AddSingleton<HubTokenService>();
builder.Services.AddScoped<Admin.Services.AdminHubConnectionFactory>();
_app = builder.Build();
_app.UseStaticFiles();
_app.UseRouting();
@@ -110,6 +110,66 @@ public sealed class GenerationRefreshHostedServiceTests : IDisposable
leases.OpenLeaseCount.ShouldBe(0, "IAsyncDisposable dispose must fire regardless of outcome");
}
// Bug #12 fix — verifies the previously-missing wiring: applies and heartbeats both
// emit sp_RegisterNodeGenerationApplied so Admin UI Fleet status + Redundancy LastSeenAt
// surface live state.
[Fact]
public async Task First_apply_reports_Applied_status_to_central_db()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var calls = new List<(long Gen, NodeApplyStatus Status, string? Error)>();
var service = NewService(coordinator, leases, currentGeneration: () => 42, registerCalls: calls);
await service.TickAsync(CancellationToken.None);
calls.Count.ShouldBe(1, "exactly one register call per apply window");
calls[0].Gen.ShouldBe(42);
calls[0].Status.ShouldBe(NodeApplyStatus.Applied);
calls[0].Error.ShouldBeNull();
}
[Fact]
public async Task No_change_tick_heartbeats_with_Applied_status()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var calls = new List<(long Gen, NodeApplyStatus Status, string? Error)>();
var service = NewService(coordinator, leases, currentGeneration: () => 42, registerCalls: calls);
await service.TickAsync(CancellationToken.None); // initial apply
await service.TickAsync(CancellationToken.None); // no-change heartbeat
await service.TickAsync(CancellationToken.None); // no-change heartbeat
calls.Count.ShouldBe(3, "one apply call + two heartbeat calls");
calls.ShouldAllBe(c => c.Gen == 42 && c.Status == NodeApplyStatus.Applied && c.Error == null);
}
[Fact]
public async Task Register_call_failure_does_not_break_apply_or_block_subsequent_ticks()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var registerCallCount = 0;
var service = new GenerationRefreshHostedService(
new NodeOptions { NodeId = "A", ClusterId = "c1", ConfigDbConnectionString = "unused" },
leases, coordinator, NullLogger<GenerationRefreshHostedService>.Instance,
tickInterval: TimeSpan.FromSeconds(1),
currentGenerationQuery: _ => Task.FromResult<long?>(42),
registerAppliedAsync: (gen, status, err, ct) =>
{
registerCallCount++;
throw new InvalidOperationException("simulated DB outage during register");
});
await service.TickAsync(CancellationToken.None); // apply succeeds, register throws
await service.TickAsync(CancellationToken.None); // heartbeat throws
registerCallCount.ShouldBe(2, "both register attempts must run");
service.LastAppliedGenerationId.ShouldBe(42, "register failure must not roll back the cursor");
}
// ---- fixture helpers ---------------------------------------------------
private async Task<RedundancyCoordinator> SeedCoordinatorAsync()
@@ -136,11 +196,15 @@ public sealed class GenerationRefreshHostedServiceTests : IDisposable
private static GenerationRefreshHostedService NewService(
RedundancyCoordinator coordinator,
ApplyLeaseRegistry leases,
Func<long?> currentGeneration) =>
Func<long?> currentGeneration,
List<(long Gen, NodeApplyStatus Status, string? Error)>? registerCalls = null) =>
new(new NodeOptions { NodeId = "A", ClusterId = "c1", ConfigDbConnectionString = "unused" },
leases, coordinator, NullLogger<GenerationRefreshHostedService>.Instance,
tickInterval: TimeSpan.FromSeconds(1),
currentGenerationQuery: _ => Task.FromResult(currentGeneration()));
currentGenerationQuery: _ => Task.FromResult(currentGeneration()),
registerAppliedAsync: registerCalls is null
? (_, _, _, _) => Task.CompletedTask
: (gen, status, err, _) => { registerCalls.Add((gen, status, err)); return Task.CompletedTask; });
private sealed class DbContextFactory(DbContextOptions<OtOpcUaConfigDbContext> options)
: IDbContextFactory<OtOpcUaConfigDbContext>