7 Commits

Author SHA1 Message Date
Joseph Doherty f23e368a74 fix(server, admin): wire sp_RegisterNodeGenerationApplied + overlay heartbeat onto ClusterNode
dbo.sp_RegisterNodeGenerationApplied was defined by the initial
StoredProcedures migration but had zero callers in src/. The server
polled sp_GetCurrentGenerationForCluster every 5s but never reported
back, so dbo.ClusterNodeGenerationState stayed empty for every node
and both the Admin UI Fleet status page ("No node state recorded")
and the cluster-detail Redundancy LastSeenAt indicator ("never
STALE") showed broken liveness forever.

Server side (GenerationRefreshHostedService):
* New testable seam: Func<long, NodeApplyStatus, string?, CT, Task>?
  registerAppliedAsync constructor parameter, defaulting to a real
  sp_RegisterNodeGenerationApplied call against the central DB.
* TickAsync now calls the proc at two points: after every successful
  apply with NodeApplyStatus.Applied, and on every no-change tick as
  a heartbeat (also Applied) so LastSeenAt stays fresh.
* Apply failures now wrap the lease + coordinator.RefreshAsync in a
  try/catch, report NodeApplyStatus.Failed with the exception message,
  and advance LastAppliedGenerationId regardless of outcome so we
  don't loop on the same broken apply every 5s.
* Register-call failures are best-effort (LogDebug heartbeat, LogWarning
  apply-report) — a transient DB outage during reporting must not
  crash the publisher or block the next apply.

Admin side (ClusterNodeService.ListByClusterAsync): the Redundancy tab
reads ClusterNode.LastSeenAt, but no current writer maintains that
column — the heartbeat goes to ClusterNodeGenerationState.LastSeenAt.
Overlay the GenerationState heartbeat onto the returned ClusterNode
rows when more recent, so IsStale + the Redundancy table column
reflect actual liveness without a schema change or new write path.

Tests: 3 new cases on GenerationRefreshHostedServiceTests verify
first-apply reports Applied, no-change ticks heartbeat with Applied,
and register-call failure does not roll back the cursor or block
subsequent ticks. All 8 GenerationRefresh tests pass.

Verified live on node-dev-a / cluster-dev: dbo.ClusterNodeGenerationState
now populated with CurrentGenerationId=1, LastAppliedStatus=Applied,
fresh LastSeenAt. Fleet status page shows the node (KPIs NODES 1 /
APPLIED 1 / STALE 0 / FAILED 0). Redundancy tab KPI STALE went 1\xe2\x86\x920 and
the row shows a real LAST SEEN timestamp. Bonus: FleetStatusHub
SignalR push now fires the cluster-page Live update banner on every
heartbeat because there are finally state changes to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:59 -04:00
Joseph Doherty c8de58d6d3 fix(admin-ui): render published gen read-only on the 6 cluster-detail content tabs
The Equipment, UNS Structure, Namespaces, Drivers, Tags, and ACLs tabs
all rendered only an "Open a draft to edit" placeholder when no draft
was open — even when the cluster had a fully populated published
generation. docs/v2/admin-ui.md \xa7Cluster Detail describes these as
"read-only views of the published generation" with an "Edit in draft"
affordance; that view was never wired. The earlier code path also
correctly rendered nothing when the cluster had no published gen yet,
which was indistinguishable from the broken state.

Collapse the six per-tab conditions into one shared branch that threads
the published gen ID into the existing tab components when no draft
exists, wrapped in <fieldset disabled> so any Add/Edit button click in
the read-only state cannot silently mutate published rows even though
the tab components themselves don't yet honor an IsReadOnly flag.
Banner above the content explains the state. Surgical: zero changes to
the ~1500 LoC across the six tab components.

Verified live on cluster-dev gen 1: Drivers tab now shows the
cluster-dev-galaxy-main GalaxyMxGateway row read-only; Namespaces tab
shows cluster-dev-galaxy-ns SystemPlatform row; both with the read-only
banner and visibly disabled affordances.

Follow-up worth doing later: refactor each tab component to accept an
IsReadOnly parameter so the disabled-affordance UX is per-tab rather
than a blanket fieldset opacity wash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:32 -04:00
Joseph Doherty 8fe7c8bea6 refactor(driver-galaxy): switch to sibling-repo MxGateway client + drop vendored libs
The sibling mxaccessgw repo (clients/dotnet/) restored a proper client
library + contracts under the new ZB.MOM.WW.MxGateway namespace, so the
binary-vendoring stopgap from PR Driver.Galaxy-016 can unwind via plan #1
of libs/README.md.

- csproj: replace <Reference HintPath="libs\MxGateway.*.dll"> with a
  ProjectReference into ..\..\..\..\mxaccessgw\clients\dotnet  ZB.MOM.WW.MxGateway.Client\. The five backfill PackageReference shims
  (Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client, Polly.Core,
  Microsoft.Extensions.Logging.Abstractions) are now transitive again.
- Source: 'using MxGateway.X' -> 'using ZB.MOM.WW.MxGateway.X' across
  19 driver files + 14 test files. No fully-qualified MxGateway.* usages
  in code, so no behavioural changes — purely a using-prefix flip.
- libs/: deleted MxGateway.Client.dll, MxGateway.Contracts.dll, README.md
  (orphan after the unwind).

Verified: dotnet build clean (Release), all 245 Driver.Galaxy unit tests
pass, OtOpcUa service running with the new client DLL loaded
(opc.tcp://localhost:4840/OtOpcUa, no FileNotFound/TypeLoad/
MissingMethod in startup logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:55:15 -04:00
Joseph Doherty c6082aa0b9 fix(admin-e2e): register missing DI services so ClusterDetail interactive circuit boots
UnsTabDragDropE2ETests were timing out at the 'UNS Structure' nav-link
locator because AdminWebAppFactory never registered AdminHubConnectionFactory
/ HubTokenService / DataProtection — ClusterDetail.razor's @inject threw at
circuit boot, so the page never advanced past the Loading placeholder. 2 → 3
pass after the registrations land. Also documents the Modbus standard-vs-
exception_injection coverage matrix in the fixture README + cross-references
docs/drivers/AbServer-Test-Fixture.md from each Emulate test so a developer
landing on a skipped test has a direct doc pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:07:17 -04:00
Joseph Doherty b1f3e09661 test(modbus, abcip): align failing integration tests with fixtures
Two real bugs uncovered by re-running with the new fixture defaults
pointing at the shared docker host. Both are test-side, not driver-side.

AbCip — Driver_reads_seeded_DInt_from_ab_server (4 parametrized rows):
  Hardcoded 'ab://127.0.0.1:{port}/1,0' in the deviceUri instead of
  the resolved fixture.Host. The new 10.100.0.35 default (and any
  AB_SERVER_ENDPOINT override) silently couldn't reach this test —
  the driver tried to connect to a non-existent localhost:44818 and
  returned BadCommunicationError on all 4 profile rows. The sibling
  Emulate tests already use the fixture's resolved endpoint; this
  smoke test was missed in the original migration.

  Fix: deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0".

Modbus — Float32_With_CDAB_Roundtrips_Through_Wire:
  Test wrote a Float32 to HR 100 (2 consecutive registers: 100+101).
  standard.json's writable HR range declares [100,100] only — a
  single-cell auto-incrementing register, not a 2-register pair. The
  write to register 101 was rejected with Illegal Data Address
  (BadOutOfRange).

  Fix: moved the tag from HR 100 to HR 200 (in standard.json's
  '[200, 209]' scratch range — 10 consecutive writable HRs). The
  Float32+CDAB semantic the test exercises is unchanged.

Modbus — Block_Read_Coalescing_Reduces_PDU_Count_End_To_End:
  Test read HR 300, 302, 304 — outside both the writable ranges and
  the uint16 seed list. pymodbus rejects reads to unseeded HRs even
  though 'hr size' is 2048. BadOutOfRange on every read.

  Fix: moved the tags from 300/302/304 to 200/202/204 (within the
  scratch range). The non-contiguous coalescing semantic (3 tags
  inside a 5-register window with MaxReadGap=5) is preserved.

After this commit:
  - Modbus.IntegrationTests: 6/38 pass / 32 skip / 0 fail
    (was 4 pass / 32 skip / 2 fail; 32 skips are profile-gated
    ExceptionInjectionTests — they need MODBUS_SIM_PROFILE=
    exception_injection and a different container, intentional gating)
  - AbCip.IntegrationTests: 10/12 pass / 2 skip / 0 fail
    (was 6 pass / 2 skip / 4 fail; 2 skips are Emulate tests that
    need the fixture for separate scenarios)

No driver code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:45:57 -04:00
Joseph Doherty 49644fc7fd test(fixtures): migrate integration-test fixture defaults to 10.100.0.35
CLAUDE.md "Docker Workflow" claims (per the 2026-04-28 migration note)
that all fixture-class default endpoints were rewritten to target the
shared Docker host at 10.100.0.35. Audit during today's e2e run showed
the claim was incomplete — five fixture classes still defaulted to
localhost / 127.0.0.1, causing every fixture-touching integration test
to skip with "endpoint unreachable" on a fresh box that hadn't set
the override env vars.

Files corrected:

  - tests/.../Modbus.IntegrationTests/ModbusSimulatorFixture.cs
      DefaultEndpoint: localhost:5020 → 10.100.0.35:5020
  - tests/.../S7.IntegrationTests/Snap7ServerFixture.cs
      DefaultEndpoint: localhost:1102 → 10.100.0.35:1102
  - tests/.../OpcUaClient.IntegrationTests/OpcPlcFixture.cs
      DefaultEndpoint: opc.tcp://localhost:50000 → opc.tcp://10.100.0.35:50000
  - tests/.../AbCip.IntegrationTests/AbServerFixture.cs
      Host default + ResolveHost fallback: 127.0.0.1 → 10.100.0.35
  - tests/.../AbLegacy.IntegrationTests/AbLegacyServerFixture.cs
      Host default + ResolveEndpoint fallback: 127.0.0.1 → 10.100.0.35

XML doc comments referencing the old localhost defaults were updated in
the same pass so the class-summary documentation matches the actual
default. The override-via-env-var mechanism (MODBUS_SIM_ENDPOINT,
AB_SERVER_ENDPOINT, AB_LEGACY_ENDPOINT, S7_SIM_ENDPOINT,
OPCUA_SIM_ENDPOINT) is unchanged — pointing at a real PLC or a
locally-running container still works exactly as before.

Verification:
  - Solution-wide dotnet build: 0 errors.
  - S7.IntegrationTests: 3/3 pass without env-var override.
  - OpcUaClient.IntegrationTests: 3/3 pass without env-var override.
  - Modbus.IntegrationTests: 4/38 (same as the env-var-override run —
    the 2 failures + 32 skips are pre-existing fixture-profile
    mismatches unrelated to this fix).
  - AbCip.IntegrationTests / AbLegacy.IntegrationTests: same results
    as the env-var-override run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:32:23 -04:00
Joseph Doherty 3d982d9a65 docs: sync against recent code changes
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.

1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
   Refreshed the deployment description from "three cooperating
   processes (Server, Admin, Galaxy.Host)" to "two cooperating
   Windows services (Server, Admin)". The legacy x86 TopShelf
   Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
   access now flows through the in-process Tier-A GalaxyDriver
   talking gRPC to the sibling mxaccessgw gateway. Also called out
   decision #30 (AddWindowsService replacing TopShelf) inline.

2. docs/VirtualTags.md:
   - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
     replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
     regular compiler — Core.Scripting-008 / -016 retired the
     CSharpScript/ScriptRunner path).
   - Line 39: orphan-thread leak description rewritten. The
     CSharp.Scripting-era "underlying ScriptRunner keeps running on
     its thread-pool thread until the Roslyn runtime returns" is no
     longer accurate — the new pipeline binds the script as a
     regular C# Func<> delegate, so the leak is now "synchronous
     CPU-bound work on a pool thread" (same operator-visible
     effect, different mechanism).

3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
   service"):
   Annotated both the decision body and the decision-log table row
   with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
   replacement architecture. The original reasoning is preserved as
   audit trail per the decision-log convention.

4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
   Added an Implementation note describing the
   Core.Scripting-008 / -016 supersession of the original
   CSharpScript pipeline. The historical record stays; the note
   points future readers at docs/VirtualTags.md "Compile cache"
   for the current contract.

5. docs/plans/alarms-over-gateway.md "Files" section under client
   regeneration:
   Updated the .NET regeneration instructions to point at the new
   ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
   clients/dotnet/MxGateway.Client.csproj no longer exists in the
   sibling repo (restructure after this plan was written) and the
   vendored-binaries situation in
   src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
   so a reader following the plan won't chase a deleted path.

Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:57:04 -04:00
57 changed files with 376 additions and 242 deletions
+2 -2
View File
@@ -6,7 +6,7 @@ The runtime is split across two projects: `Core.Scripting` holds the Roslyn sand
## Roslyn script sandbox (`Core.Scripting`)
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp` (regular compiler, not the scripting variant — the original `CSharpScript` pipeline was retired by the Core.Scripting-008 / -016 rewrite, see "Compile cache" below). Each script's source is pasted as the body of a synthesized `CompiledScript.Run(ScriptGlobals<TContext>)` method against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
### Compile pipeline (`ScriptEvaluator<TContext, TResult>`)
@@ -36,7 +36,7 @@ Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), whic
### Per-evaluation timeout (`TimedScriptEvaluator<TContext, TResult>`)
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying `ScriptRunner` keeps running on its thread-pool thread until the Roslyn runtime returns (documented trade-off; out-of-process evaluation is a v3 concern).
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying compiled-script delegate keeps running on its `Task.Run` thread-pool thread until it returns of its own accord (the CT is checked only at evaluator entry; once the script body is running, only the script returning or throwing will release the thread). The post-rewrite delegate is a regular C# `Func<>` bound to the synthesized `CompiledScript.Run` method, so this is a vanilla "synchronous CPU-bound work on a pool thread" leak rather than anything Roslyn-specific. Documented trade-off; out-of-process evaluation is a v3 concern.
### Script logger plumbing
+13 -3
View File
@@ -583,10 +583,20 @@ language binding.
**Depends on:** A.1 merged (proto change live).
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\src\` for .NET — note the sibling
repo restructured after this plan was written; `clients/dotnet/MxGateway.Client.csproj`
no longer exists, the proto contracts now live in
`src/ZB.MOM.WW.MxGateway.Contracts/` under the new namespace
`ZB.MOM.WW.MxGateway.Contracts.Proto[.Galaxy]`; the OtOpcUa driver currently
consumes vendored binaries from the pre-restructure build — see
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/README.md`):
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just
rebuild `MxGateway.Client.csproj` after pulling A.1.
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; rebuild
`src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
after pulling A.1. (If unwinding the driver's vendored binaries onto the
new contracts namespace as part of the alarm work, namespace-rename + a
reimplementation of the missing `MxGatewayClient` / `MxGatewaySession`
wrappers is also in scope.)
2. **Python** — run `clients\python\generate-proto.ps1`; commit the
regenerated `_pb2.py` + `_pb2_grpc.py` files under
`clients\python\src\`.
+3 -4
View File
@@ -1,6 +1,6 @@
# High-Level Requirements
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as three cooperating processes (Server, Admin, Galaxy.Host). The Galaxy integration is one of seven shipped drivers. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 are new and cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
> **Revision** — Refreshed 2026-05-23 for the OtOpcUa v2 multi-driver platform. The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as two cooperating processes (Server, Admin). The Galaxy integration is one of seven shipped drivers and is now an in-process Tier-A driver that talks gRPC to a separately installed `mxaccessgw` gateway (sibling repo) — PR 7.2 (2026-04-30) retired the legacy out-of-process `Galaxy.Host` Windows service. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
## HLR-001: OPC UA Server
@@ -28,11 +28,10 @@ Drivers whose backend has a native change signal (e.g. Galaxy's `time_of_last_de
## HLR-007: Service Hosting
The system shall be deployed as three cooperating Windows services:
The system shall be deployed as two cooperating Windows services (the legacy `OtOpcUa.Galaxy.Host` x86 host was retired in PR 7.2 — Galaxy access now flows through the separately installed `mxaccessgw` gateway, which lives in a sibling repository and is not part of the OtOpcUa deployment):
- **OtOpcUa.Server** — .NET 10 x64, `Microsoft.Extensions.Hosting` + `AddWindowsService`, hosts all non-Galaxy drivers in-process and the OPC UA endpoint.
- **OtOpcUa.Server** — .NET 10 AnyCPU, `Microsoft.Extensions.Hosting` + `AddWindowsService` (decision #30 replaced the original TopShelf choice), hosts every driver in-process — including the new Tier-A `GalaxyDriver` that speaks gRPC to `mxaccessgw` and the OPC UA endpoint.
- **OtOpcUa.Admin** — .NET 10 x64 Blazor Server web app, hosts the admin UI, SignalR hubs for live updates, `/metrics` Prometheus endpoint, and audit log writers.
- **OtOpcUa.Galaxy.Host** — .NET Framework 4.8 x86 (TopShelf), hosts MXAccess COM + Galaxy Repository SQL + Historian plugin. Talks to `Driver.Galaxy.Proxy` inside `OtOpcUa.Server` via a named pipe (MessagePack over length-prefixed frames, per-process shared secret, SID-restricted ACL).
## HLR-008: Logging
@@ -89,7 +89,7 @@ Tie-in capability — **historian alarm sink**:
### Stream A — `Core.Scripting` (Roslyn engine + sandbox + AST inference + logger) — **2 weeks**
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers.
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers. _(Implementation note 2026-05-23 — superseded by Core.Scripting-008 / -016: the `CSharpScript`/`ScriptRunner` path was replaced with a hand-rolled `CSharpCompilation.Create``Emit(MemoryStream)` → collectible `ScriptAssemblyLoadContext.LoadFromStream` pipeline so per-publish ALC accretion is reclaimable, and engines route compiles through `CompiledScriptCache` rather than calling `ScriptEvaluator.Compile` directly. The reference list was correspondingly widened from the narrow allow-list above to the full BCL `TRUSTED_PLATFORM_ASSEMBLIES` set (filtered to `System.*` + `netstandard` + `Microsoft.Win32.Registry`) because the new pipeline can't compile against the old narrow set; `ForbiddenTypeAnalyzer` is now the sole security gate, consistent with how Core.Scripting-001 / -002 established the analyzer must be the real boundary because type forwarding makes any references-list-only restriction porous. See `docs/VirtualTags.md` "Compile cache" for the current implementation contract.)_
2. **A.2** `DependencyExtractor : CSharpSyntaxWalker`. Visits every `InvocationExpressionSyntax` targeting `ctx.GetTag` / `ctx.SetVirtualTag`; accepts only a `LiteralExpressionSyntax` argument. Non-literal arguments (concat, variable, method call) → publish-time rejection with an actionable error pointing the operator at the exact span. Outputs `IReadOnlySet<string> Inputs` + `IReadOnlySet<string> Outputs`.
3. **A.3** Compile cache. `(source_hash) → compiled Script<T>`. Recompile only when source changes. Warm on `SealedBootstrap`.
4. **A.4** Per-evaluation timeout wrapper (default 250ms; configurable per tag). Timeout = tag quality `BadInternalError` + structured warning log. Keeps a single runaway script from starving the engine.
+2 -2
View File
@@ -193,7 +193,7 @@ ConfigurationService
- Compact binary format, faster than JSON, good fit for high-frequency data change callbacks
- Simpler than gRPC on .NET 4.8 (which needs legacy `Grpc.Core` native library)
**Decided: Galaxy Host is a separate Windows service.**
**Decided: Galaxy Host is a separate Windows service.** _(Reversed by PR 7.2, 2026-04-30 — see PR 7.2's commit `ae7106d` and the project_galaxy_via_mxgateway memory entry. The legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the `OtOpcUaGalaxyHost` Windows service were retired; Galaxy access now flows through the in-process Tier-A `GalaxyDriver` talking gRPC to a separately installed `mxaccessgw` gateway sibling repo. The reasoning below was correct for the original LMX/x86-COM architecture; the gateway sibling repo now owns those constraints externally.)_
- Independent lifecycle from the OtOpcUa Server
- Can be restarted without affecting the main server or other drivers
- Galaxy.Proxy detects connection loss, sets Bad quality on Galaxy nodes, reconnects when Host comes back
@@ -801,7 +801,7 @@ aggregate runner (#253); server-side factory + seed SQL per driver (#210#213)
| 26 | Admin deploys on same server (co-hosted) | Simplifies deployment; can also run on separate management host | 2026-04-16 |
| 27 | Admin scaffold early, driver-specific screens deferred | Core CRUD for instances/drivers first; per-driver config UI added with each driver | 2026-04-16 |
| 28 | Named pipes for Galaxy IPC | Fast, no port conflicts, native to both .NET 4.8 and .NET 10 | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 (**reversed PR 7.2, 2026-04-30** — Galaxy is now an in-process Tier-A driver talking gRPC to the sibling `mxaccessgw` gateway; see the decision body above) |
| 30 | Drop TopShelf, use Microsoft.Extensions.Hosting | Built-in Windows Service support in .NET 10, no third-party dependency | 2026-04-16 |
| 31 | Mono-repo for all drivers | Simpler dependency management, single CI pipeline, shared abstractions | 2026-04-16 |
| 32 | MessagePack serialization for Galaxy IPC | Binary, fast, works on .NET 4.8+ and .NET 10 via MessagePack-CSharp NuGet | 2026-04-16 |
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
@@ -1,7 +1,7 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
@@ -2,7 +2,7 @@ using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,7 +1,7 @@
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,8 +1,8 @@
using System.Collections.Concurrent;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
// Use the generated nested status enum for the SetBufferedUpdateInterval reply check.
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,6 +1,6 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using System.Runtime.CompilerServices;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -18,39 +18,15 @@
</ItemGroup>
<ItemGroup>
<!-- Vendored mxaccessgw .NET client. Originally consumed via path-based
ProjectReference to the sibling repo, but the sibling repo restructured
and the MxGateway.Client.csproj path no longer exists. The DLLs in
libs/ are the last known-good build (May 2026); they reference proto
types from MxGateway.Contracts.dll using the pre-restructure namespace
(MxGateway.Contracts.Proto). See libs/README.md for the unwinding plan
once the sibling repo restores a client library or we migrate to the
new ZB.MOM.WW.MxGateway.Contracts.Proto namespace. -->
<Reference Include="MxGateway.Client">
<HintPath>libs\MxGateway.Client.dll</HintPath>
<Private>true</Private>
</Reference>
<Reference Include="MxGateway.Contracts">
<HintPath>libs\MxGateway.Contracts.dll</HintPath>
<Private>true</Private>
</Reference>
</ItemGroup>
<ItemGroup>
<!-- Transitive deps the vendored MxGateway.Client.dll was actually built
against (verified by reflecting GetReferencedAssemblies on the DLL —
see libs/README.md). Versions align with the sibling mxaccessgw repo's
current Server / Worker projects so binary-compat stays close to what
the team uses elsewhere. Pre-Driver.Galaxy-016 the csproj declared
`Polly` (the v7 API) instead of `Polly.Core` (the v8 API the DLL was
built against) — a package-name mistake, not just a version skew —
which would surface as a runtime MissingMethodException the first
time the client's retry pipeline ran. -->
<PackageReference Include="Google.Protobuf" Version="3.34.1" />
<PackageReference Include="Grpc.Core.Api" Version="2.76.0" />
<PackageReference Include="Grpc.Net.Client" Version="2.76.0" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.7" />
<PackageReference Include="Polly.Core" Version="8.6.6" />
<!-- Sibling mxaccessgw repo's .NET client + contracts. The sibling restored
a proper client library under clients/dotnet/ (May 2026), so this is
back on a path-based ProjectReference per the libs/README unwind plan #1.
Both projects target net10.0; the Contracts project transitively pulls
Google.Protobuf + Grpc.Core.Api, the Client project transitively pulls
Grpc.Net.Client + Polly.Core + Microsoft.Extensions.Logging.Abstractions,
so the explicit PackageReference shims that backfilled the vendored
binary references are no longer needed. -->
<ProjectReference Include="..\..\..\..\mxaccessgw\clients\dotnet\ZB.MOM.WW.MxGateway.Client\ZB.MOM.WW.MxGateway.Client.csproj"/>
</ItemGroup>
<ItemGroup>
@@ -1,101 +0,0 @@
# Vendored MxGateway client DLLs
This directory holds binary copies of `MxGateway.Client.dll` and
`MxGateway.Contracts.dll` from the sibling `mxaccessgw` repo's last known-good
build (May 2026). The DLLs are referenced from the driver's csproj as
`<Reference HintPath="…" />` items rather than `ProjectReference`.
## Provenance
Both DLLs are built from this team's own `mxaccessgw` source tree — they are
not third-party binaries. The build commit + checksums below are recorded so
future readers can verify the artefacts match the expected source without
needing to ask the original author.
| File | Source commit | SHA-256 |
|---|---|---|
| `MxGateway.Client.dll` | `dd7ca1634e2d2b8a866c81f0009bf87ee9427750` (mxaccessgw repo, pre-restructure) | `3507f770adc8c1b27b2fc4645079c6e4e02d5c65b9545c12d637cd2a080a00bd` |
| `MxGateway.Contracts.dll` | `dd7ca1634e2d2b8a866c81f0009bf87ee9427750` (mxaccessgw repo, pre-restructure) | `437dc6cb6994c7c4d858c82f69af890732c7ffbfa0463fbd8a63ce7930d251b4` |
The build commit is the same for both DLLs and is embedded as
`AssemblyInformationalVersion` inside each binary — re-verify by running:
`ilspycmd <dll> | grep AssemblyInformationalVersion`.
To re-verify the checksums (e.g. after a clone):
```bash
sha256sum libs/MxGateway.Client.dll libs/MxGateway.Contracts.dll
```
If either SHA-256 or the embedded source commit no longer matches what's
listed above, the artefact has been replaced — verify before trusting.
## Why vendored
The sibling `mxaccessgw` repo restructured: the `clients/dotnet/MxGateway.Client`
project the driver previously referenced via path-based `ProjectReference` no
longer exists, and the proto contracts moved from the `MxGateway.Contracts.Proto`
namespace to `ZB.MOM.WW.MxGateway.Contracts.Proto`. The driver's source still
expects the pre-restructure namespace, so re-pointing at the new contracts would
require a global namespace rename across ~19 driver files PLUS reimplementing
the `MxGatewayClient` / `MxGatewaySession` / `GalaxyRepositoryClient` types the
old client library provided (the sibling repo dropped the client library
entirely, keeping only the contracts).
Vendoring the binaries unblocked the build in minutes instead of hours, freezes
the gateway contract surface at a known-good version, and preserves the option
to migrate properly later without an emergency rewrite.
## What's vendored
| File | Built against |
|---|---|
| `MxGateway.Client.dll` | net10.0, references `MxGateway.Contracts.dll` |
| `MxGateway.Contracts.dll` | net10.0, proto namespace `MxGateway.Contracts.Proto[.Galaxy]` |
The NuGet packages the vendored DLLs reference (verified by reflecting
`Assembly.GetReferencedAssemblies()` against `MxGateway.Client.dll`) are
declared as direct `PackageReference` in the driver csproj — when the dropped
`ProjectReference` was in place those packages were transitively provided;
with binary references the consumer must declare them explicitly:
| Package | Reason |
|---|---|
| `Google.Protobuf` 3.34.1 | Proto message types in `MxGateway.Contracts.dll` |
| `Grpc.Core.Api` 2.76.0 | Base gRPC client types in `MxGateway.Client.dll` |
| `Grpc.Net.Client` 2.76.0 | HTTP/2 transport used by `MxGatewayClient` |
| `Microsoft.Extensions.Logging.Abstractions` 10.0.7 | `ILogger` used by the client |
| `Polly.Core` 8.6.6 | Retry pipeline used by `MxGatewayClient` |
Versions match the sibling mxaccessgw repo's current Server / Worker
projects (`ZB.MOM.WW.MxGateway.Server.csproj`,
`ZB.MOM.WW.MxGateway.Worker.csproj`) so the runtime versions stay close to
what the gateway team uses. The pre-Driver.Galaxy-016 declarations were
incorrect — most visibly `Polly 8.5.2` was declared where the DLL actually
needs `Polly.Core` (a different package: `Polly` v7 is the older fluent API;
`Polly.Core` v8 is the modern resilience-pipeline API the gateway client was
built against). A `Polly` reference would have failed at runtime with
`MissingMethodException` the first time a retry pipeline ran.
## Decompiled-source archive
The vendored DLLs are byte-for-byte the build output. The full source can be
recovered with `ilspycmd MxGateway.Client.dll > MxGateway.Client.cs` if a code
review or audit needs it.
## How to unwind
Either path closes the vendored-binary debt:
1. **Sibling repo restores `MxGateway.Client.csproj`** (or publishes a NuGet
package). Switch the csproj back to a `ProjectReference` / `PackageReference`,
delete this directory.
2. **Driver migrates to the new `ZB.MOM.WW.MxGateway.Contracts.Proto`
namespace.** Global namespace rename across the ~19 consuming source files,
plus re-implementing `MxGatewayClient` / `MxGatewaySession` /
`GalaxyRepositoryClient` (≈2,200 LoC of behavioural client code) either
inlined into this driver or as a fresh sibling library. Delete this
directory.
Either way: when unwinding, also drop the five `PackageReference` lines added
to the csproj alongside the `<Reference>` items — the new ProjectReference /
PackageReference will provide them transitively again.
@@ -101,29 +101,44 @@ else
{
<Generations ClusterId="@ClusterId"/>
}
else if (_tab == "equipment" && _currentDraft is not null)
else if (_tab is "equipment" or "uns" or "namespaces" or "drivers" or "tags" or "acls")
{
<EquipmentTab GenerationId="@_currentDraft.GenerationId"/>
}
else if (_tab == "uns" && _currentDraft is not null)
{
<UnsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "namespaces" && _currentDraft is not null)
{
<NamespacesTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "drivers" && _currentDraft is not null)
{
<DriversTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "tags" && _currentDraft is not null)
{
<TagsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
}
else if (_tab == "acls" && _currentDraft is not null)
{
<AclsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
@* Bug #10 fix — these six tabs are scoped to a generation. Per docs/v2/admin-ui.md the
design intent is a read-only view of the published generation when no draft is open
("Edit in draft" affordance), and the editable view of the draft when one is open.
The earlier implementation rendered nothing in the no-draft case, leaving operators
with just the "Open a draft to edit" placeholder. We now route both states through
the same tab components, gating edits via <fieldset disabled> so a button click in
the read-only state cannot silently mutate the published rows even though the tab
components themselves haven't been refactored to honor an IsReadOnly flag yet. *@
var genId = _currentDraft?.GenerationId ?? _currentPublished?.GenerationId;
var isReadOnly = _currentDraft is null;
if (genId is null)
{
<section class="panel notice rise" style="animation-delay:.02s">
No published generation yet. Click <strong>New draft</strong> above to author this cluster's first generation.
</section>
}
else
{
if (isReadOnly)
{
<section class="panel notice rise mb-3" style="animation-delay:.02s">
<strong>Read-only view</strong> of published generation @genId. Click <strong>New draft</strong> above to make changes.
</section>
}
<fieldset disabled="@isReadOnly" style="border:0;padding:0;margin:0;min-width:0;">
@switch (_tab)
{
case "equipment": <EquipmentTab GenerationId="@genId.Value"/> break;
case "uns": <UnsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "namespaces": <NamespacesTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "drivers": <DriversTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "tags": <TagsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
case "acls": <AclsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
}
</fieldset>
}
}
else if (_tab == "redundancy")
{
@@ -133,10 +148,6 @@ else
{
<AuditTab ClusterId="@ClusterId"/>
}
else
{
<section class="panel notice rise" style="animation-delay:.02s">Open a draft to edit this cluster's content.</section>
}
}
@code {
@@ -16,12 +16,40 @@ public sealed class ClusterNodeService(OtOpcUaConfigDbContext db)
/// tolerance covers a missed heartbeat plus publisher GC pauses.</summary>
public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);
public Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct) =>
db.ClusterNodes.AsNoTracking()
public async Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct)
{
var nodes = await db.ClusterNodes.AsNoTracking()
.Where(n => n.ClusterId == clusterId)
.OrderByDescending(n => n.ServiceLevelBase)
.ThenBy(n => n.NodeId)
.ToListAsync(ct);
.ToListAsync(ct).ConfigureAwait(false);
// Bug #12 fix follow-up — the live-node heartbeat lands on
// ClusterNodeGenerationState.LastSeenAt (written by sp_RegisterNodeGenerationApplied
// on every generation poll). The ClusterNode.LastSeenAt column is a legacy slot that
// no current writer maintains, so reading it directly would show "never STALE"
// forever for every running node. Overlay the GenerationState heartbeat onto the
// returned ClusterNode rows when it's more recent so the Redundancy tab + IsStale
// predicate reflect actual liveness without needing a new write path or schema change.
var nodeIds = nodes.Select(n => n.NodeId).ToList();
if (nodeIds.Count > 0)
{
var heartbeats = await db.ClusterNodeGenerationStates.AsNoTracking()
.Where(s => nodeIds.Contains(s.NodeId))
.Select(s => new { s.NodeId, s.LastSeenAt })
.ToListAsync(ct).ConfigureAwait(false);
var beatByNode = heartbeats.ToDictionary(s => s.NodeId, s => s.LastSeenAt);
foreach (var n in nodes)
{
if (beatByNode.TryGetValue(n.NodeId, out var hb) && hb is not null
&& (n.LastSeenAt is null || hb > n.LastSeenAt))
{
n.LastSeenAt = hb;
}
}
}
return nodes;
}
public static bool IsStale(ClusterNode node) =>
node.LastSeenAt is null || DateTime.UtcNow - node.LastSeenAt.Value > StaleThreshold;
@@ -1,6 +1,7 @@
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Server.Redundancy;
namespace ZB.MOM.WW.OtOpcUa.Server.Hosting;
@@ -42,10 +43,20 @@ public sealed class GenerationRefreshHostedService(
RedundancyCoordinator coordinator,
ILogger<GenerationRefreshHostedService> logger,
TimeSpan? tickInterval = null,
Func<CancellationToken, Task<long?>>? currentGenerationQuery = null) : BackgroundService
Func<CancellationToken, Task<long?>>? currentGenerationQuery = null,
Func<long, NodeApplyStatus, string?, CancellationToken, Task>? registerAppliedAsync = null) : BackgroundService
{
private readonly Func<CancellationToken, Task<long?>> _generationQuery = currentGenerationQuery
?? new Func<CancellationToken, Task<long?>>(ct => DefaultQueryCurrentGenerationAsync(options, logger, ct));
// Bug #12 fix — the server now reports applied-generation state + heartbeat back to the
// central DB via sp_RegisterNodeGenerationApplied. Before this wiring the proc had zero
// callers, so dbo.ClusterNodeGenerationState stayed empty for every node and the Admin UI
// Fleet status page + cluster-detail Redundancy LastSeenAt both showed "no node state /
// never STALE" indefinitely. Tests inject a stub via the registerAppliedAsync parameter.
private readonly Func<long, NodeApplyStatus, string?, CancellationToken, Task> _registerApplied = registerAppliedAsync
?? new Func<long, NodeApplyStatus, string?, CancellationToken, Task>(
(gen, status, err, ct) => DefaultRegisterAppliedAsync(options, logger, gen, status, err, ct));
/// <summary>
/// How often the service polls <c>sp_GetCurrentGenerationForCluster</c>. Default 5 s —
/// low enough that operator publishes take effect promptly, high enough that the
@@ -97,6 +108,18 @@ public sealed class GenerationRefreshHostedService(
if (LastAppliedGenerationId is long last && current == last)
{
// Heartbeat — re-stamps LastSeenAt on dbo.ClusterNodeGenerationState so the Admin
// Fleet status page + cluster Redundancy tab can detect the node is alive without
// a generation change. Best-effort: a transient DB error here must not throw out of
// the tick (the next tick will retry) and must not block applies.
try
{
await _registerApplied(current.Value, NodeApplyStatus.Applied, null, cancellationToken).ConfigureAwait(false);
}
catch (Exception hbEx) when (hbEx is not OperationCanceledException)
{
logger.LogDebug(hbEx, "Heartbeat to sp_RegisterNodeGenerationApplied failed; will retry next tick");
}
return; // no change
}
@@ -109,14 +132,44 @@ public sealed class GenerationRefreshHostedService(
// lease is open. Publisher ticks in parallel (1s cadence) will observe the band
// transition and push it onto the OPC UA Server.ServiceLevel node.
var publishRequestId = Guid.NewGuid();
await using (leases.BeginApplyLease(current.Value, publishRequestId))
NodeApplyStatus applyStatus;
string? applyError = null;
try
{
await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
// Future: fire a domain event that driver hosts / virtual-tag engine /
// scripted-alarm engine subscribe to. For now the topology refresh is the
// only thing we rewire — everything else still requires a process restart.
await using (leases.BeginApplyLease(current.Value, publishRequestId))
{
await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
// Future: fire a domain event that driver hosts / virtual-tag engine /
// scripted-alarm engine subscribe to. For now the topology refresh is the
// only thing we rewire — everything else still requires a process restart.
}
applyStatus = NodeApplyStatus.Applied;
}
catch (Exception applyEx) when (applyEx is not OperationCanceledException)
{
applyStatus = NodeApplyStatus.Failed;
applyError = applyEx.Message;
logger.LogError(applyEx, "Apply of generation {Generation} failed; will report Failed status to central DB", current);
// fall through to register so operators see the failed apply in /fleet
}
// Always tell the central DB what happened with this apply attempt — success or
// failure. The proc upserts dbo.ClusterNodeGenerationState (CurrentGenerationId +
// LastAppliedAt + LastAppliedStatus + LastAppliedError + LastSeenAt). Failure here
// mustn't prevent us from advancing LastAppliedGenerationId — the apply already
// happened (or already failed); the publish is purely observability.
try
{
await _registerApplied(current.Value, applyStatus, applyError, cancellationToken).ConfigureAwait(false);
}
catch (Exception regEx) when (regEx is not OperationCanceledException)
{
logger.LogWarning(regEx, "sp_RegisterNodeGenerationApplied call failed for gen {Generation} status {Status}", current, applyStatus);
}
// Advance the cursor even on Failed — the proc has been told; next tick will heartbeat
// and a future generation will trigger a fresh apply attempt. Pinning the cursor on
// failure would loop us through the same broken apply every 5s.
LastAppliedGenerationId = current;
RefreshCount++;
}
@@ -157,4 +210,35 @@ public sealed class GenerationRefreshHostedService(
return null;
}
}
/// <summary>
/// Default register-applied implementation — calls <c>sp_RegisterNodeGenerationApplied</c>
/// to MERGE-upsert <see cref="ZB.MOM.WW.OtOpcUa.Configuration.Entities.ClusterNodeGenerationState"/>
/// for this node. Called both at apply completion (success or failure) and on every
/// no-change heartbeat tick so <c>LastSeenAt</c> stays fresh in the central DB and the
/// Admin UI Fleet status page + Redundancy LastSeenAt indicator can detect a healthy node.
/// Bug #12 fix — wires the previously-orphaned proc into the apply loop.
/// </summary>
private static async Task DefaultRegisterAppliedAsync(
NodeOptions options,
ILogger logger,
long generationId,
NodeApplyStatus status,
string? error,
CancellationToken cancellationToken)
{
await using var conn = new SqlConnection(options.ConfigDbConnectionString);
await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
await using var cmd = conn.CreateCommand();
cmd.CommandText = "EXEC dbo.sp_RegisterNodeGenerationApplied @NodeId=@n, @GenerationId=@g, @Status=@s, @Error=@e";
cmd.Parameters.AddWithValue("@n", options.NodeId);
cmd.Parameters.AddWithValue("@g", generationId);
cmd.Parameters.AddWithValue("@s", status.ToString());
cmd.Parameters.AddWithValue("@e", (object?)error ?? DBNull.Value);
await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
// Single-line trace so soak runs can see heartbeat ticks without flooding at Info.
logger.LogTrace("Reported gen {Generation} status {Status} to central DB", generationId, status);
}
}
@@ -26,7 +26,11 @@ public sealed class AbCipReadSmokeTests
await fixture.InitializeAsync();
try
{
var deviceUri = $"ab://127.0.0.1:{fixture.Port}/1,0";
// Use fixture.Host (not hardcoded 127.0.0.1) so the docker-host migration
// (10.100.0.35) and AB_SERVER_ENDPOINT overrides reach this test too. The
// sibling Emulate tests already use the fixture's resolved endpoint; this
// smoke test was missed.
var deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0";
var drv = new AbCipDriver(new AbCipDriverOptions
{
Devices = [new AbCipDeviceOptions(deviceUri, profile.Family)],
@@ -8,7 +8,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests;
/// Reachability probe for the <c>ab_server</c> Docker container (libplctag's CIP
/// simulator built via <c>Docker/Dockerfile</c>) or any real AB PLC the
/// <c>AB_SERVER_ENDPOINT</c> env var points at. Parses
/// <c>AB_SERVER_ENDPOINT</c> (default <c>localhost:44818</c>) + TCP-connects
/// <c>AB_SERVER_ENDPOINT</c> (default <c>10.100.0.35:44818</c> — the shared Docker host) + TCP-connects
/// once at fixture construction. Tests skip via <see cref="AbServerFactAttribute"/>
/// / <see cref="AbServerTheoryAttribute"/> when the port isn't live, so
/// <c>dotnet test</c> stays green on a fresh clone without Docker running.
@@ -28,7 +28,10 @@ public sealed class AbServerFixture : IAsyncLifetime
/// instantiate the fixture with the profile matching their compose-file service.</summary>
public AbServerProfile Profile { get; }
public string Host { get; } = "127.0.0.1";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's 127.0.0.1 on 2026-04-28 alongside the rest of the Docker-host move.
// Override via AB_SERVER_ENDPOINT to point at a real PLC or a locally-running container.
public string Host { get; } = "10.100.0.35";
public int Port { get; } = AbServerProfile.DefaultPort;
public AbServerFixture() : this(KnownProfiles.ControlLogix) { }
@@ -59,7 +62,7 @@ public sealed class AbServerFixture : IAsyncLifetime
TcpProbe(ResolveHost(), ResolvePort());
private static string ResolveHost() =>
Environment.GetEnvironmentVariable(EndpointEnvVar)?.Split(':', 2)[0] ?? "127.0.0.1";
Environment.GetEnvironmentVariable(EndpointEnvVar)?.Split(':', 2)[0] ?? "10.100.0.35";
private static int ResolvePort()
{
@@ -84,7 +87,7 @@ public sealed class AbServerFixture : IAsyncLifetime
/// <summary>
/// <c>[Fact]</c>-equivalent that skips when ab_server isn't reachable — accepts a
/// live Docker listener on <c>localhost:44818</c> or an <c>AB_SERVER_ENDPOINT</c>
/// live Docker listener on <c>10.100.0.35:44818</c> or an <c>AB_SERVER_ENDPOINT</c>
/// override pointing at a real PLC.
/// </summary>
public sealed class AbServerFactAttribute : FactAttribute
@@ -29,6 +29,10 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests.Emulate;
/// <para>Runs only when <c>AB_SERVER_PROFILE=emulate</c>. ab_server has no ALMD
/// instruction + no alarm subsystem, so this tier-gated class couldn't produce a
/// meaningful result against the default simulator.</para>
/// <para>The Emulate tier is hardware-gated (Rockwell per-seat license, Windows-only,
/// conflicts with Docker Desktop's WSL 2 backend) so a permanent skip in CI is expected
/// — see <c>docs/drivers/AbServer-Test-Fixture.md</c> §"Logix Emulate golden-box tier"
/// for the full rationale + the gap matrix this test closes.</para>
/// </remarks>
[Collection("AbServerEmulate")]
[Trait("Category", "Integration")]
@@ -27,6 +27,10 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests.Emulate;
/// <para>Runs only when <c>AB_SERVER_PROFILE=emulate</c>. With ab_server
/// (the default), skips cleanly — ab_server lacks UDT / Template Object emulation
/// so this wire-level test couldn't pass against it regardless.</para>
/// <para>The Emulate tier is hardware-gated (Rockwell per-seat license, Windows-only,
/// conflicts with Docker Desktop's WSL 2 backend) so a permanent skip in CI is expected
/// — see <c>docs/drivers/AbServer-Test-Fixture.md</c> §"Logix Emulate golden-box tier"
/// for the full rationale + the gap matrix this test closes.</para>
/// </remarks>
[Collection("AbServerEmulate")]
[Trait("Category", "Integration")]
@@ -18,7 +18,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests;
/// Env-var overrides:
/// <list type="bullet">
/// <item><c>AB_LEGACY_ENDPOINT</c> — <c>host:port</c> of the PCCC-mode simulator.
/// Defaults to <c>localhost:44818</c> (EtherNet/IP port; ab_server's PCCC
/// Defaults to <c>10.100.0.35:44818</c> — the shared Docker host (EtherNet/IP port; ab_server's PCCC
/// emulation exposes PCCC-over-CIP on the same port as CIP itself).</item>
/// <item><c>AB_LEGACY_CIP_PATH</c> — routing path appended to the <c>ab://host:port/</c>
/// URI. Defaults to <c>1,0</c> (port-1/slot-0 backplane), required by ab_server
@@ -50,7 +50,10 @@ public sealed class AbLegacyServerFixture : IAsyncLifetime
/// </summary>
public const string DefaultCipPath = "1,0";
public string Host { get; } = "127.0.0.1";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's 127.0.0.1 on 2026-04-28 alongside the rest of the Docker-host move.
// Override via AB_LEGACY_ENDPOINT to point at a real PLC or a locally-running container.
public string Host { get; } = "10.100.0.35";
public int Port { get; } = DefaultPort;
/// <summary>CIP routing path portion of the device URI (after the <c>/</c> separator).
@@ -105,7 +108,7 @@ public sealed class AbLegacyServerFixture : IAsyncLifetime
private static (string Host, int Port) ResolveEndpoint()
{
var raw = Environment.GetEnvironmentVariable(EndpointEnvVar);
if (raw is null) return ("127.0.0.1", DefaultPort);
if (raw is null) return ("10.100.0.35", DefaultPort);
var parts = raw.Split(':', 2);
var port = parts.Length == 2 && int.TryParse(parts[1], out var p) ? p : DefaultPort;
return (parts[0], port);
@@ -1,7 +1,7 @@
using System.Runtime.CompilerServices;
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,7 +1,7 @@
using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,5 +1,5 @@
using System.Diagnostics;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using System.Runtime.CompilerServices;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -1,6 +1,6 @@
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -1,4 +1,4 @@
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -54,7 +54,12 @@ public sealed class AddressingGrammarTests
{
if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);
var tag = new ModbusTagDefinition("Tank", ModbusRegion.HoldingRegisters, 100, ModbusDataType.Float32,
// HR 200 lives in pymodbus's scratch range (`write: [200, 209]` per standard.json),
// which gives us two consecutive writable HRs for the Float32 round-trip. The earlier
// version used HR 100 — but standard.json declares HR 100 as a single-cell auto-
// incrementing register (`write: [100, 100]`) so the second register of the Float32
// write was rejected with Illegal Data Address.
var tag = new ModbusTagDefinition("Tank", ModbusRegion.HoldingRegisters, 200, ModbusDataType.Float32,
ByteOrder: ModbusByteOrder.WordSwap);
var drv = await NewDriverAsync(tag);
@@ -87,9 +92,13 @@ public sealed class AddressingGrammarTests
if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);
// Sanity check that the simulator accepts the larger PDU coalescing produces.
var t1 = new ModbusTagDefinition("T1", ModbusRegion.HoldingRegisters, 300, ModbusDataType.Int16);
var t2 = new ModbusTagDefinition("T2", ModbusRegion.HoldingRegisters, 302, ModbusDataType.Int16);
var t3 = new ModbusTagDefinition("T3", ModbusRegion.HoldingRegisters, 304, ModbusDataType.Int16);
// Using HR 200/202/204 in the scratch range (standard.json's `uint16: 200..209`),
// not 300/302/304 — pymodbus rejects reads outside the seeded uint16 list with
// Illegal Data Address (= BadOutOfRange). The coalescing semantic the test
// exercises is identical with the scratch addresses.
var t1 = new ModbusTagDefinition("T1", ModbusRegion.HoldingRegisters, 200, ModbusDataType.Int16);
var t2 = new ModbusTagDefinition("T2", ModbusRegion.HoldingRegisters, 202, ModbusDataType.Int16);
var t3 = new ModbusTagDefinition("T3", ModbusRegion.HoldingRegisters, 204, ModbusDataType.Int16);
var opts = new ModbusDriverOptions
{
Host = _sim.Host, Port = _sim.Port, Tags = [t1, t2, t3], MaxReadGap = 5,
@@ -48,6 +48,23 @@ service + starting another. The integration tests discriminate by a
separate `MODBUS_SIM_PROFILE` env var so they skip correctly when the
wrong profile is live.
### Profile coverage matrix
The two general-purpose profiles cover disjoint test sets. A full pass
of the integration suite requires running both — serially on a single
docker host (the `:5020` collision), or in parallel on two hosts.
| Job | Bring up | Env to set | Expected outcome |
|---|---|---|---|
| `modbus-standard` | `lmxopcua-fix up modbus standard` | unset `MODBUS_SIM_PROFILE` (or set to `standard`) | Standard round-trip + AddressingGrammar suites pass; `ExceptionInjectionTests` (32 rows) skip with `MODBUS_SIM_PROFILE != exception_injection`. |
| `modbus-exception` | `lmxopcua-fix up modbus exception_injection` | `MODBUS_SIM_PROFILE=exception_injection` | `ExceptionInjectionTests` (32 rows) pass against the per-`(fc,address)` rule set; standard-profile suites (round-trip, AddressingGrammar) skip. |
The DL205 / Mitsubishi / S7-1500 profiles are similar — each gates its
own quirks suite via `MODBUS_SIM_PROFILE=<profile>`. Tests that don't
need a specific profile (the basic round-trip set) run under any of
the three pymodbus-based profiles. The `exception_injection` profile
is the only one that runs `exception_injector.py` instead of pymodbus.
## Endpoint
- Default: `localhost:5020`
@@ -5,7 +5,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests;
/// <summary>
/// Reachability probe for a Modbus TCP simulator (pymodbus in Docker, see
/// <c>Docker/docker-compose.yml</c>) or a real PLC. Parses
/// <c>MODBUS_SIM_ENDPOINT</c> (default <c>localhost:5020</c> per PR 43) and TCP-connects once at
/// <c>MODBUS_SIM_ENDPOINT</c> (default <c>10.100.0.35:5020</c> — the shared Docker host) and TCP-connects once at
/// fixture construction. Each test checks <see cref="SkipReason"/> and calls
/// <c>Assert.Skip</c> when the endpoint was unreachable, so a dev box without a running
/// simulator still passes `dotnet test` cleanly — matches the Galaxy live-smoke pattern in
@@ -29,8 +29,11 @@ public sealed class ModbusSimulatorFixture : IAsyncDisposable
// PR 43: default port is 5020 (pymodbus convention) instead of 502 (Modbus standard).
// Picking 5020 sidesteps the privileged-port admin requirement on Windows + matches the
// port baked into the pymodbus simulator JSON profiles in Docker/profiles/. Override with
// MODBUS_SIM_ENDPOINT to point at a real PLC on its native port 502.
private const string DefaultEndpoint = "localhost:5020";
// MODBUS_SIM_ENDPOINT to point at a real PLC on its native port 502, or to a
// locally-running container if the shared host is unavailable.
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's localhost on 2026-04-28 alongside the rest of the Docker-host move.
private const string DefaultEndpoint = "10.100.0.35:5020";
private const string EndpointEnvVar = "MODBUS_SIM_ENDPOINT";
public string Host { get; }
@@ -6,7 +6,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests;
/// Reachability probe for an <c>opc-plc</c> simulator (Microsoft Industrial IoT's
/// OPC UA PLC from <c>mcr.microsoft.com/iotedge/opc-plc</c>) or any real OPC UA
/// server the <c>OPCUA_SIM_ENDPOINT</c> env var points at. Parses
/// <c>OPCUA_SIM_ENDPOINT</c> (default <c>opc.tcp://localhost:50000</c>),
/// <c>OPCUA_SIM_ENDPOINT</c> (default <c>opc.tcp://10.100.0.35:50000</c> — the shared Docker host),
/// TCP-connects to the resolved host:port at collection init, and records a
/// <see cref="SkipReason"/> on failure. Tests call <c>Assert.Skip</c> on that, so
/// `dotnet test` stays green when Docker isn't running the simulator — mirrors the
@@ -32,7 +32,11 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests;
/// </remarks>
public sealed class OpcPlcFixture : IAsyncDisposable
{
private const string DefaultEndpoint = "opc.tcp://localhost:50000";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's localhost on 2026-04-28 alongside the rest of the Docker-host move.
// Override via OPCUA_SIM_ENDPOINT to point at a different host or a locally-running
// opc-plc instance.
private const string DefaultEndpoint = "opc.tcp://10.100.0.35:50000";
private const string EndpointEnvVar = "OPCUA_SIM_ENDPOINT";
/// <summary>Full <c>opc.tcp://host:port</c> URL the driver session should connect to.</summary>
@@ -5,7 +5,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests;
/// <summary>
/// Reachability probe for the python-snap7 simulator Docker container (see
/// <c>Docker/docker-compose.yml</c>) or a real S7 PLC. Parses <c>S7_SIM_ENDPOINT</c>
/// (default <c>localhost:1102</c>) + TCP-connects once at fixture construction.
/// (default <c>10.100.0.35:1102</c> — the shared Docker host) + TCP-connects once at fixture construction.
/// Tests check <see cref="SkipReason"/> + call <c>Assert.Skip</c> when unreachable, so
/// `dotnet test` stays green on a fresh box without the simulator installed —
/// mirrors the <c>ModbusSimulatorFixture</c> pattern.
@@ -35,7 +35,9 @@ public sealed class Snap7ServerFixture : IAsyncDisposable
{
// Default 1102 (non-privileged) matches Docker/server.py. Override with
// S7_SIM_ENDPOINT to point at a real PLC on its native 102.
private const string DefaultEndpoint = "localhost:1102";
// 10.100.0.35 = the shared Docker host (see CLAUDE.md "Docker Workflow"). Migrated
// off this VM's localhost on 2026-04-28 alongside the rest of the Docker-host move.
private const string DefaultEndpoint = "10.100.0.35:1102";
private const string EndpointEnvVar = "S7_SIM_ENDPOINT";
public string Host { get; }
@@ -6,6 +6,7 @@ using Microsoft.AspNetCore.Hosting;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.OtOpcUa.Admin.Hubs;
using ZB.MOM.WW.OtOpcUa.Admin.Security;
using ZB.MOM.WW.OtOpcUa.Configuration;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
@@ -101,6 +102,15 @@ public sealed class AdminWebAppFactory : IAsyncDisposable
builder.Services.AddScoped<Admin.Services.DriverInstanceService>();
builder.Services.AddScoped<Admin.Services.DraftValidationService>();
// ClusterDetail.razor injects AdminHubConnectionFactory to drive the live-banner hub
// connection; the factory depends on HubTokenService, which in turn needs Data Protection.
// Without these the InteractiveServer circuit fails to instantiate the component and the
// page never advances past the "Loading…" placeholder — Playwright then times out
// waiting for any tab nav-link to appear. Mirrors Program.cs:35-36.
builder.Services.AddDataProtection();
builder.Services.AddSingleton<HubTokenService>();
builder.Services.AddScoped<Admin.Services.AdminHubConnectionFactory>();
_app = builder.Build();
_app.UseStaticFiles();
_app.UseRouting();
@@ -110,6 +110,66 @@ public sealed class GenerationRefreshHostedServiceTests : IDisposable
leases.OpenLeaseCount.ShouldBe(0, "IAsyncDisposable dispose must fire regardless of outcome");
}
// Bug #12 fix — verifies the previously-missing wiring: applies and heartbeats both
// emit sp_RegisterNodeGenerationApplied so Admin UI Fleet status + Redundancy LastSeenAt
// surface live state.
[Fact]
public async Task First_apply_reports_Applied_status_to_central_db()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var calls = new List<(long Gen, NodeApplyStatus Status, string? Error)>();
var service = NewService(coordinator, leases, currentGeneration: () => 42, registerCalls: calls);
await service.TickAsync(CancellationToken.None);
calls.Count.ShouldBe(1, "exactly one register call per apply window");
calls[0].Gen.ShouldBe(42);
calls[0].Status.ShouldBe(NodeApplyStatus.Applied);
calls[0].Error.ShouldBeNull();
}
[Fact]
public async Task No_change_tick_heartbeats_with_Applied_status()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var calls = new List<(long Gen, NodeApplyStatus Status, string? Error)>();
var service = NewService(coordinator, leases, currentGeneration: () => 42, registerCalls: calls);
await service.TickAsync(CancellationToken.None); // initial apply
await service.TickAsync(CancellationToken.None); // no-change heartbeat
await service.TickAsync(CancellationToken.None); // no-change heartbeat
calls.Count.ShouldBe(3, "one apply call + two heartbeat calls");
calls.ShouldAllBe(c => c.Gen == 42 && c.Status == NodeApplyStatus.Applied && c.Error == null);
}
[Fact]
public async Task Register_call_failure_does_not_break_apply_or_block_subsequent_ticks()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var registerCallCount = 0;
var service = new GenerationRefreshHostedService(
new NodeOptions { NodeId = "A", ClusterId = "c1", ConfigDbConnectionString = "unused" },
leases, coordinator, NullLogger<GenerationRefreshHostedService>.Instance,
tickInterval: TimeSpan.FromSeconds(1),
currentGenerationQuery: _ => Task.FromResult<long?>(42),
registerAppliedAsync: (gen, status, err, ct) =>
{
registerCallCount++;
throw new InvalidOperationException("simulated DB outage during register");
});
await service.TickAsync(CancellationToken.None); // apply succeeds, register throws
await service.TickAsync(CancellationToken.None); // heartbeat throws
registerCallCount.ShouldBe(2, "both register attempts must run");
service.LastAppliedGenerationId.ShouldBe(42, "register failure must not roll back the cursor");
}
// ---- fixture helpers ---------------------------------------------------
private async Task<RedundancyCoordinator> SeedCoordinatorAsync()
@@ -136,11 +196,15 @@ public sealed class GenerationRefreshHostedServiceTests : IDisposable
private static GenerationRefreshHostedService NewService(
RedundancyCoordinator coordinator,
ApplyLeaseRegistry leases,
Func<long?> currentGeneration) =>
Func<long?> currentGeneration,
List<(long Gen, NodeApplyStatus Status, string? Error)>? registerCalls = null) =>
new(new NodeOptions { NodeId = "A", ClusterId = "c1", ConfigDbConnectionString = "unused" },
leases, coordinator, NullLogger<GenerationRefreshHostedService>.Instance,
tickInterval: TimeSpan.FromSeconds(1),
currentGenerationQuery: _ => Task.FromResult(currentGeneration()));
currentGenerationQuery: _ => Task.FromResult(currentGeneration()),
registerAppliedAsync: registerCalls is null
? (_, _, _, _) => Task.CompletedTask
: (gen, status, err, _) => { registerCalls.Add((gen, status, err)); return Task.CompletedTask; });
private sealed class DbContextFactory(DbContextOptions<OtOpcUaConfigDbContext> options)
: IDbContextFactory<OtOpcUaConfigDbContext>