3 Commits

Author SHA1 Message Date
Joseph Doherty
f23e368a74 fix(server, admin): wire sp_RegisterNodeGenerationApplied + overlay heartbeat onto ClusterNode
dbo.sp_RegisterNodeGenerationApplied was defined by the initial
StoredProcedures migration but had zero callers in src/. The server
polled sp_GetCurrentGenerationForCluster every 5s but never reported
back, so dbo.ClusterNodeGenerationState stayed empty for every node
and both the Admin UI Fleet status page ("No node state recorded")
and the cluster-detail Redundancy LastSeenAt indicator ("never
STALE") showed broken liveness forever.

Server side (GenerationRefreshHostedService):
* New testable seam: Func<long, NodeApplyStatus, string?, CT, Task>?
  registerAppliedAsync constructor parameter, defaulting to a real
  sp_RegisterNodeGenerationApplied call against the central DB.
* TickAsync now calls the proc at two points: after every successful
  apply with NodeApplyStatus.Applied, and on every no-change tick as
  a heartbeat (also Applied) so LastSeenAt stays fresh.
* The apply path now wraps the lease + coordinator.RefreshAsync in a
  try/catch. On failure it reports NodeApplyStatus.Failed with the
  exception message. LastAppliedGenerationId advances regardless of
  outcome so we don't loop on the same broken apply every 5s.
* Register-call failures are best-effort (LogDebug heartbeat, LogWarning
  apply-report) — a transient DB outage during reporting must not
  crash the publisher or block the next apply.
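
The tick behaviour listed above can be modelled roughly as follows. This is a simplified sketch, not the service's code: `TickModel` and its two delegates are hypothetical stand-ins, `NodeApplyStatus` is redeclared locally so the block compiles on its own, and leases, logging and cancellation tokens are left out.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Local stand-in for the repo's enum (assumption, so the sketch is self-contained).
public enum NodeApplyStatus { Applied, Failed }

// Hypothetical, stripped-down model of one refresh tick.
public sealed class TickModel
{
    private readonly Func<long, Task> _apply;
    private readonly Func<long, NodeApplyStatus, string?, Task> _register;

    public long? LastAppliedGenerationId { get; private set; }

    public TickModel(Func<long, Task> apply, Func<long, NodeApplyStatus, string?, Task> register)
    {
        _apply = apply;
        _register = register;
    }

    public async Task TickAsync(long current)
    {
        var status = NodeApplyStatus.Applied;
        string? error = null;

        // Only attempt an apply when the generation changed; otherwise this tick is a heartbeat.
        if (LastAppliedGenerationId != current)
        {
            try
            {
                await _apply(current).ConfigureAwait(false);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                status = NodeApplyStatus.Failed;
                error = ex.Message;
            }
        }

        // Reporting is best-effort: a failed report must not stop the cursor from advancing.
        try
        {
            await _register(current, status, error).ConfigureAwait(false);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // The real service logs here; the sketch just swallows the error.
        }

        // Advance even after a failed apply so the same broken apply is not retried every tick.
        LastAppliedGenerationId = current;
    }
}
```

In this model a failed apply for generation N reports Failed once, and the next tick for the same N reports Applied as a heartbeat, which mirrors the "advance regardless of outcome" choice described above.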

Admin side (ClusterNodeService.ListByClusterAsync): the Redundancy tab
reads ClusterNode.LastSeenAt, but no current writer maintains that
column — the heartbeat goes to ClusterNodeGenerationState.LastSeenAt.
Overlay the GenerationState heartbeat onto the returned ClusterNode
rows when more recent, so IsStale + the Redundancy table column
reflect actual liveness without a schema change or new write path.
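
As a worked illustration of the overlay rule, the sketch below picks the more recent of the two timestamps and then applies the 30 s staleness check. `HeartbeatOverlay` is a hypothetical helper, and passing `utcNow` as a parameter is an assumption made here so the example is deterministic; the service itself reads the clock directly.

```csharp
#nullable enable
using System;

// Hypothetical helper mirroring the overlay + staleness logic on plain timestamps.
public static class HeartbeatOverlay
{
    // Same 30 s tolerance as ClusterNodeService.StaleThreshold in the diff.
    public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);

    // Returns the heartbeat when it is newer than the node's own LastSeenAt
    // (null means "never seen"); otherwise keeps the node's value.
    public static DateTime? Overlay(DateTime? nodeLastSeen, DateTime? heartbeat) =>
        heartbeat is not null && (nodeLastSeen is null || heartbeat > nodeLastSeen)
            ? heartbeat
            : nodeLastSeen;

    // Stale when never seen, or when the last sighting is older than the threshold.
    public static bool IsStale(DateTime? lastSeen, DateTime utcNow) =>
        lastSeen is null || utcNow - lastSeen.Value > StaleThreshold;
}
```

With a null `ClusterNode.LastSeenAt` and a heartbeat from five seconds ago, `Overlay` returns the heartbeat and `IsStale` is false; with no heartbeat row at all the node stays stale.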

Tests: 3 new cases on GenerationRefreshHostedServiceTests verify
first-apply reports Applied, no-change ticks heartbeat with Applied,
and register-call failure does not roll back the cursor or block
subsequent ticks. All 8 GenerationRefresh tests pass.
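
For readers who want to exercise the seam themselves, a recording stub along these lines could be passed as the registerAppliedAsync argument. This is only a sketch: `RecordingRegister` is a hypothetical helper, `NodeApplyStatus` is redeclared locally, and the actual test cases in GenerationRefreshHostedServiceTests may be structured differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Local stand-in for the repo's enum (assumption, so the sketch is self-contained).
public enum NodeApplyStatus { Applied, Failed }

// Hypothetical stub whose InvokeAsync matches the seam's delegate shape:
// Func<long, NodeApplyStatus, string?, CancellationToken, Task>.
public sealed class RecordingRegister
{
    public List<(long Generation, NodeApplyStatus Status, string? Error)> Calls { get; } = new();

    // When true, the stub fails to simulate a transient DB outage during reporting.
    public bool ThrowOnCall { get; set; }

    public Task InvokeAsync(long generation, NodeApplyStatus status, string? error, CancellationToken ct)
    {
        // Record the call first so a test can assert on it even when the stub then fails.
        Calls.Add((generation, status, error));
        return ThrowOnCall
            ? Task.FromException(new InvalidOperationException("simulated DB outage"))
            : Task.CompletedTask;
    }
}
```

A test would pass `stub.InvokeAsync` as the constructor argument, drive one or more ticks, and then assert on `stub.Calls`; setting `ThrowOnCall` covers the "register-call failure does not block subsequent ticks" case.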

Verified live on node-dev-a / cluster-dev: dbo.ClusterNodeGenerationState
now populated with CurrentGenerationId=1, LastAppliedStatus=Applied,
fresh LastSeenAt. Fleet status page shows the node (KPIs NODES 1 /
APPLIED 1 / STALE 0 / FAILED 0). Redundancy tab KPI STALE went 1→0 and
the row shows a real LAST SEEN timestamp. Bonus: FleetStatusHub
SignalR push now fires the cluster-page Live update banner on every
heartbeat because there are finally state changes to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:59 -04:00
Joseph Doherty
c8de58d6d3 fix(admin-ui): render published gen read-only on the 6 cluster-detail content tabs
The Equipment, UNS Structure, Namespaces, Drivers, Tags, and ACLs tabs
all rendered only an "Open a draft to edit" placeholder when no draft
was open — even when the cluster had a fully populated published
generation. docs/v2/admin-ui.md §Cluster Detail describes these as
"read-only views of the published generation" with an "Edit in draft"
affordance; that view was never wired. The earlier code path also
correctly rendered nothing when the cluster had no published gen yet,
which was indistinguishable from the broken state.

Collapse the six per-tab conditions into one shared branch that threads
the published gen ID into the existing tab components when no draft
exists, wrapped in <fieldset disabled> so any Add/Edit button click in
the read-only state cannot silently mutate published rows even though
the tab components themselves don't yet honor an IsReadOnly flag.
Banner above the content explains the state. Surgical: zero changes to
the ~1500 LoC across the six tab components.

Verified live on cluster-dev gen 1: Drivers tab now shows the
cluster-dev-galaxy-main GalaxyMxGateway row read-only; Namespaces tab
shows cluster-dev-galaxy-ns SystemPlatform row; both with the read-only
banner and visibly disabled affordances.

Follow-up worth doing later: refactor each tab component to accept an
IsReadOnly parameter so the disabled-affordance UX is per-tab rather
than a blanket fieldset opacity wash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 02:22:32 -04:00
Joseph Doherty
8fe7c8bea6 refactor(driver-galaxy): switch to sibling-repo MxGateway client + drop vendored libs
The sibling mxaccessgw repo (clients/dotnet/) restored a proper client
library + contracts under the new ZB.MOM.WW.MxGateway namespace, so the
binary-vendoring stopgap from PR Driver.Galaxy-016 can unwind via plan #1
of libs/README.md.

- csproj: replace <Reference HintPath="libs\MxGateway.*.dll"> with a
  ProjectReference into ..\..\..\..\mxaccessgw\clients\dotnet\
  ZB.MOM.WW.MxGateway.Client\. The five backfill PackageReference shims
  (Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client, Polly.Core,
  Microsoft.Extensions.Logging.Abstractions) are now transitive again.
- Source: 'using MxGateway.X' -> 'using ZB.MOM.WW.MxGateway.X' across
  19 driver files + 14 test files. No fully-qualified MxGateway.* usages
  in code, so no behavioural changes — purely a using-prefix flip.
- libs/: deleted MxGateway.Client.dll, MxGateway.Contracts.dll, README.md
  (orphan after the unwind).

Verified: dotnet build clean (Release), all 245 Driver.Galaxy unit tests
pass, OtOpcUa service running with the new client DLL loaded
(opc.tcp://localhost:4840/OtOpcUa, no FileNotFound/TypeLoad/
MissingMethod in startup logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:55:15 -04:00
41 changed files with 273 additions and 211 deletions

View File

@@ -1,6 +1,6 @@
 using Microsoft.Extensions.Logging;
 using Microsoft.Extensions.Logging.Abstractions;
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,5 +1,5 @@
-using MxGateway.Client;
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,5 +1,5 @@
-using MxGateway.Client;
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
 using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;

View File

@@ -1,7 +1,7 @@
 using Microsoft.Extensions.Logging;
 using Microsoft.Extensions.Logging.Abstractions;
-using MxGateway.Client;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
 using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
 using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;

View File

@@ -2,7 +2,7 @@ using System.Diagnostics.Metrics;
 using System.Threading.Channels;
 using Microsoft.Extensions.Logging;
 using Microsoft.Extensions.Logging.Abstractions;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,6 +1,6 @@
 using Microsoft.Extensions.Logging;
 using Microsoft.Extensions.Logging.Abstractions;
-using MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Client;
 using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,6 +1,6 @@
 using Microsoft.Extensions.Logging;
-using MxGateway.Client;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,7 +1,7 @@
 using System.Diagnostics.Metrics;
 using Microsoft.Extensions.Logging;
 using Microsoft.Extensions.Logging.Abstractions;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,8 +1,8 @@
 using System.Collections.Concurrent;
 using Microsoft.Extensions.Logging;
 using Microsoft.Extensions.Logging.Abstractions;
-using MxGateway.Client;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,5 +1,5 @@
-using MxGateway.Client;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 // Use the generated nested status enum for the SetBufferedUpdateInterval reply check.
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,5 +1,5 @@
 using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,5 +1,5 @@
 using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,6 +1,6 @@
 using Microsoft.Extensions.Logging;
-using MxGateway.Client;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Client;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -1,5 +1,5 @@
 using System.Runtime.CompilerServices;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;

View File

@@ -18,39 +18,15 @@
 </ItemGroup>
 <ItemGroup>
-<!-- Vendored mxaccessgw .NET client. Originally consumed via path-based
-ProjectReference to the sibling repo, but the sibling repo restructured
-and the MxGateway.Client.csproj path no longer exists. The DLLs in
-libs/ are the last known-good build (May 2026); they reference proto
-types from MxGateway.Contracts.dll using the pre-restructure namespace
-(MxGateway.Contracts.Proto). See libs/README.md for the unwinding plan
-once the sibling repo restores a client library or we migrate to the
-new ZB.MOM.WW.MxGateway.Contracts.Proto namespace. -->
-<Reference Include="MxGateway.Client">
-<HintPath>libs\MxGateway.Client.dll</HintPath>
-<Private>true</Private>
-</Reference>
-<Reference Include="MxGateway.Contracts">
-<HintPath>libs\MxGateway.Contracts.dll</HintPath>
-<Private>true</Private>
-</Reference>
-</ItemGroup>
-<ItemGroup>
-<!-- Transitive deps the vendored MxGateway.Client.dll was actually built
-against (verified by reflecting GetReferencedAssemblies on the DLL —
-see libs/README.md). Versions align with the sibling mxaccessgw repo's
-current Server / Worker projects so binary-compat stays close to what
-the team uses elsewhere. Pre-Driver.Galaxy-016 the csproj declared
-`Polly` (the v7 API) instead of `Polly.Core` (the v8 API the DLL was
-built against) — a package-name mistake, not just a version skew —
-which would surface as a runtime MissingMethodException the first
-time the client's retry pipeline ran. -->
-<PackageReference Include="Google.Protobuf" Version="3.34.1" />
-<PackageReference Include="Grpc.Core.Api" Version="2.76.0" />
-<PackageReference Include="Grpc.Net.Client" Version="2.76.0" />
-<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.7" />
-<PackageReference Include="Polly.Core" Version="8.6.6" />
+<!-- Sibling mxaccessgw repo's .NET client + contracts. The sibling restored
+a proper client library under clients/dotnet/ (May 2026), so this is
+back on a path-based ProjectReference per the libs/README unwind plan #1.
+Both projects target net10.0; the Contracts project transitively pulls
+Google.Protobuf + Grpc.Core.Api, the Client project transitively pulls
+Grpc.Net.Client + Polly.Core + Microsoft.Extensions.Logging.Abstractions,
+so the explicit PackageReference shims that backfilled the vendored
+binary references are no longer needed. -->
+<ProjectReference Include="..\..\..\..\mxaccessgw\clients\dotnet\ZB.MOM.WW.MxGateway.Client\ZB.MOM.WW.MxGateway.Client.csproj"/>
 </ItemGroup>
 <ItemGroup>

View File

@@ -1,101 +0,0 @@
# Vendored MxGateway client DLLs
This directory holds binary copies of `MxGateway.Client.dll` and
`MxGateway.Contracts.dll` from the sibling `mxaccessgw` repo's last known-good
build (May 2026). The DLLs are referenced from the driver's csproj as
`<Reference HintPath="…" />` items rather than `ProjectReference`.
## Provenance
Both DLLs are built from this team's own `mxaccessgw` source tree — they are
not third-party binaries. The build commit + checksums below are recorded so
future readers can verify the artefacts match the expected source without
needing to ask the original author.
| File | Source commit | SHA-256 |
|---|---|---|
| `MxGateway.Client.dll` | `dd7ca1634e2d2b8a866c81f0009bf87ee9427750` (mxaccessgw repo, pre-restructure) | `3507f770adc8c1b27b2fc4645079c6e4e02d5c65b9545c12d637cd2a080a00bd` |
| `MxGateway.Contracts.dll` | `dd7ca1634e2d2b8a866c81f0009bf87ee9427750` (mxaccessgw repo, pre-restructure) | `437dc6cb6994c7c4d858c82f69af890732c7ffbfa0463fbd8a63ce7930d251b4` |
The build commit is the same for both DLLs and is embedded as
`AssemblyInformationalVersion` inside each binary — re-verify by running:
`ilspycmd <dll> | grep AssemblyInformationalVersion`.
To re-verify the checksums (e.g. after a clone):
```bash
sha256sum libs/MxGateway.Client.dll libs/MxGateway.Contracts.dll
```
If either SHA-256 or the embedded source commit no longer matches what's
listed above, the artefact has been replaced — verify before trusting.
## Why vendored
The sibling `mxaccessgw` repo restructured: the `clients/dotnet/MxGateway.Client`
project the driver previously referenced via path-based `ProjectReference` no
longer exists, and the proto contracts moved from the `MxGateway.Contracts.Proto`
namespace to `ZB.MOM.WW.MxGateway.Contracts.Proto`. The driver's source still
expects the pre-restructure namespace, so re-pointing at the new contracts would
require a global namespace rename across ~19 driver files PLUS reimplementing
the `MxGatewayClient` / `MxGatewaySession` / `GalaxyRepositoryClient` types the
old client library provided (the sibling repo dropped the client library
entirely, keeping only the contracts).
Vendoring the binaries unblocked the build in minutes instead of hours, freezes
the gateway contract surface at a known-good version, and preserves the option
to migrate properly later without an emergency rewrite.
## What's vendored
| File | Built against |
|---|---|
| `MxGateway.Client.dll` | net10.0, references `MxGateway.Contracts.dll` |
| `MxGateway.Contracts.dll` | net10.0, proto namespace `MxGateway.Contracts.Proto[.Galaxy]` |
The NuGet packages the vendored DLLs reference (verified by reflecting
`Assembly.GetReferencedAssemblies()` against `MxGateway.Client.dll`) are
declared as direct `PackageReference` in the driver csproj — when the dropped
`ProjectReference` was in place those packages were transitively provided;
with binary references the consumer must declare them explicitly:
| Package | Reason |
|---|---|
| `Google.Protobuf` 3.34.1 | Proto message types in `MxGateway.Contracts.dll` |
| `Grpc.Core.Api` 2.76.0 | Base gRPC client types in `MxGateway.Client.dll` |
| `Grpc.Net.Client` 2.76.0 | HTTP/2 transport used by `MxGatewayClient` |
| `Microsoft.Extensions.Logging.Abstractions` 10.0.7 | `ILogger` used by the client |
| `Polly.Core` 8.6.6 | Retry pipeline used by `MxGatewayClient` |
Versions match the sibling mxaccessgw repo's current Server / Worker
projects (`ZB.MOM.WW.MxGateway.Server.csproj`,
`ZB.MOM.WW.MxGateway.Worker.csproj`) so the runtime versions stay close to
what the gateway team uses. The pre-Driver.Galaxy-016 declarations were
incorrect — most visibly `Polly 8.5.2` was declared where the DLL actually
needs `Polly.Core` (a different package: `Polly` v7 is the older fluent API;
`Polly.Core` v8 is the modern resilience-pipeline API the gateway client was
built against). A `Polly` reference would have failed at runtime with
`MissingMethodException` the first time a retry pipeline ran.
## Decompiled-source archive
The vendored DLLs are byte-for-byte the build output. The full source can be
recovered with `ilspycmd MxGateway.Client.dll > MxGateway.Client.cs` if a code
review or audit needs it.
## How to unwind
Either path closes the vendored-binary debt:
1. **Sibling repo restores `MxGateway.Client.csproj`** (or publishes a NuGet
package). Switch the csproj back to a `ProjectReference` / `PackageReference`,
delete this directory.
2. **Driver migrates to the new `ZB.MOM.WW.MxGateway.Contracts.Proto`
namespace.** Global namespace rename across the ~19 consuming source files,
plus re-implementing `MxGatewayClient` / `MxGatewaySession` /
`GalaxyRepositoryClient` (≈2,200 LoC of behavioural client code) either
inlined into this driver or as a fresh sibling library. Delete this
directory.
Either way: when unwinding, also drop the five `PackageReference` lines added
to the csproj alongside the `<Reference>` items — the new ProjectReference /
PackageReference will provide them transitively again.

View File

@@ -101,29 +101,44 @@ else
 {
 <Generations ClusterId="@ClusterId"/>
 }
-else if (_tab == "equipment" && _currentDraft is not null)
+else if (_tab is "equipment" or "uns" or "namespaces" or "drivers" or "tags" or "acls")
 {
-<EquipmentTab GenerationId="@_currentDraft.GenerationId"/>
+@* Bug #10 fix — these six tabs are scoped to a generation. Per docs/v2/admin-ui.md the
+design intent is a read-only view of the published generation when no draft is open
+("Edit in draft" affordance), and the editable view of the draft when one is open.
+The earlier implementation rendered nothing in the no-draft case, leaving operators
+with just the "Open a draft to edit" placeholder. We now route both states through
+the same tab components, gating edits via <fieldset disabled> so a button click in
+the read-only state cannot silently mutate the published rows even though the tab
+components themselves haven't been refactored to honor an IsReadOnly flag yet. *@
+var genId = _currentDraft?.GenerationId ?? _currentPublished?.GenerationId;
+var isReadOnly = _currentDraft is null;
+if (genId is null)
+{
+<section class="panel notice rise" style="animation-delay:.02s">
+No published generation yet. Click <strong>New draft</strong> above to author this cluster's first generation.
+</section>
 }
-else if (_tab == "uns" && _currentDraft is not null)
+else
 {
-<UnsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
+if (isReadOnly)
+{
+<section class="panel notice rise mb-3" style="animation-delay:.02s">
+<strong>Read-only view</strong> of published generation @genId. Click <strong>New draft</strong> above to make changes.
+</section>
 }
-else if (_tab == "namespaces" && _currentDraft is not null)
+<fieldset disabled="@isReadOnly" style="border:0;padding:0;margin:0;min-width:0;">
+@switch (_tab)
 {
-<NamespacesTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
+case "equipment": <EquipmentTab GenerationId="@genId.Value"/> break;
+case "uns": <UnsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
+case "namespaces": <NamespacesTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
+case "drivers": <DriversTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
+case "tags": <TagsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
+case "acls": <AclsTab GenerationId="@genId.Value" ClusterId="@ClusterId"/> break;
 }
-else if (_tab == "drivers" && _currentDraft is not null)
-{
-<DriversTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
+</fieldset>
 }
-else if (_tab == "tags" && _currentDraft is not null)
-{
-<TagsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
-}
-else if (_tab == "acls" && _currentDraft is not null)
-{
-<AclsTab GenerationId="@_currentDraft.GenerationId" ClusterId="@ClusterId"/>
 }
 else if (_tab == "redundancy")
 {
@@ -133,10 +148,6 @@ else
 {
 <AuditTab ClusterId="@ClusterId"/>
 }
-else
-{
-<section class="panel notice rise" style="animation-delay:.02s">Open a draft to edit this cluster's content.</section>
-}
 }
 @code {

View File

@@ -16,12 +16,40 @@ public sealed class ClusterNodeService(OtOpcUaConfigDbContext db)
 /// tolerance covers a missed heartbeat plus publisher GC pauses.</summary>
 public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);
-public Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct) =>
-db.ClusterNodes.AsNoTracking()
+public async Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct)
+{
+var nodes = await db.ClusterNodes.AsNoTracking()
 .Where(n => n.ClusterId == clusterId)
 .OrderByDescending(n => n.ServiceLevelBase)
 .ThenBy(n => n.NodeId)
-.ToListAsync(ct);
+.ToListAsync(ct).ConfigureAwait(false);
+// Bug #12 fix follow-up — the live-node heartbeat lands on
+// ClusterNodeGenerationState.LastSeenAt (written by sp_RegisterNodeGenerationApplied
+// on every generation poll). The ClusterNode.LastSeenAt column is a legacy slot that
+// no current writer maintains, so reading it directly would show "never STALE"
+// forever for every running node. Overlay the GenerationState heartbeat onto the
+// returned ClusterNode rows when it's more recent so the Redundancy tab + IsStale
+// predicate reflect actual liveness without needing a new write path or schema change.
+var nodeIds = nodes.Select(n => n.NodeId).ToList();
+if (nodeIds.Count > 0)
+{
+var heartbeats = await db.ClusterNodeGenerationStates.AsNoTracking()
+.Where(s => nodeIds.Contains(s.NodeId))
+.Select(s => new { s.NodeId, s.LastSeenAt })
+.ToListAsync(ct).ConfigureAwait(false);
+var beatByNode = heartbeats.ToDictionary(s => s.NodeId, s => s.LastSeenAt);
+foreach (var n in nodes)
+{
+if (beatByNode.TryGetValue(n.NodeId, out var hb) && hb is not null
+&& (n.LastSeenAt is null || hb > n.LastSeenAt))
+{
+n.LastSeenAt = hb;
+}
+}
+}
+return nodes;
+}
 public static bool IsStale(ClusterNode node) =>
 node.LastSeenAt is null || DateTime.UtcNow - node.LastSeenAt.Value > StaleThreshold;

View File

@@ -1,6 +1,7 @@
 using Microsoft.Data.SqlClient;
 using Microsoft.Extensions.Hosting;
 using Microsoft.Extensions.Logging;
+using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
 using ZB.MOM.WW.OtOpcUa.Server.Redundancy;
 namespace ZB.MOM.WW.OtOpcUa.Server.Hosting;
@@ -42,10 +43,20 @@ public sealed class GenerationRefreshHostedService(
 RedundancyCoordinator coordinator,
 ILogger<GenerationRefreshHostedService> logger,
 TimeSpan? tickInterval = null,
-Func<CancellationToken, Task<long?>>? currentGenerationQuery = null) : BackgroundService
+Func<CancellationToken, Task<long?>>? currentGenerationQuery = null,
+Func<long, NodeApplyStatus, string?, CancellationToken, Task>? registerAppliedAsync = null) : BackgroundService
 {
 private readonly Func<CancellationToken, Task<long?>> _generationQuery = currentGenerationQuery
 ?? new Func<CancellationToken, Task<long?>>(ct => DefaultQueryCurrentGenerationAsync(options, logger, ct));
+// Bug #12 fix — the server now reports applied-generation state + heartbeat back to the
+// central DB via sp_RegisterNodeGenerationApplied. Before this wiring the proc had zero
+// callers, so dbo.ClusterNodeGenerationState stayed empty for every node and the Admin UI
+// Fleet status page + cluster-detail Redundancy LastSeenAt both showed "no node state /
+// never STALE" indefinitely. Tests inject a stub via the registerAppliedAsync parameter.
+private readonly Func<long, NodeApplyStatus, string?, CancellationToken, Task> _registerApplied = registerAppliedAsync
+?? new Func<long, NodeApplyStatus, string?, CancellationToken, Task>(
+(gen, status, err, ct) => DefaultRegisterAppliedAsync(options, logger, gen, status, err, ct));
 /// <summary>
 /// How often the service polls <c>sp_GetCurrentGenerationForCluster</c>. Default 5 s —
 /// low enough that operator publishes take effect promptly, high enough that the
@@ -97,6 +108,18 @@ public sealed class GenerationRefreshHostedService(
 if (LastAppliedGenerationId is long last && current == last)
 {
+// Heartbeat — re-stamps LastSeenAt on dbo.ClusterNodeGenerationState so the Admin
+// Fleet status page + cluster Redundancy tab can detect the node is alive without
+// a generation change. Best-effort: a transient DB error here must not throw out of
+// the tick (the next tick will retry) and must not block applies.
+try
+{
+await _registerApplied(current.Value, NodeApplyStatus.Applied, null, cancellationToken).ConfigureAwait(false);
+}
+catch (Exception hbEx) when (hbEx is not OperationCanceledException)
+{
+logger.LogDebug(hbEx, "Heartbeat to sp_RegisterNodeGenerationApplied failed; will retry next tick");
+}
 return; // no change
 }
@@ -109,6 +132,10 @@ public sealed class GenerationRefreshHostedService(
 // lease is open. Publisher ticks in parallel (1s cadence) will observe the band
 // transition and push it onto the OPC UA Server.ServiceLevel node.
 var publishRequestId = Guid.NewGuid();
+NodeApplyStatus applyStatus;
+string? applyError = null;
+try
+{
 await using (leases.BeginApplyLease(current.Value, publishRequestId))
 {
 await coordinator.RefreshAsync(cancellationToken).ConfigureAwait(false);
@@ -116,7 +143,33 @@ public sealed class GenerationRefreshHostedService(
 // scripted-alarm engine subscribe to. For now the topology refresh is the
 // only thing we rewire — everything else still requires a process restart.
 }
+applyStatus = NodeApplyStatus.Applied;
+}
+catch (Exception applyEx) when (applyEx is not OperationCanceledException)
+{
+applyStatus = NodeApplyStatus.Failed;
+applyError = applyEx.Message;
+logger.LogError(applyEx, "Apply of generation {Generation} failed; will report Failed status to central DB", current);
+// fall through to register so operators see the failed apply in /fleet
+}
+// Always tell the central DB what happened with this apply attempt — success or
+// failure. The proc upserts dbo.ClusterNodeGenerationState (CurrentGenerationId +
+// LastAppliedAt + LastAppliedStatus + LastAppliedError + LastSeenAt). Failure here
+// mustn't prevent us from advancing LastAppliedGenerationId — the apply already
+// happened (or already failed); the publish is purely observability.
+try
+{
+await _registerApplied(current.Value, applyStatus, applyError, cancellationToken).ConfigureAwait(false);
+}
+catch (Exception regEx) when (regEx is not OperationCanceledException)
+{
+logger.LogWarning(regEx, "sp_RegisterNodeGenerationApplied call failed for gen {Generation} status {Status}", current, applyStatus);
+}
+// Advance the cursor even on Failed — the proc has been told; next tick will heartbeat
+// and a future generation will trigger a fresh apply attempt. Pinning the cursor on
+// failure would loop us through the same broken apply every 5s.
 LastAppliedGenerationId = current;
 RefreshCount++;
 }
@@ -157,4 +210,35 @@ public sealed class GenerationRefreshHostedService(
return null; return null;
} }
} }
/// <summary>
/// Default register-applied implementation — calls <c>sp_RegisterNodeGenerationApplied</c>
/// to MERGE-upsert <see cref="ZB.MOM.WW.OtOpcUa.Configuration.Entities.ClusterNodeGenerationState"/>
/// for this node. Called both at apply completion (success or failure) and on every
/// no-change heartbeat tick so <c>LastSeenAt</c> stays fresh in the central DB and the
/// Admin UI Fleet status page + Redundancy LastSeenAt indicator can detect a healthy node.
/// Bug #12 fix — wires the previously-orphaned proc into the apply loop.
/// </summary>
private static async Task DefaultRegisterAppliedAsync(
NodeOptions options,
ILogger logger,
long generationId,
NodeApplyStatus status,
string? error,
CancellationToken cancellationToken)
{
await using var conn = new SqlConnection(options.ConfigDbConnectionString);
await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
await using var cmd = conn.CreateCommand();
cmd.CommandText = "EXEC dbo.sp_RegisterNodeGenerationApplied @NodeId=@n, @GenerationId=@g, @Status=@s, @Error=@e";
cmd.Parameters.AddWithValue("@n", options.NodeId);
cmd.Parameters.AddWithValue("@g", generationId);
cmd.Parameters.AddWithValue("@s", status.ToString());
cmd.Parameters.AddWithValue("@e", (object?)error ?? DBNull.Value);
await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
// Single-line trace so soak runs can see heartbeat ticks without flooding at Info.
logger.LogTrace("Reported gen {Generation} status {Status} to central DB", generationId, status);
}
}


@@ -1,7 +1,7 @@
using System.Runtime.CompilerServices;
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,7 +1,7 @@
using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,6 +1,6 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto.Galaxy;
+using ZB.MOM.WW.MxGateway.Contracts.Proto.Galaxy;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,5 +1,5 @@
using System.Diagnostics;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,6 +1,6 @@
using System.Runtime.CompilerServices;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;


@@ -1,6 +1,6 @@
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;


@@ -1,5 +1,5 @@
using Google.Protobuf.WellKnownTypes;
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;


@@ -1,4 +1,4 @@
-using MxGateway.Contracts.Proto;
+using ZB.MOM.WW.MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;


@@ -110,6 +110,66 @@ public sealed class GenerationRefreshHostedServiceTests : IDisposable
leases.OpenLeaseCount.ShouldBe(0, "IAsyncDisposable dispose must fire regardless of outcome");
}
// Bug #12 fix — verifies the previously-missing wiring: applies and heartbeats both
// emit sp_RegisterNodeGenerationApplied so Admin UI Fleet status + Redundancy LastSeenAt
// surface live state.
[Fact]
public async Task First_apply_reports_Applied_status_to_central_db()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var calls = new List<(long Gen, NodeApplyStatus Status, string? Error)>();
var service = NewService(coordinator, leases, currentGeneration: () => 42, registerCalls: calls);
await service.TickAsync(CancellationToken.None);
calls.Count.ShouldBe(1, "exactly one register call per apply window");
calls[0].Gen.ShouldBe(42);
calls[0].Status.ShouldBe(NodeApplyStatus.Applied);
calls[0].Error.ShouldBeNull();
}
[Fact]
public async Task No_change_tick_heartbeats_with_Applied_status()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var calls = new List<(long Gen, NodeApplyStatus Status, string? Error)>();
var service = NewService(coordinator, leases, currentGeneration: () => 42, registerCalls: calls);
await service.TickAsync(CancellationToken.None); // initial apply
await service.TickAsync(CancellationToken.None); // no-change heartbeat
await service.TickAsync(CancellationToken.None); // no-change heartbeat
calls.Count.ShouldBe(3, "one apply call + two heartbeat calls");
calls.ShouldAllBe(c => c.Gen == 42 && c.Status == NodeApplyStatus.Applied && c.Error == null);
}
[Fact]
public async Task Register_call_failure_does_not_break_apply_or_block_subsequent_ticks()
{
var coordinator = await SeedCoordinatorAsync();
var leases = new ApplyLeaseRegistry();
var registerCallCount = 0;
var service = new GenerationRefreshHostedService(
new NodeOptions { NodeId = "A", ClusterId = "c1", ConfigDbConnectionString = "unused" },
leases, coordinator, NullLogger<GenerationRefreshHostedService>.Instance,
tickInterval: TimeSpan.FromSeconds(1),
currentGenerationQuery: _ => Task.FromResult<long?>(42),
registerAppliedAsync: (gen, status, err, ct) =>
{
registerCallCount++;
throw new InvalidOperationException("simulated DB outage during register");
});
await service.TickAsync(CancellationToken.None); // apply succeeds, register throws
await service.TickAsync(CancellationToken.None); // heartbeat throws
registerCallCount.ShouldBe(2, "both register attempts must run");
service.LastAppliedGenerationId.ShouldBe(42, "register failure must not roll back the cursor");
}
// ---- fixture helpers ---------------------------------------------------
private async Task<RedundancyCoordinator> SeedCoordinatorAsync()
@@ -136,11 +196,15 @@ public sealed class GenerationRefreshHostedServiceTests : IDisposable
private static GenerationRefreshHostedService NewService(
RedundancyCoordinator coordinator,
ApplyLeaseRegistry leases,
-Func<long?> currentGeneration) =>
+Func<long?> currentGeneration,
+List<(long Gen, NodeApplyStatus Status, string? Error)>? registerCalls = null) =>
new(new NodeOptions { NodeId = "A", ClusterId = "c1", ConfigDbConnectionString = "unused" },
leases, coordinator, NullLogger<GenerationRefreshHostedService>.Instance,
tickInterval: TimeSpan.FromSeconds(1),
-currentGenerationQuery: _ => Task.FromResult(currentGeneration()));
+currentGenerationQuery: _ => Task.FromResult(currentGeneration()),
+registerAppliedAsync: registerCalls is null
+? (_, _, _, _) => Task.CompletedTask
+: (gen, status, err, _) => { registerCalls.Add((gen, status, err)); return Task.CompletedTask; });
private sealed class DbContextFactory(DbContextOptions<OtOpcUaConfigDbContext> options)
: IDbContextFactory<OtOpcUaConfigDbContext>