fix(code-review): resolve Batch 3 wave A (OpcUaServer history/guard, ControlPlane topology gate)

- OpcUaServer-002: HistoryRead-Events NumValuesPerNode==0 now maps to unbounded (int.MaxValue) instead of the backend default-cap sentinel; no Core.Abstractions contract change (+EventMaxEvents helper tests)
- OpcUaServer-004: EnsureAddressSpaceCreated guard on public mutators -> clear InvalidOperationException instead of bare NRE if called pre-start (+tests)
- OpcUaServer-003: Deferred (endUtc inclusive/exclusive needs live Wonderware boundary confirmation)
- Configuration-013: wire DraftValidator.ValidateClusterTopology into AdminOperationsActor deploy gate (read-only, no migration) (+2 tests)
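
A minimal sketch of the two OpcUaServer changes listed above, written under assumptions: the real `EventMaxEvents` signature and the node manager's shape are not shown in this commit page, so `HistoryReadSketch`, `NodeManagerSketch`, `Start`, and `AddNode` are hypothetical stand-ins. Only the behaviours named in the message come from the source (`NumValuesPerNode == 0` maps to `int.MaxValue`; mutators throw `InvalidOperationException` before start).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class HistoryReadSketch
{
    // Hypothetical signature. A client-supplied NumValuesPerNode of 0 is
    // treated as "no limit" and mapped to int.MaxValue. Clamping values
    // above int.MaxValue is an extra assumption of this sketch.
    public static int EventMaxEvents(uint numValuesPerNode) =>
        numValuesPerNode == 0 || numValuesPerNode > int.MaxValue
            ? int.MaxValue
            : (int)numValuesPerNode;
}

public sealed class NodeManagerSketch
{
    // Null until Start() runs; stands in for the real address space.
    private List<string>? _nodes;

    public void Start() => _nodes = new List<string>();

    // Guard called first by every public mutator, so a pre-start call
    // fails with a descriptive exception instead of a NullReferenceException.
    private List<string> EnsureAddressSpaceCreated() =>
        _nodes ?? throw new InvalidOperationException(
            "Address space has not been created; start the server before mutating nodes.");

    public void AddNode(string name) => EnsureAddressSpaceCreated().Add(name);

    public int NodeCount => EnsureAddressSpaceCreated().Count;
}
```

Returning the non-null collection from the guard keeps each mutator to a single expression and avoids a second null check after the guard has passed.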
Joseph Doherty
2026-06-20 22:53:29 -04:00
parent c817d7720e
commit 94eec70fb0
8 changed files with 455 additions and 13 deletions
+3 -3
@@ -7,7 +7,7 @@
| Review date | 2026-06-19 (re-review; first reviewed 2026-05-22) |
| Commit reviewed | `7286d320` (re-review; was `76d35d1`) |
| Status | Reviewed |
-| Open findings | 1 |
+| Open findings | 0 |
## Checklist coverage
@@ -232,13 +232,13 @@ Prior findings Configuration-001…011 remain Resolved. Notable since the first
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:243` (`ValidateClusterTopology`) |
-| Status | Open |
+| Status | Resolved |
**Description:** `DraftValidator.ValidateClusterTopology` is documented as the managed pre-publish guard that catches cluster-topology drift the SQL `CK_ServerCluster_RedundancyMode_NodeCount` check cannot see — specifically an operator disabling a `ClusterNode` (effective enabled-count = 1) while `RedundancyMode` stays `Hot`/`Warm`, which would boot the runtime into an invalid-topology band. It is fully unit-tested (`DraftValidatorTests` §"ValidateClusterTopology") but **no production code calls it.** The deploy gate in `AdminOperationsActor.StartDeployment` runs `DraftValidator.Validate(...)` (the snapshot rules) but never `ValidateClusterTopology(...)`, so the documented enabled-node-count guard is inert at deploy time — the only thing standing is the row-level SQL CHECK, which the doc explicitly says is insufficient.
**Recommendation:** Wire `ValidateClusterTopology` into the deploy/publish path — load the `ServerCluster` row(s) + their `ClusterNode`s and run it alongside `Validate`, folding its errors into the same reject summary. The fix belongs in `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (a different module), so it is **deferred from this module's edit scope** and recorded here against the now-dead Configuration-layer method. Cross-module: ControlPlane.
-**Resolution:** _(open — fix is in the ControlPlane module's `AdminOperationsActor`, outside Configuration's edit scope)_
+**Resolution:** Resolved 2026-06-20 — wired `DraftValidator.ValidateClusterTopology` into the deploy gate in the ControlPlane module's `AdminOperationsActor.HandleStartDeploymentAsync` (`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs`). Immediately after the existing `DraftValidator.Validate(draft)` call, the handler now loads the `ServerCluster` rows (ClusterId-ordered for a deterministic summary) and their `ClusterNode`s from the **same** `db` context already open in the handler — read-only via `AsNoTracking()`, no second DbContext lifetime, no schema/migration or entity change — groups the nodes by `ClusterId`, and runs `ValidateClusterTopology(cluster, nodes)` per cluster. Its errors are appended to the SAME error list (`Validate(...)` now collected into a `List<ValidationError>`), so a deploy failing either the snapshot rules or the topology guard is rejected with both sets of messages folded into the single reject summary string; ordering stays deterministic (snapshot rules first, then per-cluster topology errors in ClusterId order). The previously-inert enabled-node-count guard (e.g. `RedundancyMode = Hot` with one `ClusterNode` toggled off, effective enabled-count = 1) now rejects at deploy time rather than relying solely on the row-level SQL CHECK the doc says is insufficient. New ControlPlane tests `AdminOperationsActorTests.StartDeployment_rejects_on_invalid_cluster_topology_disabled_node` (Hot + one disabled node → `Rejected` with `ClusterEnabledNodeCountMismatch`, no coordinator dispatch, no Deployment row) and `StartDeployment_accepts_when_cluster_topology_is_valid` (Hot + two enabled nodes → `Accepted`, no topology error, row inserted) pin the wiring; the rejecting test was confirmed red against the unwired handler before the fix. ControlPlane.Tests 62/62 green; the existing `DraftValidatorTests` §"ValidateClusterTopology" (Configuration.Tests 103/103) unchanged and still green.
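
The gate described in the resolution can be sketched roughly as follows. This is an in-memory illustration under assumptions, not the handler's actual code: the entity shapes, the `RunGate` helper, and the exact topology rule (here, `Hot`/`Warm` requires at least two enabled nodes) are hypothetical, and the EF Core query (`AsNoTracking()` on the handler's `db` context) is replaced by plain collections. Only the names `ValidateClusterTopology`, `ValidationError`, and `ClusterEnabledNodeCountMismatch` come from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real entities.
public enum RedundancyMode { None, Warm, Hot }
public sealed record ServerCluster(string ClusterId, RedundancyMode RedundancyMode);
public sealed record ClusterNode(string ClusterId, string NodeId, bool Enabled);
public sealed record ValidationError(string Code, string Message);

public static class TopologyGateSketch
{
    // Assumed rule: a redundant mode needs at least two ENABLED nodes.
    // A row-level SQL CHECK cannot see this, because it cannot count
    // enabled rows in another table.
    public static IEnumerable<ValidationError> ValidateClusterTopology(
        ServerCluster cluster, IReadOnlyList<ClusterNode> nodes)
    {
        int enabled = nodes.Count(n => n.Enabled);
        if (cluster.RedundancyMode != RedundancyMode.None && enabled < 2)
        {
            yield return new ValidationError(
                "ClusterEnabledNodeCountMismatch",
                $"Cluster '{cluster.ClusterId}': RedundancyMode={cluster.RedundancyMode} " +
                $"but only {enabled} enabled node(s).");
        }
    }

    // Snapshot-rule errors come first, then per-cluster topology errors in
    // ClusterId order, so the combined reject summary is deterministic.
    public static List<ValidationError> RunGate(
        IEnumerable<ValidationError> snapshotErrors,
        IEnumerable<ServerCluster> clusters,
        IEnumerable<ClusterNode> nodes)
    {
        var errors = snapshotErrors.ToList();
        var nodesByCluster = nodes.ToLookup(n => n.ClusterId);
        foreach (var cluster in clusters.OrderBy(c => c.ClusterId, StringComparer.Ordinal))
        {
            // A lookup returns an empty sequence for a cluster with no nodes.
            errors.AddRange(
                ValidateClusterTopology(cluster, nodesByCluster[cluster.ClusterId].ToList()));
        }
        return errors;
    }
}
```

In this sketch a deploy would be rejected whenever `RunGate` returns a non-empty list, with the messages joined into one summary string. A `Hot` cluster with one of its two nodes disabled yields a single `ClusterEnabledNodeCountMismatch` error.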
### Configuration-014