1aa7905676
Review at HEAD 7286d320. ControlPlane-001 (Medium): ConfigPublishCoordinator.HandleAck
now discards acks from nodes not in _expectedAcks (prevented premature SealDeployment) +
regression test. -002 (flipped-node log count), -003 (redundant mapper arms) tidied.
110 lines
5.1 KiB
Markdown
110 lines
5.1 KiB
Markdown
# Code Review — ControlPlane
|
|
|
|
<!-- Template for a per-module findings file. Copy to code-reviews/<Module>/findings.md.
|
|
See ../../REVIEW-PROCESS.md for the full process. The base README.md is generated
|
|
from these files by regen-readme.py — do not edit README.md by hand. -->
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-06-19 |
|
|
| Commit reviewed | `7286d320` |
|
|
| Status | Reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | ControlPlane-001 (Resolved) |
|
|
| 2 | OtOpcUa conventions | No issues found |
|
|
| 3 | Concurrency & thread safety | No issues found |
|
|
| 4 | Error handling & resilience | No issues found |
|
|
| 5 | Security | No issues found |
|
|
| 6 | Performance & resource management | No issues found |
|
|
| 7 | Design-document adherence | No issues found |
|
|
| 8 | Code organization & conventions | ControlPlane-002 (Resolved), ControlPlane-003 (Resolved) |
|
|
| 9 | Testing coverage | No issues found |
|
|
| 10 | Documentation & comments | No issues found |
|
|
|
|
## Findings
|
|
|
|
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
|
|
never reused. Findings are never deleted — close them by changing Status and
|
|
completing Resolution. -->
|
|
|
|
### ControlPlane-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Coordinators/ConfigPublishCoordinator.cs:154` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `HandleAck` accepts `ApplyAck` messages from any node, including nodes that were
|
|
not in `_expectedAcks` at dispatch time. When an unexpected node's ack is added to `_acks`,
|
|
`_acks.Count` can reach `_expectedAcks.Count` before every expected node has actually responded.
|
|
The count-based gate on line 166 (`if (_acks.Count < _expectedAcks.Count) return;`) then fires
|
|
prematurely, causing `SealDeployment()` to run while one or more expected nodes have not yet
|
|
applied the deployment. In production, unexpected acks are unlikely (only `DriverHostActor`
|
|
publishes to the `deployment-acks` topic) but the guard is missing.
|
|
|
|
**Recommendation:** Add a membership check at the start of the processing path in `HandleAck`
|
|
to discard any ack whose `NodeId` is not in `_expectedAcks`, logging it at Debug level for
|
|
diagnostics.
|
|
|
|
**Resolution:** Fixed 2026-06-19 (commit SHA pending). Added `if (!_expectedAcks.Contains(msg.NodeId))` guard at the top of the processing path in `HandleAck`, discarding unexpected-node acks before they are counted. Regression test `ApplyAck_from_unexpected_node_is_discarded_and_does_not_seal_prematurely` added to `ConfigPublishCoordinatorTests` (proven failing before fix, green after).
|
|
|
|
---
|
|
|
|
### ControlPlane-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Fleet/FleetStatusBroadcaster.cs:139` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `OnTick` uses a `bool changed` flag but passes it to the debug-log message
|
|
via `changed ? 1 : 0`. The `{Count}` placeholder in the message implies a numeric count of
|
|
flipped nodes, but the expression always yields `1` (never the true count) whenever any node
|
|
flips. Operators reading the log will see "flipped 1 nodes to Degraded" even when multiple
|
|
nodes flipped simultaneously during a cluster partition recovery.
|
|
|
|
**Recommendation:** Replace the `bool changed` flag with an `int flippedCount` counter
|
|
incremented inside the loop, and log `flippedCount` directly.
|
|
|
|
**Resolution:** Fixed 2026-06-19 (commit SHA pending). Replaced `bool changed` with `int flippedCount`, incremented per flipped node, and logged directly. All 60 tests pass.
|
|
|
|
---
|
|
|
|
### ControlPlane-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditOutcomeMapper.cs:29` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `FromAction` lists ten explicit `AuditOutcome.Success` arms that are all
|
|
shadowed by the `_ => AuditOutcome.Success` default. The explicit arms add no behavioural
|
|
differentiation and create an ongoing maintenance obligation: any new config-write verb added
|
|
to the codebase silently falls through to the correct default anyway, making the explicit list
|
|
misleading as a whitelist. The comment block documenting the known verbs is a better vehicle
|
|
for the enumeration than code.
|
|
|
|
**Recommendation:** Remove the redundant explicit Success arms and document the known success
|
|
verbs in a comment on the default arm instead. The two Denied arms must stay.
|
|
|
|
**Resolution:** Fixed 2026-06-19 (commit SHA pending). Removed the ten redundant explicit
|
|
Success arms; replaced with an explanatory comment listing the known success verbs on the
|
|
default `_ => AuditOutcome.Success` arm. All existing AuditOutcomeMapper tests still pass
|
|
(the `Outcome_is_derived_from_the_action_vocabulary` theory covers every listed verb).
|