review(Cluster): record findings + fix snapshot consistency, dispose, stale docs
Code review at HEAD 7286d320. Cluster-001 (SeedFromCurrentState reads from one
snapshot), Cluster-003 (HoconLoader double-dispose), Cluster-004 (stale akka.conf
header), Cluster-005 (ServiceLevelCalculator tests added to Cluster.Tests). Cluster-002
deferred (no production caller).
This commit is contained in:
@@ -0,0 +1,144 @@
|
||||
# Code Review — Cluster
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Cluster` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-06-19 |
|
||||
| Commit reviewed | `7286d320` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 0 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Cluster-001 (double-snapshot in seed — resolved), Cluster-002 (stale-member lag — deferred) |
|
||||
| 2 | OtOpcUa conventions | No issues found |
|
||||
| 3 | Concurrency & thread safety | No issues found — all mutable state protected by `_lock`; event raised outside lock |
|
||||
| 4 | Error handling & resilience | No issues found |
|
||||
| 5 | Security | No issues found |
|
||||
| 6 | Performance & resource management | Cluster-003 (redundant double-dispose in `HoconLoader` — resolved) |
|
||||
| 7 | Design-document adherence | Cluster-004 (stale `akka.conf` header comments — resolved) |
|
||||
| 8 | Code organization & conventions | No issues found |
|
||||
| 9 | Testing coverage | Cluster-005 (`ServiceLevelCalculatorTests` in wrong project — resolved) |
|
||||
| 10 | Documentation & comments | Cluster-004 (stale `akka.conf` header comments — resolved) |
|
||||
|
||||
## Findings
|
||||
|
||||
### Cluster-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/ClusterRoleInfo.cs:100-105` |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `SeedFromCurrentState` takes a snapshot of `_cluster.State` into `snapshot` at
|
||||
line 89, then iterates `snapshot.Members` for roles but calls `_cluster.State.RoleLeader(role)` at
|
||||
line 102 — a second, live call to the cluster state. Between these two calls the cluster state
|
||||
could advance. This yields a `_roleLeaders` map seeded from two inconsistent views: members from
|
||||
one snapshot, leaders from another. In the worst case the leader address is not in `snapshot.Members`,
|
||||
so `snapshot.Members.FirstOrDefault(m => m.Address == leaderAddr)` returns `null` and the role is
|
||||
seeded as leaderless. The first `RoleLeaderChanged` event self-corrects.
|
||||
|
||||
**Recommendation:** Replace `_cluster.State.RoleLeader(role)` with `snapshot.RoleLeader(role)`.
|
||||
`CurrentClusterState` exposes `RoleLeader(string)` for exactly this purpose.
|
||||
|
||||
**Resolution:** Fixed 2026-06-19. Changed line 102 to call `snapshot.RoleLeader(role)` so the
|
||||
entire seed reads from one consistent state object. A unit-level regression test requires a live
|
||||
Akka cluster (TestKit); the fix is verified by code inspection + the module building clean.
|
||||
Cross-cluster regression coverage lives in `Host.IntegrationTests`.
|
||||
|
||||
---
|
||||
|
||||
### Cluster-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/ClusterRoleInfo.cs:110-133` |
|
||||
| Status | Deferred |
|
||||
|
||||
**Description:** `HandleMemberEvent` only adds members on `MemberUp` and removes on
|
||||
`MemberRemoved`. `MemberLeft`, `MemberExited`, `MemberWeaklyUp`, and `MemberDowned` (all
|
||||
`IMemberEvent` implementors) are silently dropped by the `switch` fall-through default. A member
|
||||
that is gracefully leaving stays in `_membersByRole` until the `Removed` transition (several
|
||||
seconds). Currently `MembersWithRole` has no production callers so this causes no observable
|
||||
misbehaviour, but if a future caller needs leaving-aware membership it will silently receive stale
|
||||
data.
|
||||
|
||||
**Recommendation:** Add cases for `MemberLeft`, `MemberExited`, and `MemberDowned` that remove the
|
||||
member from `_membersByRole` immediately.
|
||||
|
||||
**Resolution:** Deferred — `MembersWithRole` has no production callers today; stale window is
|
||||
bounded and self-corrects on `MemberRemoved`. Revisit when a caller is added.
|
||||
|
||||
---
|
||||
|
||||
### Cluster-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/HoconLoader.cs:12-16` |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `LoadBaseConfig` has two nested `using` scopes — one for `stream` and one for
|
||||
`reader`. `StreamReader.Dispose` closes its underlying stream; the outer `using var stream` then
|
||||
disposes the same `UnmanagedMemoryStream` a second time. Double-disposal is safe for managed
|
||||
streams, but the pattern is confusing and misrepresents ownership: `StreamReader` should
|
||||
exclusively own the stream.
|
||||
|
||||
**Recommendation:** Remove the outer `using var stream`; inline the null-guard into the
|
||||
`StreamReader` constructor call.
|
||||
|
||||
**Resolution:** Fixed 2026-06-19. Removed the redundant outer `using var stream`; the
|
||||
`StreamReader` now exclusively owns the stream lifetime. Existing `HoconLoaderTests` remain green.
|
||||
|
||||
---
|
||||
|
||||
### Cluster-004
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/Resources/akka.conf:4,7` |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** The `akka.conf` header contains two stale references:
|
||||
1. Line 4 references `AkkaHostedService.cs` which was deleted in commit `d6fac2d` when the
|
||||
bootstrap migrated to `ServiceCollectionExtensions.WithOtOpcUaClusterBootstrap` + Akka.Hosting.
|
||||
2. Line 7 attributes the tuning to `ScadaLink`, a different project unrelated to this codebase.
|
||||
|
||||
**Recommendation:** Update the header to reference `WithOtOpcUaClusterBootstrap` and remove the
|
||||
`ScadaLink` attribution.
|
||||
|
||||
**Resolution:** Fixed 2026-06-19. Updated `akka.conf` header to reference
|
||||
`ServiceCollectionExtensions.WithOtOpcUaClusterBootstrap`; removed stale `ScadaLink` attribution.
|
||||
|
||||
---
|
||||
|
||||
### Cluster-005
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ServiceLevelCalculatorTests.cs` |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `ServiceLevelCalculator` was moved from `ControlPlane` to
|
||||
`Cluster.Redundancy` (per `docs/Redundancy.md`), but `ServiceLevelCalculatorTests` remained in
|
||||
`ControlPlane.Tests`. The `Cluster.Tests` project has no direct coverage of the module's most
|
||||
complex logic.
|
||||
|
||||
**Recommendation:** Move the tests to `Cluster.Tests`. As the `ControlPlane.Tests` file is outside
|
||||
this review scope, add equivalent tests directly in `Cluster.Tests`.
|
||||
|
||||
**Resolution:** Fixed 2026-06-19. Added `ServiceLevelCalculatorTests` to `Cluster.Tests` covering
|
||||
all tiers. The `ControlPlane.Tests` copy is left in place (out of scope).
|
||||
Reference in New Issue
Block a user