Files
lmxopcua/code-reviews/Cluster/findings.md
T
Joseph Doherty b1946194d6 review(Cluster): record findings + fix snapshot consistency, dispose, stale docs
Code review at HEAD 7286d320. Cluster-001 (SeedFromCurrentState reads from one
snapshot), Cluster-003 (HoconLoader double-dispose), Cluster-004 (stale akka.conf
header), Cluster-005 (ServiceLevelCalculator tests added to Cluster.Tests). Cluster-002
deferred (no production caller).
2026-06-19 10:22:59 -04:00

6.1 KiB

Code Review — Cluster

Field Value
Module src/Core/ZB.MOM.WW.OtOpcUa.Cluster
Reviewer Claude Code
Review date 2026-06-19
Commit reviewed 7286d320
Status Reviewed
Open findings 0

Checklist coverage

# Category Result
1 Correctness & logic bugs Cluster-001 (double-snapshot in seed — resolved), Cluster-002 (stale-member lag — deferred)
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety No issues found — all mutable state protected by _lock; event raised outside lock
4 Error handling & resilience No issues found
5 Security No issues found
6 Performance & resource management Cluster-003 (redundant double-dispose in HoconLoader — resolved)
7 Design-document adherence Cluster-004 (stale akka.conf header comments — resolved)
8 Code organization & conventions No issues found
9 Testing coverage Cluster-005 (ServiceLevelCalculatorTests in wrong project — resolved)
10 Documentation & comments Cluster-004 (stale akka.conf header comments — resolved)

Findings

Cluster-001

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Cluster/ClusterRoleInfo.cs:100-105
Status Resolved

Description: SeedFromCurrentState takes a snapshot of _cluster.State into snapshot at line 89, then iterates snapshot.Members for roles but calls _cluster.State.RoleLeader(role) at line 102 — a second, live call to the cluster state. Between these two calls the cluster state could advance. This yields a _roleLeaders map seeded from two inconsistent views: members from one snapshot, leaders from another. In the worst case the leader address is not in snapshot.Members, so snapshot.Members.FirstOrDefault(m => m.Address == leaderAddr) returns null and the role is seeded as leaderless. The first RoleLeaderChanged event self-corrects.

Recommendation: Replace _cluster.State.RoleLeader(role) with snapshot.RoleLeader(role). CurrentClusterState exposes RoleLeader(string) for exactly this purpose.

Resolution: Fixed 2026-06-19. Changed line 102 to call snapshot.RoleLeader(role) so the entire seed reads from one consistent state object. A unit-level regression test requires a live Akka cluster (TestKit); the fix is verified by code inspection + the module building clean. Cross-cluster regression coverage lives in Host.IntegrationTests.


Cluster-002

Field Value
Severity Low
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Cluster/ClusterRoleInfo.cs:110-133
Status Deferred

Description: HandleMemberEvent only adds members on MemberUp and removes on MemberRemoved. MemberLeft, MemberExited, MemberWeaklyUp, and MemberDowned (all IMemberEvent implementors) are silently dropped by the switch fall-through default. A member that is gracefully leaving stays in _membersByRole until the Removed transition (several seconds). Currently MembersWithRole has no production callers so this causes no observable misbehaviour, but if a future caller needs leaving-aware membership it will silently receive stale data.

Recommendation: Add cases for MemberLeft, MemberExited, and MemberDowned that remove the member from _membersByRole immediately.

Resolution: Deferred — MembersWithRole has no production callers today; stale window is bounded and self-corrects on MemberRemoved. Revisit when a caller is added.


Cluster-003

Field Value
Severity Low
Category Performance & resource management
Location src/Core/ZB.MOM.WW.OtOpcUa.Cluster/HoconLoader.cs:12-16
Status Resolved

Description: LoadBaseConfig has two nested using scopes — one for stream and one for reader. StreamReader.Dispose closes its underlying stream; the outer using var stream then disposes the same UnmanagedMemoryStream a second time. Double-disposal is safe for managed streams, but the pattern is confusing and misrepresents ownership: StreamReader should exclusively own the stream.

Recommendation: Remove the outer using var stream; inline the null-guard into the StreamReader constructor call.

Resolution: Fixed 2026-06-19. Removed the redundant outer using var stream; the StreamReader now exclusively owns the stream lifetime. Existing HoconLoaderTests remain green.


Cluster-004

Field Value
Severity Low
Category Documentation & comments
Location src/Core/ZB.MOM.WW.OtOpcUa.Cluster/Resources/akka.conf:4,7
Status Resolved

Description: The akka.conf header contains two stale references:

  1. Line 4 references AkkaHostedService.cs which was deleted in commit d6fac2d when the bootstrap migrated to ServiceCollectionExtensions.WithOtOpcUaClusterBootstrap + Akka.Hosting.
  2. Line 7 attributes the tuning to ScadaLink, a different project unrelated to this codebase.

Recommendation: Update the header to reference WithOtOpcUaClusterBootstrap and remove the ScadaLink attribution.

Resolution: Fixed 2026-06-19. Updated akka.conf header to reference ServiceCollectionExtensions.WithOtOpcUaClusterBootstrap; removed stale ScadaLink attribution.


Cluster-005

Field Value
Severity Low
Category Testing coverage
Location tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ServiceLevelCalculatorTests.cs
Status Resolved

Description: ServiceLevelCalculator was moved from ControlPlane to Cluster.Redundancy (per docs/Redundancy.md), but ServiceLevelCalculatorTests remained in ControlPlane.Tests. The Cluster.Tests project has no direct coverage of the module's most complex logic.

Recommendation: Move the tests to Cluster.Tests. As the ControlPlane.Tests file is outside this review scope, add equivalent tests directly in Cluster.Tests.

Resolution: Fixed 2026-06-19. Added ServiceLevelCalculatorTests to Cluster.Tests covering all tiers. The ControlPlane.Tests copy is left in place (out of scope).