Phase 6.3 Stream B + D core - ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry #89

Merged
dohertj2 merged 1 commit from phase-6-3-stream-b-service-level into v2 2026-04-19 09:58:34 -04:00

Pure-logic heart of Phase 6.3. OPC UA node wiring (C), topology loader (A), Admin UI (E), client interop tests (F) are follow-up tasks #145-150.

Summary

  • ServiceLevelCalculator — pure 8-state matrix per decision #154. Reserved bands: Maintenance=0 / NoData=1 / InvalidTopology=2. Operational bands: RecoveringBackup=30 / BackupMidApply=50 / IsolatedBackup=80 / AuthoritativeBackup=100 / RecoveringPrimary=180 / PrimaryMidApply=200 / IsolatedPrimary=230 / AuthoritativePrimary=255. Isolated-Backup at 80 does NOT auto-promote (non-transparent model).
  • ServiceLevelBand enum labels every band for logs + Admin UI.
  • RecoveryStateManager — holds Recovering band until (60s dwell) AND (publish witness). Re-fault resets.
  • ApplyLeaseRegistry — keyed on (GenerationId, PublishRequestId) per decision #162. BeginApplyLease returns IAsyncDisposable; closes on every exit path including exception + dispose-twice. Watchdog PruneStale closes leases older than ApplyMaxDuration (10m default) so ServiceLevel can’t stick at mid-apply after a crashed publisher.
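
The lease behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than the project's C#, so every name here (`ApplyLeaseRegistry.begin_apply_lease`, `prune_stale`, `is_apply_in_progress`, the `_Lease` helper) is a hypothetical stand-in, and an async context manager plays the role of `IAsyncDisposable`. It only shows the three properties the summary claims: the lease closes on an exception, closing twice is harmless, and a watchdog prune removes leases older than the maximum duration.

```python
import asyncio
import time


class _Lease:
    """Hypothetical handle; closing it removes the key from the registry."""

    def __init__(self, registry, key):
        self._registry = registry
        self._key = key

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.aclose()
        return False  # never swallow the caller's exception

    async def aclose(self):
        # pop() with a default makes a second close a no-op.
        self._registry._open.pop(self._key, None)


class ApplyLeaseRegistry:
    """Sketch only: leases keyed on (generation_id, publish_request_id)."""

    def __init__(self, max_duration_seconds=600.0, clock=time.monotonic):
        self._max = max_duration_seconds  # 10 minute default, as in the PR
        self._clock = clock               # injectable so tests can fake time
        self._open = {}                   # key -> lease start time

    @property
    def is_apply_in_progress(self):
        return bool(self._open)

    def begin_apply_lease(self, generation_id, publish_request_id):
        key = (generation_id, publish_request_id)
        self._open[key] = self._clock()
        return _Lease(self, key)

    def prune_stale(self):
        """Watchdog tick: force-close leases older than the maximum duration."""
        now = self._clock()
        stale = [k for k, started in self._open.items() if now - started > self._max]
        for key in stale:
            del self._open[key]
        return len(stale)


async def demo():
    registry = ApplyLeaseRegistry()
    try:
        async with registry.begin_apply_lease(7, "req-1"):
            raise RuntimeError("simulated publisher failure")
    except RuntimeError:
        pass
    # The lease was closed even though the body raised.
    return registry.is_apply_in_progress


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the clock in as a parameter is a choice made for this sketch so that the stale-lease path can be exercised without waiting ten minutes; whether the real registry takes a clock is not stated in the PR.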

Test plan

  • 40 new tests: 27 calculator (reserved bands override; every operational band; apply dominates peer-unreachable; standalone path; classify round-trip), 6 recovery (dwell + witness gates + re-fault reset), 7 apply-lease (dispose on exception; concurrent isolation; watchdog closes stale; watchdog leaves recent alone).
  • Full solution dotnet test: 1137 passing (Phase 6.2 = 1097, Phase 6.3 B+D core = +40).

🤖 Generated with Claude Code

dohertj2 added 1 commit 2026-04-19 09:58:25 -04:00
Lands the pure-logic heart of Phase 6.3. OPC UA node wiring (Stream C),
RedundancyCoordinator topology loader (Stream A), Admin UI + metrics (Stream E),
and client interop tests (Stream F) are follow-up work — tracked as
tasks #145-150.

New Server.Redundancy sub-namespace:

- ServiceLevelCalculator — pure 8-state matrix per decision #154. Inputs:
  role, selfHealthy, peerUa/HttpHealthy, applyInProgress, recoveryDwellMet,
  topologyValid, operatorMaintenance. Output: OPC UA Part 5 §6.3.34 Byte.
  Reserved bands (0=Maintenance, 1=NoData, 2=InvalidTopology) override
  everything; operational bands occupy 30..255.
  Key invariants:
    * Authoritative-Primary = 255, Authoritative-Backup = 100.
    * Isolated-Primary = 230 (retains authority with peer down).
    * Isolated-Backup = 80 (does NOT auto-promote — non-transparent model).
    * Primary-Mid-Apply = 200, Backup-Mid-Apply = 50; apply dominates
      peer-unreachable per Stream C.4 integration expectation.
    * Recovering-Primary = 180, Recovering-Backup = 30.
    * Standalone treats healthy as Authoritative-Primary (no peer concept).
- ServiceLevelBand enum — labels every numeric band for logs + Admin UI.
  Values match the calculator table exactly; a compliance script checks
  for drift between the two.
- RecoveryStateManager — holds Recovering band until (dwell ≥ 60s default)
  AND (one publish witness observed). Re-fault resets both gates so a
  flapping node doesn't shortcut through recovery twice.
- ApplyLeaseRegistry — keyed on (ConfigGenerationId, PublishRequestId) per
  decision #162. BeginApplyLease returns an IAsyncDisposable so every exit
  path (success, exception, cancellation, dispose-twice) closes the lease.
  ApplyMaxDuration watchdog (10 min default) via PruneStale tick forces
  close after a crashed publisher so ServiceLevel can't stick at mid-apply.
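
As a rough illustration of the calculator bullet above, the band table can be written as a pure function. This is a Python sketch, not the C# implementation: the band values and input names come from this commit message, but the function name, the `Inputs` type, and two ordering choices are assumptions. Specifically, the relative order of the three reserved bands and the choice to rank Recovering above Mid-Apply are guesses made for the sketch; the source only states that reserved bands override everything and that apply dominates peer-unreachable.

```python
from dataclasses import dataclass

# Reserved bands (values from the commit message).
MAINTENANCE, NO_DATA, INVALID_TOPOLOGY = 0, 1, 2

# Operational bands per role (values from the commit message).
BANDS = {
    "primary": {"recovering": 180, "mid_apply": 200, "isolated": 230, "authoritative": 255},
    "backup": {"recovering": 30, "mid_apply": 50, "isolated": 80, "authoritative": 100},
}


@dataclass(frozen=True)
class Inputs:
    role: str = "primary"  # "primary", "backup" or "standalone"
    self_healthy: bool = True
    peer_ua_healthy: bool = True
    peer_http_healthy: bool = True
    apply_in_progress: bool = False
    recovery_dwell_met: bool = True
    topology_valid: bool = True
    operator_maintenance: bool = False


def service_level(i: Inputs) -> int:
    """Return the ServiceLevel byte for one snapshot of inputs (sketch)."""
    # Reserved bands override everything; their relative order is assumed.
    if i.operator_maintenance:
        return MAINTENANCE
    if not i.self_healthy:
        return NO_DATA
    if not i.topology_valid:
        return INVALID_TOPOLOGY

    # Standalone has no peer concept, so it uses the primary column.
    standalone = i.role == "standalone"
    table = BANDS["primary" if standalone else i.role]

    if not i.recovery_dwell_met:  # assumed to rank above mid-apply
        return table["recovering"]
    if i.apply_in_progress:  # apply dominates peer-unreachable
        return table["mid_apply"]

    # Losing either the UA or the HTTP probe counts as peer-unreachable.
    peer_reachable = standalone or (i.peer_ua_healthy and i.peer_http_healthy)
    return table["authoritative"] if peer_reachable else table["isolated"]
```

With these assumptions, `service_level(Inputs(role="backup", peer_http_healthy=False))` yields 80, matching the Isolated-Backup invariant: the backup reports a lower level and does not promote itself.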

Tests (40 new, all pass):
- ServiceLevelCalculatorTests (27): reserved bands override; self-unhealthy
  → NoData; invalid topology demotes both nodes to 2; authoritative primary
  255; backup 100; isolated primary 230 retains authority; isolated backup
  80 does not promote; http-only unreachable triggers isolated; mid-apply
  primary 200; mid-apply backup 50; apply dominates peer-unreachable; recovering
  primary 180; recovering backup 30; standalone treats healthy as 255;
  classify round-trips every band including Unknown sentinel.
- RecoveryStateManagerTests (6): never-faulted auto-meets dwell; faulted-only
  returns true (semantics-doc test — coordinator short-circuits on
  selfHealthy=false); recovered without witness never meets; witness without
  dwell never meets; witness + dwell-elapsed meets; re-fault resets.
- ApplyLeaseRegistryTests (7): empty registry not-in-progress; begin+dispose
  closes; dispose on exception still closes; dispose twice safe; concurrent
  leases isolated; watchdog closes stale; watchdog leaves recent alone.
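
The six RecoveryStateManager cases listed above can be reproduced with a small two-gate sketch. It is written in Python rather than the project's C#, and the class and method names (`RecoveryGate`, `mark_faulted`, `mark_recovered`, `record_publish_witness`, `is_dwell_met`) are hypothetical. It assumes the dwell timer starts when the node is marked recovered, and it mirrors the documented quirk that a faulted-only node reports true because the coordinator short-circuits on selfHealthy=false first.

```python
import time


class RecoveryGate:
    """Sketch: dwell is met only after (dwell elapsed) AND (publish witness)."""

    def __init__(self, dwell_seconds=60.0, clock=time.monotonic):
        self._dwell = dwell_seconds
        self._clock = clock          # injectable so tests can fake time
        self._recovered_at = None    # None = never faulted, or still faulted
        self._witnessed = False

    def mark_faulted(self):
        # A (re-)fault resets both gates.
        self._recovered_at = None
        self._witnessed = False

    def mark_recovered(self):
        # Assumption: the dwell timer starts at the moment of recovery.
        self._recovered_at = self._clock()

    def record_publish_witness(self):
        self._witnessed = True

    def is_dwell_met(self):
        if self._recovered_at is None:
            # Never faulted, or faulted-only: report True and leave the
            # unhealthy case to the caller's selfHealthy check.
            return True
        elapsed = self._clock() - self._recovered_at
        return self._witnessed and elapsed >= self._dwell
```

A flapping node therefore has to satisfy the full dwell and observe a fresh publish witness after each recovery; progress made before the re-fault does not carry over.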

Full solution dotnet test: 1137 passing (Phase 6.2 shipped at 1097, Phase 6.3
B + D core = +40 = 1137). Pre-existing Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2 merged commit eb3625b327 into v2 2026-04-19 09:58:34 -04:00
Reference: dohertj2/lmxopcua#89