Phase 6.1 Stream B (partial) - Tier registry invariant + MemoryTracking with hybrid formula #79

Merged
dohertj2 merged 2 commits from phase-6-1-stream-b-stability into v2 2026-04-19 08:05:04 -04:00

Lands Stream B.1 + B.2 of Phase 6.1. MemoryRecycle (B.3), ScheduledRecycleScheduler (B.4), WedgeDetector (B.5) follow in subsequent commits on this branch.

Summary

  • B.1: DriverTypeMetadata gains a required DriverTier field. Every registered driver type must declare its tier so MemoryTracking, resilience policies, and the upcoming MemoryRecycle/WedgeDetector resolve the right defaults. The compliance check "every driver type has a non-null Tier" is now structurally impossible to fail.
  • B.2: Core.Stability.MemoryTracking per decision #146: captures the post-init baseline as the median of the samples taken during a 5-min warmup window, then classifies each later sample against soft = max(multiplier × baseline, baseline + floor) with hard = 2 × soft. Per-tier constants: A mult=3 floor=50 MB, B mult=3 floor=100 MB, C mult=2 floor=500 MB. Never kills: HardBreach only returns a signal; the process-level recycle action is Tier C only and lives in the separate MemoryRecycle (decisions #74, #145).
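
As a worked illustration of the hybrid formula above, here is a minimal Python sketch (the project itself is .NET, and this is not its code). The constants are the ones listed in B.2; the names `TierConstants`, `TIER_CONSTANTS`, and `thresholds_mb` are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TierConstants:
    multiplier: int  # applied to the baseline
    floor_mb: int    # absolute headroom added to the baseline


# Values copied from the per-tier constants quoted in B.2.
TIER_CONSTANTS = {
    "A": TierConstants(multiplier=3, floor_mb=50),
    "B": TierConstants(multiplier=3, floor_mb=100),
    "C": TierConstants(multiplier=2, floor_mb=500),
}


def thresholds_mb(tier: str, baseline_mb: float) -> tuple[float, float]:
    """Return (soft, hard) in MB: soft = max(mult * b, b + floor), hard = 2 * soft."""
    c = TIER_CONSTANTS[tier]
    soft = max(c.multiplier * baseline_mb, baseline_mb + c.floor_mb)
    return soft, 2 * soft


# Small baseline: 3 * 10 = 30 is below 10 + 50 = 60, so the floor wins.
print(thresholds_mb("A", 10))   # (60, 120)
# Large baseline: 3 * 100 = 300 is above 100 + 50 = 150, so the multiplier wins.
print(thresholds_mb("A", 100))  # (300, 600)
# Tier C: 2 * 400 = 800 is below 400 + 500 = 900, so the floor wins.
print(thresholds_mb("C", 400))  # (900, 1800)
```

The `max()` keeps small-footprint drivers from tripping on trivial growth (the floor dominates) while still scaling the limit for drivers with large baselines (the multiplier dominates).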

Test plan

  • [x] 15 new unit tests pass: warmup-phase returns Warming until window elapses; median-baseline capture on transition; per-tier constants match decision table; floor-wins vs multiplier-wins for soft threshold; hard = 2 × soft; per-threshold classification (below/at/above).
  • [x] Registry: theory asserts A/B/C round-trip.
  • [x] Full solution dotnet test: 963 passing (baseline 906, +57 net across Phase 6.1 so far).
  • [ ] B.3: MemoryRecycle (Tier C supervisor hook).
  • [ ] B.4: ScheduledRecycleScheduler (weekly cron for Tier C).
  • [ ] B.5: Demand-aware WedgeDetector.

🤖 Generated with Claude Code

dohertj2 added 1 commit 2026-04-19 07:39:34 -04:00
Stream B.1 — registry invariant:
- DriverTypeMetadata gains a required `DriverTier Tier` field. Every registered
  driver type must declare its stability tier so the downstream MemoryTracking,
  MemoryRecycle, and resilience-policy layers can resolve the right defaults.
  Stamped-at-registration-time enforcement makes the "every driver type has a
  non-null Tier" compliance check structurally impossible to fail.
- DriverTypeRegistry API unchanged; one new property on the record.
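
The registry invariant above can be pictured with a small Python sketch. It is illustrative only: `type_name`, `register`, and `get` are hypothetical members, and Python enforces the missing field at construction time, which is weaker than whatever compile-time guarantee the real record provides.

```python
from dataclasses import dataclass
from enum import Enum


class DriverTier(Enum):
    A = "A"
    B = "B"
    C = "C"


@dataclass(frozen=True)
class DriverTypeMetadata:
    type_name: str    # hypothetical field, used here as the registry key
    tier: DriverTier  # no default, so it cannot be omitted


class DriverTypeRegistry:
    """Toy registry: stores metadata by type name."""

    def __init__(self) -> None:
        self._types = {}

    def register(self, metadata: DriverTypeMetadata) -> None:
        self._types[metadata.type_name] = metadata

    def get(self, type_name: str) -> DriverTypeMetadata:
        return self._types[type_name]


registry = DriverTypeRegistry()
registry.register(DriverTypeMetadata("ExampleDriver", DriverTier.C))
print(registry.get("ExampleDriver").tier)  # DriverTier.C

try:
    DriverTypeMetadata("NoTierDriver")  # tier omitted
except TypeError as exc:
    print("rejected:", type(exc).__name__)  # rejected: TypeError
```

Because the tier is stamped on the metadata at registration, every downstream consumer can read it without a null check.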

Stream B.2 — MemoryTracking (Core.Stability):
- Tier-agnostic tracker per decision #146: captures baseline as the median of
  samples collected during a post-init warmup window (default 5 min), then
  classifies each subsequent sample with the hybrid formula
  `soft = max(multiplier × baseline, baseline + floor)`, `hard = 2 × soft`.
- Per-tier constants wired: Tier A mult=3 floor=50 MB, Tier B mult=3 floor=100 MB,
  Tier C mult=2 floor=500 MB.
- Never kills. A hard breach only returns HardBreach to the caller; the
  supervisor that acts on that signal (MemoryRecycle) is Tier C only per
  decisions #74, #145 and lands in the next B.3 commit on this branch.
- Two phases: WarmingUp (samples collected, Warming returned) and Steady
  (baseline captured, soft/hard checks active). Transition is automatic when
  the warmup window elapses.
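
The two phases described above can be sketched as a small state machine. This is an illustrative Python sketch, not the project's code: the class and member names are hypothetical, time is passed in as plain seconds, and it assumes the first sample at or after the end of the window triggers the baseline capture and is itself classified rather than added to the baseline.

```python
from enum import Enum
from statistics import median


class MemoryAction(Enum):
    WARMING = "Warming"
    NONE = "None"
    SOFT_BREACH = "SoftBreach"
    HARD_BREACH = "HardBreach"


class MemoryTracker:
    def __init__(self, multiplier, floor_mb, warmup_seconds=300.0):
        self._multiplier = multiplier
        self._floor_mb = floor_mb
        self._warmup_seconds = warmup_seconds
        self._started_at = None      # time of the first sample
        self._warmup_samples = []
        self.baseline_mb = None      # set once, on leaving warmup

    def sample(self, now_seconds, used_mb):
        if self.baseline_mb is None:
            if self._started_at is None:
                self._started_at = now_seconds
            if now_seconds - self._started_at < self._warmup_seconds:
                self._warmup_samples.append(used_mb)
                return MemoryAction.WARMING
            # Window elapsed: capture the median and fall through to Steady.
            self.baseline_mb = median(self._warmup_samples)
        soft = max(self._multiplier * self.baseline_mb,
                   self.baseline_mb + self._floor_mb)
        hard = 2 * soft
        if used_mb >= hard:
            return MemoryAction.HARD_BREACH
        if used_mb >= soft:
            return MemoryAction.SOFT_BREACH
        return MemoryAction.NONE


tracker = MemoryTracker(multiplier=3, floor_mb=50)  # Tier A constants
for t, mb in [(0, 40), (100, 60), (200, 50)]:
    print(t, tracker.sample(t, mb).value)           # Warming each time
print(tracker.sample(300, 90).value)   # baseline 50, soft 150 -> None
print(tracker.sample(310, 150).value)  # at soft -> SoftBreach
print(tracker.sample(320, 300).value)  # at hard (2 * 150) -> HardBreach
```

Note that the tracker only reports a classification; nothing in it terminates or restarts anything.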

Tests (15 new, all pass):
- Warming phase returns Warming until the window elapses.
- Window-elapsed captures median baseline + transitions to Steady.
- Per-tier constants match decision #146 table exactly.
- Soft threshold uses max() — small baseline → floor wins; large baseline →
  multiplier wins.
- Hard = 2 × soft.
- Classification: a sample below soft returns None; at soft returns
  SoftBreach; at or above hard returns HardBreach.
- DriverTypeRegistry: theory asserts Tier round-trips for A/B/C.

Full solution dotnet test: 963 passing (baseline 906, +57 net for Phase 6.1
Stream A + Stream B.1/B.2). Pre-existing Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2 added 1 commit 2026-04-19 08:04:53 -04:00
Closes out Stream B per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

Core.Abstractions:
- IDriverSupervisor — process-level supervisor contract a Tier C driver's
  out-of-process topology provides (Galaxy Proxy/Supervisor implements this in
  a follow-up Driver.Galaxy wiring PR). Concerns: DriverInstanceId + RecycleAsync.
  Tier A/B drivers don't implement this; Stream B code asserts tier == C before
  ever calling it.
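
A minimal Python sketch of the contract and the tier guard described above. It is illustrative only: the real `RecycleAsync` signature is not shown in this PR, so the `reason` parameter, the `RecordingSupervisor` fake, and the `recycle_tier_c_driver` helper are all assumptions.

```python
import asyncio
from typing import Protocol


class DriverSupervisor(Protocol):
    """Stand-in for IDriverSupervisor: an instance id plus a recycle call."""

    driver_instance_id: str

    async def recycle_async(self, reason: str) -> None:
        ...


class RecordingSupervisor:
    """Test double that records recycle requests instead of restarting anything."""

    def __init__(self, driver_instance_id: str) -> None:
        self.driver_instance_id = driver_instance_id
        self.recycles = []

    async def recycle_async(self, reason: str) -> None:
        self.recycles.append(reason)


async def recycle_tier_c_driver(tier: str, supervisor: DriverSupervisor,
                                reason: str) -> None:
    # Guard first: only an out-of-process (Tier C) driver may be recycled.
    if tier != "C":
        raise ValueError(f"recycle requested for Tier {tier}; only Tier C is supervised")
    await supervisor.recycle_async(reason)


supervisor = RecordingSupervisor("example-instance")
asyncio.run(recycle_tier_c_driver("C", supervisor, "hard breach"))
print(supervisor.recycles)  # ['hard breach']
```

Checking the tier before touching the supervisor means a mis-wired Tier A/B driver fails loudly instead of taking down the shared host process.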

Core.Stability:
- MemoryRecycle — companion to MemoryTracking. On HardBreach, invokes the
  supervisor IFF tier == C AND a supervisor is wired. Tier A/B HardBreach logs
  a promotion-to-Tier-C recommendation and returns false. Soft/None/Warming
  never triggers a recycle at any tier.
- ScheduledRecycleScheduler — Tier C opt-in periodic recycler per decision #67.
  Ctor throws for Tier A/B (structural guard — scheduled recycle on an
  in-process driver would kill every OPC UA session and every co-hosted
  driver). TickAsync(now) advances the schedule by one interval per fire;
  RequestRecycleNowAsync drives an ad-hoc recycle without shifting the cron.
- WedgeDetector — demand-aware per decision #147. Classify(state, demand, now)
  returns:
    * NotApplicable when driver state != Healthy
    * Idle when Healthy + no pending work (bulkhead=0 && monitored=0 && historic=0)
    * Healthy when Healthy + pending work + progress within threshold
    * Faulted when Healthy + pending work + no progress within threshold
  Threshold clamps to min 60 s. DemandSignal.HasPendingWork ORs the three counters.
  None of the three false-wedge cases the plan calls out is classified
  Faulted: idle subscription-only and write-only burst with drained bulkhead
  classify as Idle, and slow historian backfill making progress stays Healthy.
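
The WedgeDetector classification rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the field names on `DemandSignal`, carrying `last_progress_at` on the signal, the extra `DEGRADED` state, and treating progress exactly at the threshold as still fresh are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class DriverState(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"  # hypothetical non-healthy state


class WedgeVerdict(Enum):
    NOT_APPLICABLE = "NotApplicable"
    IDLE = "Idle"
    HEALTHY = "Healthy"
    FAULTED = "Faulted"


@dataclass(frozen=True)
class DemandSignal:
    bulkhead_depth: int
    active_monitored_items: int
    queued_history_reads: int
    last_progress_at: float  # seconds, same clock as `now`

    @property
    def has_pending_work(self) -> bool:
        # ORs the three counters: any non-zero counter means demand exists.
        return (self.bulkhead_depth > 0
                or self.active_monitored_items > 0
                or self.queued_history_reads > 0)


MIN_THRESHOLD_SECONDS = 60.0


def classify(state, demand, now, threshold_seconds):
    threshold = max(threshold_seconds, MIN_THRESHOLD_SECONDS)  # clamp to 60 s
    if state is not DriverState.HEALTHY:
        return WedgeVerdict.NOT_APPLICABLE
    if not demand.has_pending_work:
        return WedgeVerdict.IDLE
    if now - demand.last_progress_at <= threshold:
        return WedgeVerdict.HEALTHY
    return WedgeVerdict.FAULTED


idle = DemandSignal(0, 0, 0, last_progress_at=0.0)
backfill = DemandSignal(0, 0, 5, last_progress_at=990.0)
stalled = DemandSignal(3, 0, 0, last_progress_at=800.0)
print(classify(DriverState.HEALTHY, idle, 1000.0, 120.0).value)      # Idle
print(classify(DriverState.HEALTHY, backfill, 1000.0, 120.0).value)  # Healthy
print(classify(DriverState.HEALTHY, stalled, 1000.0, 120.0).value)   # Faulted
print(classify(DriverState.DEGRADED, stalled, 1000.0, 120.0).value)  # NotApplicable
```

Checking demand before staleness is what makes the detector demand-aware: a driver with nothing to do is never blamed for making no progress.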

Tests (22 new, all pass):
- MemoryRecycleTests (7): Tier C hard-breach requests recycle; Tier A/B
  hard-breach never requests; Tier C without supervisor no-ops; soft-breach
  at every tier never requests; None/Warming never request.
- ScheduledRecycleSchedulerTests (6): ctor throws for A/B; zero/negative
  interval throws; tick before due no-ops; tick at/after due fires once and
  advances; RequestRecycleNow fires immediately without shifting schedule;
  multiple fires across ticks advance one interval each.
- WedgeDetectorTests (9): threshold clamp to 60 s; unhealthy driver always
  NotApplicable; idle subscription stays Idle; pending+fresh progress stays
  Healthy; pending+stale progress is Faulted; MonitoredItems active but no
  publish is Faulted; MonitoredItems active with fresh publish stays Healthy;
  historian backfill with fresh progress stays Healthy; write-only burst with
  empty bulkhead is Idle; HasPendingWork theory for any non-zero counter.
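
The scheduler behaviour pinned down by ScheduledRecycleSchedulerTests above (constructor guard, fire once when due, advance by one interval, ad-hoc recycle leaves the schedule alone) can be sketched as follows. This Python sketch is illustrative and synchronous for brevity; the class name, the `first_due` parameter, and the `recycle` callback are assumptions rather than the real API.

```python
from datetime import datetime, timedelta


class ScheduledRecycler:
    def __init__(self, tier, interval, first_due, recycle):
        # Structural guard: scheduled recycle is only valid out of process.
        if tier != "C":
            raise ValueError("scheduled recycle is Tier C only")
        if interval <= timedelta(0):
            raise ValueError("interval must be positive")
        self._interval = interval
        self._recycle = recycle  # callable taking a reason string
        self.next_due = first_due

    def tick(self, now):
        """Fire at most once per call; return True if a recycle was requested."""
        if now < self.next_due:
            return False
        self._recycle("scheduled")
        self.next_due += self._interval  # one interval per fire
        return True

    def request_recycle_now(self):
        """Ad-hoc recycle; next_due is deliberately left unchanged."""
        self._recycle("ad-hoc")


fired = []
recycler = ScheduledRecycler("C", timedelta(days=7),
                             datetime(2026, 4, 26, 3, 0), fired.append)
print(recycler.tick(datetime(2026, 4, 25, 3, 0)))  # False: not yet due
print(recycler.tick(datetime(2026, 4, 26, 3, 0)))  # True: fires once
print(recycler.next_due)                           # 2026-05-03 03:00:00
recycler.request_recycle_now()
print(recycler.next_due)                           # 2026-05-03 03:00:00
print(fired)                                       # ['scheduled', 'ad-hoc']
```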

Full solution dotnet test: 989 passing (baseline 906, +83 for Phase 6.1 so far).
Pre-existing Client.CLI Subscribe flake unchanged.

Stream B complete. Next up: Stream C (health endpoints + structured logging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2 merged commit 6b3a67fd9e into v2 2026-04-19 08:05:04 -04:00

Reference: dohertj2/lmxopcua#79