Phase 6.1 Stream D - LiteDB generation-sealed cache + ResilientConfigReader + UsingStaleConfig flag #81

Merged
dohertj2 merged 1 commit from phase-6-1-stream-d-litedb-sealed-cache into v2 2026-04-19 08:35:34 -04:00

Closes Stream D per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

Summary

  • D.1: GenerationSealedCache writes <cache-root>/<clusterId>/<generationId>.db as a read-only LiteDB file per generation. The CURRENT pointer is published atomically via temp-file + File.Replace. Prior generations are preserved on disk for audit.
  • D.2: ResilientConfigReader wraps a central-DB fetch in a pipeline: 2 s timeout → N jittered retries → fallback to the sealed cache. Cancellation is never retried. The write path is expected to bypass this wrapper and fail hard.
  • D.3: StaleConfigFlag is a thread-safe bool (Volatile.Read/Write). MarkStale is called on cache fallback, MarkFresh on central-DB success. The surface hook is already wired into the /healthz body in Stream C; Admin /hosts wires in Stream E.
  • D.4: Three core scenarios covered by tests:
    • SQL-kill serves the sealed snapshot and flips UsingStaleConfig to true.
    • Corrupt sealed file / missing file / corrupt pointer → all fail closed with GenerationCacheUnavailableException; mixed-generation reads are structurally impossible.
    • First-boot no-snapshot → throws immediately (InitializeAsync for a driver fails with a clear config-DB-required error; the caller never sees silent degradation).
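
The D.2 pipeline (timeout → jittered retry → sealed-cache fallback) could be sketched with Polly.Core 8 roughly as follows. This is an illustration under assumptions, not the PR's ResilientConfigReader: the class name `ResilientReaderSketch`, the delegate-based constructor, the `setStale` callback, and the 100 ms base delay are all hypothetical. It needs the Polly.Core package. Polly's timeout is cooperative, so the fetch delegate has to honor the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

// Hypothetical sketch; names and constructor shape are assumptions.
public sealed class ResilientReaderSketch<T>
{
    private readonly ResiliencePipeline<T> _pipeline;
    private readonly Func<CancellationToken, Task<T>> _readSealedCache;
    private readonly Action<bool> _setStale;

    public ResilientReaderSketch(
        int retryCount,
        Func<CancellationToken, Task<T>> readSealedCache,
        Action<bool> setStale)
    {
        var builder = new ResiliencePipelineBuilder<T>();
        if (retryCount > 0)
        {
            // Added first, so retry is the outer strategy and every
            // attempt gets its own 2 s timeout. Skipped when retryCount is 0.
            builder.AddRetry(new RetryStrategyOptions<T>
            {
                MaxRetryAttempts = retryCount,
                Delay = TimeSpan.FromMilliseconds(100), // assumed base delay
                BackoffType = DelayBackoffType.Exponential,
                UseJitter = true,
                // Timeouts and DB errors are retried; cancellation is not.
                ShouldHandle = new PredicateBuilder<T>()
                    .Handle<Exception>(ex => ex is not OperationCanceledException),
            });
        }
        builder.AddTimeout(TimeSpan.FromSeconds(2));
        _pipeline = builder.Build();
        _readSealedCache = readSealedCache;
        _setStale = setStale;
    }

    public async Task<T> ReadAsync(
        Func<CancellationToken, Task<T>> fetchFromCentralDb,
        CancellationToken ct)
    {
        try
        {
            T value = await _pipeline.ExecuteAsync(
                async token => await fetchFromCentralDb(token), ct);
            _setStale(false); // central DB answered: config is fresh
            return value;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // caller cancelled: no retry, no fallback
        }
        catch (Exception)
        {
            // If the cache read throws too, the exception propagates (fail closed).
            T cached = await _readSealedCache(ct);
            _setStale(true); // serving the sealed snapshot
            return cached;
        }
    }
}
```

Passing the cache read and the flag update as delegates keeps the sketch independent of the real cache and flag types; the actual wrapper presumably takes those types directly.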

Test plan

  • 17 new tests: 10 GenerationSealedCache (round-trip / sealed-file-is-readonly / pointer-advances / prior-gen-preserved / 3 corruption scenarios / idempotent reseal / cluster isolation), 4 ResilientConfigReader (success / retry-to-cache / no-cache-throws / cancellation-not-retried), 3 StaleConfigFlag (default / toggle / concurrent).
  • Full solution dotnet test: 1033 passing (baseline 906, +127 for Phase 6.1).
  • Follow-up (still open): wire ResilientConfigReader + sealed-cache write hook into the real Configuration read paths + sp_PublishGeneration; this lands in Stream E / Admin refresh.

🤖 Generated with Claude Code

dohertj2 added 1 commit 2026-04-19 08:35:24 -04:00
Closes Stream D per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

New Configuration.LocalCache types (alongside the existing single-file
LiteDbConfigCache):

- GenerationSealedCache — file-per-generation sealed snapshots per decision
  #148. Each SealAsync writes <cache-root>/<clusterId>/<generationId>.db as a
  read-only LiteDB file, then atomically publishes the CURRENT pointer via
  temp-file + File.Replace. Prior-generation files stay on disk for audit.
  Mixed-generation reads are structurally impossible: ReadCurrentAsync opens
  the single file named by CURRENT. Corruption of the pointer or the sealed
  file raises GenerationCacheUnavailableException — fails closed, never falls
  back silently to an older generation. TryGetCurrentGenerationId returns the
  pointer value or null for diagnostics.

- StaleConfigFlag — thread-safe (Volatile.Read/Write) bool. MarkStale when a
  read fell back to the cache; MarkFresh when a central-DB read succeeded.
  Surfaced on /healthz body and Admin /hosts (Stream C wiring already in
  place).

- ResilientConfigReader — wraps a central-DB fetch function with the Stream
  D.2 pipeline: timeout 2 s → retry N× jittered (skipped when retryCount=0) →
  fallback to the sealed cache. Toggles StaleConfigFlag per outcome. Read path
  only — the write path is expected to bypass this wrapper and fail hard on
  DB outage so inconsistent writes never land. Cancellation passes through
  and is NOT retried.
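
As an illustration of the pointer handling described for GenerationSealedCache above, here is a minimal sketch of an atomic CURRENT publish and a fail-closed resolve. It makes several assumptions that the commit message does not state: the pointer is a plain-text file holding the generation id, the temp file is named CURRENT.tmp, a pointer counts as corrupt when it is empty or not a valid file name, and `CacheUnavailableException` is a hypothetical stand-in for GenerationCacheUnavailableException. LiteDB itself is left out; only the file-level mechanics are shown.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the PR's GenerationCacheUnavailableException.
public sealed class CacheUnavailableException : Exception
{
    public CacheUnavailableException(string message) : base(message) { }
}

public static class CurrentPointerSketch
{
    private const string PointerName = "CURRENT";

    // Publish a generation id: write a temp file, then swap it into place
    // so readers see either the old pointer or the new one, never a partial write.
    public static void Publish(string clusterDir, string generationId)
    {
        Directory.CreateDirectory(clusterDir);
        string pointer = Path.Combine(clusterDir, PointerName);
        string temp = pointer + ".tmp"; // assumed temp-file name

        File.WriteAllText(temp, generationId);

        if (File.Exists(pointer))
            File.Replace(temp, pointer, destinationBackupFileName: null);
        else
            File.Move(temp, pointer); // File.Replace requires an existing destination
    }

    // Resolve the single sealed file named by CURRENT, or fail closed.
    public static string ResolveCurrentFile(string clusterDir)
    {
        string pointer = Path.Combine(clusterDir, PointerName);
        if (!File.Exists(pointer))
            throw new CacheUnavailableException("No CURRENT pointer; nothing has been sealed.");

        string id = File.ReadAllText(pointer).Trim();
        if (id.Length == 0 || id.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            throw new CacheUnavailableException("CURRENT pointer is corrupt.");

        string sealedFile = Path.Combine(clusterDir, id + ".db");
        if (!File.Exists(sealedFile))
            throw new CacheUnavailableException($"Sealed file for generation {id} is missing.");

        return sealedFile; // never falls back to an older generation
    }
}
```

Because the reader only ever follows the one id stored in CURRENT, a read cannot mix data from two generations, and any broken link in that chain surfaces as an exception instead of a silent downgrade.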

Configuration.csproj:
- Polly.Core 8.6.6 + Microsoft.Extensions.Logging.Abstractions added.

Tests (17 new, all pass):
- GenerationSealedCacheTests (10): first-boot-no-snapshot throws
  GenerationCacheUnavailableException (D.4 scenario C), seal-then-read round
  trip, sealed file is ReadOnly on disk, pointer advances to latest, prior
  generation file preserved, corrupt sealed file fails closed, missing sealed
  file fails closed, corrupt pointer fails closed (D.4 scenario B), same
  generation sealed twice is idempotent, independent clusters don't
  interfere.
- ResilientConfigReaderTests (4): central-DB success returns value + marks
  fresh; central-DB failure exhausts retries + falls back to cache + marks
  stale (D.4 scenario A); central-DB + cache both unavailable throws;
  cancellation not retried.
- StaleConfigFlagTests (3): default is fresh; toggles; concurrent writes
  converge.
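
For reference, a flag of the kind StaleConfigFlagTests exercises can be as small as the sketch below. The class and member names are hypothetical; only the use of Volatile.Read/Write on a bool and the MarkStale/MarkFresh method names come from this commit message.

```csharp
using System.Threading;

// Hypothetical minimal flag; the PR's StaleConfigFlag may expose a different surface.
public sealed class StaleFlagSketch
{
    private bool _stale; // default(bool) is false, so a new flag starts fresh

    // Volatile.Read/Write ensure other threads observe the latest value.
    public bool IsStale => Volatile.Read(ref _stale);

    public void MarkStale() => Volatile.Write(ref _stale, true);

    public void MarkFresh() => Volatile.Write(ref _stale, false);
}
```

A bool write is already atomic in .NET, so no lock is needed here; the volatile accessors only add the visibility and ordering guarantees. One way to check convergence is to toggle the flag from many threads and then make a single final write, after which every reader should see that final value.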

Full solution dotnet test: 1033 passing (baseline 906, +127 net across Phase
6.1 Streams A/B/C/D). Pre-existing Client.CLI Subscribe flake unchanged.

Integration into Configuration read paths (DriverInstance enumeration,
LdapGroupRoleMapping fetches, etc.) + the sp_PublishGeneration hook that
writes sealed files lands in the Phase 6.1 Stream E / Admin-refresh PR where
the DB integration surfaces are already touched. Existing LiteDbConfigCache
continues serving its single-file role for the NodeBootstrap path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2 merged commit 8d81715079 into v2 2026-04-19 08:35:34 -04:00
Reference: dohertj2/lmxopcua#81