Closes Stream D per docs/v2/implementation/phase-6-1-resilience-and-observability.md.
New Configuration.LocalCache types (alongside the existing single-file
LiteDbConfigCache):
- GenerationSealedCache — file-per-generation sealed snapshots per decision
#148. Each SealAsync writes <cache-root>/<clusterId>/<generationId>.db as a
read-only LiteDB file, then atomically publishes the CURRENT pointer via
temp-file + File.Replace. Prior-generation files stay on disk for audit.
Mixed-generation reads are structurally impossible: ReadCurrentAsync opens
the single file named by CURRENT. Corruption of the pointer or the sealed
file raises GenerationCacheUnavailableException — fails closed, never falls
back silently to an older generation. TryGetCurrentGenerationId returns the
pointer value or null for diagnostics.
- StaleConfigFlag — thread-safe (Volatile.Read/Write) bool. MarkStale when a
read falls back to the cache; MarkFresh when a central-DB read succeeds.
Surfaced in the /healthz body and on Admin /hosts (Stream C wiring already
in place).
- ResilientConfigReader — wraps a central-DB fetch function with the Stream
D.2 pipeline: timeout 2 s → retry N× jittered (skipped when retryCount=0) →
fallback to the sealed cache. Toggles StaleConfigFlag per outcome. Read path
only — the write path is expected to bypass this wrapper and fail hard on
DB outage so inconsistent writes never land. Cancellation passes through
and is NOT retried.
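A minimal sketch of the CURRENT pointer publish described for
GenerationSealedCache above, assuming the pointer is a plain-text file
literally named CURRENT inside the cluster directory. PointerFile, Publish,
and TryRead are hypothetical names, not the shipped types. The first-seal
branch is also an assumption, added because File.Replace requires the
destination file to exist.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical helper for illustration; the real logic lives inside
// GenerationSealedCache and may differ.
public static class PointerFile
{
    private const string PointerName = "CURRENT"; // assumed file name

    public static void Publish(string clusterDir, string generationId)
    {
        Directory.CreateDirectory(clusterDir);
        string pointer = Path.Combine(clusterDir, PointerName);
        string temp = pointer + ".tmp";

        // Write the new value to a sibling temp file first, so an
        // interrupted write never touches the existing CURRENT file.
        File.WriteAllText(temp, generationId);

        if (File.Exists(pointer))
        {
            // Swap the temp file in as CURRENT; no backup copy is kept.
            File.Replace(temp, pointer, destinationBackupFileName: null);
        }
        else
        {
            // First seal: File.Replace needs an existing destination.
            File.Move(temp, pointer);
        }
    }

    // Returns the pointer value, or null when no pointer exists yet.
    public static string? TryRead(string clusterDir)
    {
        string pointer = Path.Combine(clusterDir, PointerName);
        return File.Exists(pointer) ? File.ReadAllText(pointer).Trim() : null;
    }
}
```

Because readers only ever open the file named by the pointer, a reader sees
either the previous generation or the new one, never a mix of both.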
Configuration.csproj:
- Polly.Core 8.6.6 + Microsoft.Extensions.Logging.Abstractions added.
Tests (17 new, all pass):
- GenerationSealedCacheTests (10): first-boot-no-snapshot throws
GenerationCacheUnavailableException (D.4 scenario C), seal-then-read round
trip, sealed file is ReadOnly on disk, pointer advances to latest, prior
generation file preserved, corrupt sealed file fails closed, missing sealed
file fails closed, corrupt pointer fails closed (D.4 scenario B), same
generation sealed twice is idempotent, independent clusters don't
interfere.
- ResilientConfigReaderTests (4): central-DB success returns value + marks
fresh; central-DB failure exhausts retries + falls back to cache + marks
stale (D.4 scenario A); central-DB + cache both unavailable throws;
cancellation not retried.
- StaleConfigFlagTests (3): default is fresh; toggles; concurrent writes
converge.
Full solution dotnet test: 1033 passing (baseline 906, +127 net across Phase
6.1 Streams A/B/C/D). Pre-existing Client.CLI Subscribe flake unchanged.
Integration into Configuration read paths (DriverInstance enumeration,
LdapGroupRoleMapping fetches, etc.) + the sp_PublishGeneration hook that
writes sealed files land in the Phase 6.1 Stream E / Admin-refresh PR, where
the DB integration surfaces are already touched. Existing LiteDbConfigCache
continues serving its single-file role for the NodeBootstrap path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DriverHostState enum lives in Configuration.Enums/ rather than reusing Core.Abstractions.HostState so the Configuration project stays free of driver-runtime dependencies (it's referenced by both the Admin process and the Server process, so pulling the driver-abstractions assembly into every Admin build would be unnecessary weight). The server-side publisher hosted service (follow-up PR 34) will translate HostStatusChangedEventArgs.NewState to this enum on every transition.
No foreign key to ClusterNode: a Server may start reporting host status before its ClusterNode row exists (first-boot bootstrap), and we'd rather keep the status row than drop it. The Admin-side service that renders the dashboard will left-join on NodeId when presenting. Two indexes are declared: IX_DriverHostStatus_Node drives the per-cluster drill-down (Admin UI joins ClusterNode on ClusterId to pick which NodeIds to fetch), and IX_DriverHostStatus_LastSeen drives the stale-row query (now - LastSeen > threshold).
EF migration AddDriverHostStatus creates the table + PK + both indexes. Model snapshot updated. SchemaComplianceTests expected-tables list extended.
DriverHostStatusTests (3 new cases, category SchemaCompliance, uses the shared fixture DB):
- composite key allows same (host, driver) across different nodes AND same (node, host) across different drivers; both are real-world cases the publisher needs to support.
- upsert-in-place pattern (fetch-by-composite-PK, mutate, save) produces one row, not two; this is the pattern the publisher will use.
- State enum persists as string, not int: reading the DB via ADO.NET returns 'Faulted', not '3'.
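A rough sketch of how the mapping and the upsert-in-place pattern described above might look in EF Core. This is illustrative only, not the code in this PR: the key column names DriverInstanceId and HostName, the string key types, the enum members other than Faulted, and SketchDbContext are all assumptions. Only NodeId, LastSeen, State, and the two index names come from the notes above.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Only Faulted (ordinal 3) is taken from the notes; other members are placeholders.
public enum DriverHostState { Unknown = 0, Running = 1, Stopped = 2, Faulted = 3 }

public sealed class DriverHostStatus
{
    public string NodeId { get; set; } = "";
    public string DriverInstanceId { get; set; } = ""; // hypothetical column name
    public string HostName { get; set; } = "";         // hypothetical column name
    public DriverHostState State { get; set; }
    public DateTime LastSeen { get; set; }
}

public sealed class SketchDbContext : DbContext
{
    public SketchDbContext(DbContextOptions<SketchDbContext> options) : base(options) { }

    public DbSet<DriverHostStatus> DriverHostStatuses => Set<DriverHostStatus>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<DriverHostStatus>(e =>
        {
            // Composite PK over (node, driver, host); no FK to ClusterNode.
            e.HasKey(x => new { x.NodeId, x.DriverInstanceId, x.HostName });
            // Store the enum by name ('Faulted'), not by ordinal (3).
            e.Property(x => x.State).HasConversion<string>();
            e.HasIndex(x => x.NodeId).HasDatabaseName("IX_DriverHostStatus_Node");
            e.HasIndex(x => x.LastSeen).HasDatabaseName("IX_DriverHostStatus_LastSeen");
        });
    }
}

public static class DriverHostStatusUpsert
{
    // Fetch by composite PK, mutate, save: repeated calls keep a single row.
    public static async Task UpsertAsync(
        SketchDbContext db, string nodeId, string driverInstanceId,
        string hostName, DriverHostState state, DateTime nowUtc)
    {
        // Key values must be passed in the same order as HasKey above.
        var row = await db.DriverHostStatuses.FindAsync(nodeId, driverInstanceId, hostName);
        if (row is null)
        {
            row = new DriverHostStatus
            {
                NodeId = nodeId,
                DriverInstanceId = driverInstanceId,
                HostName = hostName,
            };
            db.DriverHostStatuses.Add(row);
        }

        row.State = state;
        row.LastSeen = nowUtc;
        await db.SaveChangesAsync();
    }
}
```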
Configuration.Tests SchemaCompliance suite: 10 pass / 0 fail (7 prior + 3 new). Configuration build clean. No Server or Admin code changes yet — publisher + /hosts page are PR 34.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>