IDriver.DriverInstanceId is declared as string in Core.Abstractions; keeping
the pipeline key as Guid meant every call site would need .ToString() / Guid.Parse
at the boundary. Switching the Resilience types to string removes that friction
and lets OtOpcUaServer pass driver.DriverInstanceId directly to the builder in
the upcoming server-dispatch wiring PR.
Changed from Guid to string:
- DriverResiliencePipelineBuilder.GetOrCreate + Invalidate + PipelineKey
- CapabilityInvoker.ctor + _driverInstanceId field
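
A minimal sketch of what a string-keyed pipeline cache could look like. This is
not the actual Core.Resilience code: PipelineCacheSketch, the generic TPipeline
parameter, the build callback, the int return value of Invalidate, and the
two-member DriverCapability enum are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;

// Reduced stand-in for the real enum, which has 8 members.
public enum DriverCapability { Read, Write }

// Assumed key shape. A record struct gives value equality, so equal triples
// map to the same cache entry.
public readonly record struct PipelineKey(
    string DriverInstanceId, string HostName, DriverCapability Capability);

public sealed class PipelineCacheSketch<TPipeline>
{
    private readonly ConcurrentDictionary<PipelineKey, TPipeline> _pipelines = new();

    // The driver id is taken as a string, so a caller can pass
    // driver.DriverInstanceId without any Guid conversion.
    public TPipeline GetOrCreate(
        string driverInstanceId,
        string hostName,
        DriverCapability capability,
        Func<PipelineKey, TPipeline> build)
        => _pipelines.GetOrAdd(
            new PipelineKey(driverInstanceId, hostName, capability), build);

    // Removes every cached pipeline that belongs to one driver instance and
    // returns how many entries were dropped (the return value is an assumption).
    public int Invalidate(string driverInstanceId)
    {
        var removed = 0;
        foreach (var key in _pipelines.Keys) // Keys is a snapshot, safe to mutate
        {
            if (key.DriverInstanceId == driverInstanceId
                && _pipelines.TryRemove(key, out _))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

With distinct ids such as "drv-keep" and "drv-drop", invalidating one leaves the
other's entries in place, which is the property the Invalidate test relies on.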
Tests: all 48 Core.Tests still pass. The Invalidate test's keepId / dropId now
use distinct "drv-keep" / "drv-drop" literals. They were previously two separate
Guid.NewGuid() values, which the sed-driven refactor had collapsed to the same
literal (caught pre-commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the first chunk of the Phase 6.1 Stream A resilience layer per
docs/v2/implementation/phase-6-1-resilience-and-observability.md §Stream A.
Downstream CapabilityInvoker (A.3) + driver-dispatch wiring land in follow-up
PRs on the same branch.
Core.Abstractions additions:
- WriteIdempotentAttribute — marker for tag-definition records that opt into
auto-retry on IWritable.WriteAsync. Absence = no retry per decisions #44, #45,
#143. Read once via reflection at driver-init time; no per-write cost.
- DriverCapability enum — enumerates the 8 capability surface points
(Read / Write / Discover / Subscribe / Probe / AlarmSubscribe / AlarmAcknowledge
/ HistoryRead). AlarmAcknowledge is write-shaped (no retry by default).
- DriverTier enum — A/B/C per driver-stability.md §2-4. Stream B.1 wires this
into DriverTypeMetadata; surfaced here because the resilience policy defaults
key on it.
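
A hedged sketch of the marker-attribute idea. The attribute targets, the two tag
records, and the WriteRetryPolicy helper are hypothetical; the commit only
states that the marker is read once via reflection at driver-init time.

```csharp
using System;

// Assumed declaration: a type-level marker with no members.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct, Inherited = false)]
public sealed class WriteIdempotentAttribute : Attribute { }

// Hypothetical tag definitions. Only the first opts into write retry.
[WriteIdempotent]
public sealed record SetpointTag(string Address);

public sealed record PulseCommandTag(string Address);

public static class WriteRetryPolicy
{
    // Meant to be called once per tag-definition type during driver init and
    // the result cached, so individual writes pay no reflection cost.
    public static bool AllowsRetry(Type tagDefinitionType) =>
        tagDefinitionType.IsDefined(typeof(WriteIdempotentAttribute), inherit: false);
}
```

Absence of the attribute yields false, which matches the "absence = no retry"
default described above.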
Core.Resilience new namespace:
- DriverResilienceOptions — per-tier × per-capability policy defaults.
GetTierDefaults(tier) is the source of truth:
* Tier A: Read 2s timeout/3 retries, Write 2s/0 retries, breaker threshold 5
* Tier B: Read 4s/3, Write 4s/0, breaker threshold 5
* Tier C: Read 10s/1, Write 10s/0, breaker threshold 0 (breaker disabled; the
supervisor handles the process-level breaker per decision #68)
Resolve(capability) overlays CapabilityPolicies on top of the defaults.
- DriverResiliencePipelineBuilder — composes Timeout → Retry (capability-
permitting, never on cancellation) → CircuitBreaker (tier-permitting) →
Bulkhead. Pipelines are cached in a ConcurrentDictionary (lock-free reads) keyed on
(DriverInstanceId, HostName, DriverCapability) per decision #144, so one dead
PLC behind a multi-device driver does not open the breaker for healthy
siblings. Invalidate(driverInstanceId) supports Admin-triggered reload.
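
The tier defaults and override layering described above can be sketched as
follows. The Read/Write numbers come from the list above; everything else is an
assumption: the CapabilityPolicy record, the ResilienceOptionsSketch class, the
whole-policy (rather than field-by-field) override, and the omission of the
other six capabilities, whose defaults are not stated here.

```csharp
using System;
using System.Collections.Generic;

public enum DriverTier { A, B, C }

// Reduced stand-in for the real enum, which has 8 members.
public enum DriverCapability { Read, Write }

// Assumed policy shape. BreakerThreshold 0 means the breaker is disabled.
public sealed record CapabilityPolicy(TimeSpan Timeout, int RetryCount, int BreakerThreshold);

public sealed class ResilienceOptionsSketch
{
    public DriverTier Tier { get; init; }

    // Per-capability overrides that Resolve layers over the tier defaults.
    public Dictionary<DriverCapability, CapabilityPolicy> CapabilityPolicies { get; } = new();

    public static Dictionary<DriverCapability, CapabilityPolicy> GetTierDefaults(DriverTier tier)
    {
        // (timeout in seconds, Read retries, breaker threshold) per tier.
        var (seconds, readRetries, breaker) = tier switch
        {
            DriverTier.A => (2, 3, 5),
            DriverTier.B => (4, 3, 5),
            DriverTier.C => (10, 1, 0),
            _ => throw new ArgumentOutOfRangeException(nameof(tier)),
        };
        var timeout = TimeSpan.FromSeconds(seconds);

        return new Dictionary<DriverCapability, CapabilityPolicy>
        {
            [DriverCapability.Read] = new CapabilityPolicy(timeout, readRetries, breaker),
            // Writes never retry by default, at any tier.
            [DriverCapability.Write] = new CapabilityPolicy(timeout, 0, breaker),
        };
    }

    // An explicit override wins; otherwise the tier default applies.
    public CapabilityPolicy Resolve(DriverCapability capability) =>
        CapabilityPolicies.TryGetValue(capability, out var custom)
            ? custom
            : GetTierDefaults(Tier)[capability];
}
```

In this sketch an override replaces the whole policy for that capability; the
real Resolve may merge individual fields instead.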
Tests (30 new, all pass):
- DriverResilienceOptionsTests: tier-default coverage for every capability,
Write + AlarmAcknowledge never retry at any tier, Tier C disables breaker,
resolve-with-override layering.
- DriverResiliencePipelineBuilderTests: Read retries transients, Write does NOT
retry on failure (decision #44 guard), dead-host isolation from sibling hosts,
pipeline reuse for same triple, per-capability isolation, breaker opens after
threshold on Tier A, timeout fires, cancellation is not retried,
invalidation scoped to matching instance.
Polly.Core 8.6.6 added to Core.csproj. Full solution dotnet test: 936 passing
(baseline 906 + 30 new). One pre-existing Client.CLI Subscribe flake unchanged
by this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>