Per Stream A.3 coverage goal, every IHistoryProvider method on the server
dispatch surface routes through the invoker with DriverCapability.HistoryRead:
- HistoryReadRaw (line 487)
- HistoryReadProcessed (line 551)
- HistoryReadAtTime (line 608)
- HistoryReadEvents (line 665)
Each gets timeout + per-(driver, host) circuit breaker + the default Tier
retry policy (Tier A default: 2 retries at 30s timeout). Inner driver
GetAwaiter().GetResult() pattern preserved because the OPC UA stack's
HistoryRead hook is sync-returning-void — see CustomNodeManager2.
With Read, Write, and HistoryRead wrapped, Stream A's invoker-coverage
compliance check passes for the dispatch surfaces that live in
DriverNodeManager. Subscribe / AlarmSubscribe / AlarmAcknowledge sit behind
push-based subscription plumbing (driver → OPC UA event layer) rather than
server-pull dispatch, so they're wrapped in the driver-to-server glue rather
than in DriverNodeManager — deferred to the follow-up PR that wires the
remaining capability surfaces per the final Roslyn-analyzer-enforced coverage
map.
Full solution dotnet test: 948 passing. Pre-existing Client.CLI Subscribe
flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every OnReadValue / OnWriteValue now routes through the process-singleton
DriverResiliencePipelineBuilder's CapabilityInvoker. Read / Write dispatch
paths gain timeout + per-capability retry + per-(driver, host) circuit breaker
+ bulkhead without touching the individual driver implementations.
Wiring:
- OpcUaApplicationHost: new optional DriverResiliencePipelineBuilder ctor
parameter (default null → instance-owned builder). Keeps the 3 test call
sites that construct OpcUaApplicationHost directly unchanged.
- OtOpcUaServer: requires the builder in its ctor; constructs one
CapabilityInvoker per driver at CreateMasterNodeManager time with default
Tier A DriverResilienceOptions. TODO: Stream B.1 will wire real per-driver-
type tiers via DriverTypeRegistry; Phase 6.1 follow-up will read the
DriverInstance.ResilienceConfig JSON column for per-instance overrides.
- DriverNodeManager: takes a CapabilityInvoker in its ctor. OnReadValue wraps
the driver's ReadAsync through ExecuteAsync(DriverCapability.Read, hostName,
...); OnWriteValue wraps WriteAsync through ExecuteWriteAsync(hostName,
isIdempotent, ...) where isIdempotent comes from the new
_writeIdempotentByFullRef map populated at Variable() registration from
DriverAttributeInfo.WriteIdempotent.
HostName defaults to driver.DriverInstanceId for now — a single-host pipeline
per driver. Multi-host drivers (Modbus with N PLCs) will expose their own per-
call host resolution in a follow-up so failing PLCs can trip per-PLC breakers
without poisoning siblings (decision #144).
Test fixup:
- FlakeyDriverIntegrationTests.Read_SurfacesSuccess_AfterTransientFailures:
bumped TimeoutSeconds=2 → 30. 10 retries at exponential backoff with jitter
can exceed 2s under parallel-test-run CPU pressure; the test asserts retry
behavior, not timeout budget, so the longer slack keeps it deterministic.
Full solution dotnet test: 948 passing. Pre-existing Client.CLI Subscribe
flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IDriver.DriverInstanceId is declared as string in Core.Abstractions; keeping
the pipeline key as Guid meant every call site would need .ToString() / Guid.Parse
at the boundary. Switching the Resilience types to string removes that friction
and lets OtOpcUaServer pass driver.DriverInstanceId directly to the builder in
the upcoming server-dispatch wiring PR.
- DriverResiliencePipelineBuilder.GetOrCreate + Invalidate + PipelineKey
- CapabilityInvoker.ctor + _driverInstanceId field
Tests: all 48 Core.Tests still pass. The Invalidate test's keepId / dropId now
use distinct "drv-keep" / "drv-drop" literals (previously both were distinct
Guid.NewGuid() values, which the sed-driven refactor had collapsed to the same
literal — caught pre-commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-tag opt-in for write-retry per docs/v2/plan.md decisions #44, #45, #143.
Default is false — writes never auto-retry unless the driver author has marked
the tag as safe to replay.
Core.Abstractions:
- DriverAttributeInfo gains `bool WriteIdempotent = false` at the end of the
positional record (back-compatible; every existing call site uses the default).
Driver.Modbus:
- ModbusTagDefinition gains `bool WriteIdempotent = false`. Safe candidates
documented in the param XML: holding-register set-points, configuration
registers. Unsafe: edge-triggered coils, counter-increment addresses.
- ModbusDriver.DiscoverAsync propagates t.WriteIdempotent into
DriverAttributeInfo.WriteIdempotent.
Driver.S7:
- S7TagDefinition gains `bool WriteIdempotent = false`. Safe candidates:
DB word/dword set-points, configuration DBs. Unsafe: M/Q bits that drive
edge-triggered program routines.
- S7Driver.DiscoverAsync propagates the flag.
Stream A.5 integration tests (FlakeyDriverIntegrationTests, 4 new) exercise
the invoker + flaky-driver contract the plan enumerates:
- Read with 5 transient failures succeeds on the 6th attempt (RetryCount=10).
- Non-idempotent write with RetryCount=5 configured still fails on the first
failure — no replay (decision #44 guard at the ExecuteWriteAsync surface).
- Idempotent write with 2 transient failures succeeds on the 3rd attempt.
- Two hosts on the same driver have independent breakers — dead-host trips
its breaker but live-host's first call still succeeds.
Propagation tests:
- ModbusDriverTests: SetPoint WriteIdempotent=true flows into
DriverAttributeInfo; PulseCoil default=false.
- S7DiscoveryAndSubscribeTests: same pattern for DBx SetPoint vs M-bit.
Full solution dotnet test: 947 passing (baseline 906, +41 net across Stream A
so far). Pre-existing Client.CLI Subscribe flake unchanged.
Stream A's remaining work (wiring CapabilityInvoker into DriverNodeManager's
OnReadValue / OnWriteValue / History / Subscribe dispatch paths) is the
server-side integration piece + needs DI wiring for the pipeline builder —
lands in the next PR on this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One invoker per (DriverInstance, IDriver) pair; calls ExecuteAsync(capability,
host, callSite) and the invoker resolves the correct pipeline from the shared
DriverResiliencePipelineBuilder. The options accessor is a Func so Admin-edit
+ pipeline-invalidate takes effect without restarting the invoker or the
driver host.
ExecuteWriteAsync(isIdempotent) is the explicit write-safety surface:
- isIdempotent=false routes through a side pipeline with RetryCount=0 regardless
of what the caller configured. The cache key carries a "::non-idempotent"
suffix so it never collides with the retry-enabled write pipeline.
- isIdempotent=true routes through the normal Write pipeline. If the user has
configured Write retries (opt-in), the idempotent tag gets them; otherwise
default-0 still wins.
The server dispatch layer (next PR) reads WriteIdempotentAttribute on each tag
definition once at driver-init time and feeds the boolean into ExecuteWriteAsync.
Tests (6 new):
- Read retries on transient failure; returns value from call site.
- Write non-idempotent does NOT retry even when policy has 3 retries configured
(the explicit decision-#44 guard at the dispatch surface).
- Write idempotent retries when policy allows.
- Write with default tier-A policy (RetryCount=0) never retries regardless of
idempotency flag.
- Different hosts get independent pipelines.
Core.Tests now 44 passing (was 38). Invoker doc-refs completed (the XML comment
on WriteIdempotentAttribute no longer references a non-existent type).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the first chunk of the Phase 6.1 Stream A resilience layer per
docs/v2/implementation/phase-6-1-resilience-and-observability.md §Stream A.
Downstream CapabilityInvoker (A.3) + driver-dispatch wiring land in follow-up
PRs on the same branch.
Core.Abstractions additions:
- WriteIdempotentAttribute — marker for tag-definition records that opt into
auto-retry on IWritable.WriteAsync. Absence = no retry per decisions #44, #45,
#143. Read once via reflection at driver-init time; no per-write cost.
- DriverCapability enum — enumerates the 8 capability surface points
(Read / Write / Discover / Subscribe / Probe / AlarmSubscribe / AlarmAcknowledge
/ HistoryRead). AlarmAcknowledge is write-shaped (no retry by default).
- DriverTier enum — A/B/C per driver-stability.md §2-4. Stream B.1 wires this
into DriverTypeMetadata; surfaced here because the resilience policy defaults
key on it.
Core.Resilience new namespace:
- DriverResilienceOptions — per-tier × per-capability policy defaults.
GetTierDefaults(tier) is the source of truth:
* Tier A: Read 2s/3 retries, Write 2s/0 retries, breaker threshold 5
* Tier B: Read 4s/3, Write 4s/0, breaker threshold 5
* Tier C: Read 10s/1, Write 10s/0, breaker threshold 0 (supervisor handles
process-level breaker per decision #68)
Resolve(capability) overlays CapabilityPolicies on top of the defaults.
- DriverResiliencePipelineBuilder — composes Timeout → Retry (capability-
permitting, never on cancellation) → CircuitBreaker (tier-permitting) →
Bulkhead. Pipelines cached in a lock-free ConcurrentDictionary keyed on
(DriverInstanceId, HostName, DriverCapability) per decision #144 — one dead
PLC behind a multi-device driver does not open the breaker for healthy
siblings. Invalidate(driverInstanceId) supports Admin-triggered reload.
Tests (30 new, all pass):
- DriverResilienceOptionsTests: tier-default coverage for every capability,
Write + AlarmAcknowledge never retry at any tier, Tier C disables breaker,
resolve-with-override layering.
- DriverResiliencePipelineBuilderTests: Read retries transients, Write does NOT
retry on failure (decision #44 guard), dead-host isolation from sibling hosts,
pipeline reuse for same triple, per-capability isolation, breaker opens after
threshold on Tier A, timeout fires, cancellation is not retried,
invalidation scoped to matching instance.
Polly.Core 8.6.6 added to Core.csproj. Full solution dotnet test: 936 passing
(baseline 906 + 30 new). One pre-existing Client.CLI Subscribe flake unchanged
by this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>