M4 R4.4: client-side multi-historian redundancy

Adds AVEVA.Historian.Client.Redundancy — HistorianRedundantClient orchestrates
N single-historian members (IHistorianMember; default HistorianClientMember
over HistorianClient) as one logical client. Pure client-side, no server-side
redundancy protocol, no RE.

- Reads fail over to the next member in priority order. Streaming reads only
  fail over BEFORE the first row is observed; a mid-stream failure propagates
  (failing over mid-stream would risk duplicated/skipped rows).
- Writes fan out: WriteFanout AllMembers | PreferredOnly, with All | Any ack
  policy, returning a per-member HistorianRedundantWriteResult.
- Per-member health: FailureThreshold demotes a failing member out of the
  preferred pool; a background watchdog (PeriodicTimer) + CheckHealthAsync
  re-probe and restore recovered members. GetStatus() snapshot + ActiveMember.
- Composes with R4.1: back a member's writes with a HistorianStoreForwardWriter
  so a down member buffers and replays on recovery — the pragmatic client-side
  equivalent of native ReSyncTags.

14 unit tests (no server): failover order, mid-stream no-failover, all-fail
aggregation, probe-any-up, fan-out ack policies, PreferredOnly, soft reject,
health demotion + CheckHealthAsync restore, watchdog recovery. Full suite 307
green. Roadmap R4.4 marked shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
This commit is contained in:
Joseph Doherty
2026-06-21 22:46:10 -04:00
parent a9000ec06a
commit 60b3673f01
10 changed files with 1019 additions and 2 deletions
+2 -2
View File
@@ -274,7 +274,7 @@ Only if the use case demands them. Each is a real subsystem, not an op.
| R4.1 | Store-and-forward | ✅ **SHIPPED (2026-06-21) — pragmatic durable outbox.** `AVEVA.Historian.Client.StoreForward`: `HistorianStoreForwardWriter` buffers historical-value + event writes to an `IHistorianOutboxStore` (`FileHistorianOutboxStore` = crash-durable atomic JSON-per-entry, FIFO by filename sequence, corrupt-file quarantine; `InMemoryHistorianOutboxStore` for tests) and replays them through an `IHistorianWriteSink` (default `HistorianClientWriteSink`). Background drain loop retries on reconnect; FIFO head-of-line blocking with optional `MaxDeliveryAttempts` dead-lettering; `DropOldest`/`Reject` overflow policy; `GetStatusAsync` snapshot (Pending/Storing/ErrorOccurred mirrors the server SF semantics). 12 unit tests (durability-across-restart, reconnect-drain, head-of-line order, dead-letter, overflow, background loop). **NOT** the bit-faithful native SF cache (`Forward*Snapshot` decode) — that stays deferred; pure client-side, no RE. | high; consider "good enough" |
| R4.2 | Revision / edit writes | `AddRevisionValue(s)` go via the **non-WCF storage-engine pipe** (`STransactPipeClient2`) — separate transport RE | high |
| R4.3 | Real store-forward **status** | duplex push (`SetStoreForwardEvent`) or a decoded pull endpoint — see store-forward plan | medium |
| R4.4 | Multi-historian / redundancy | client-side orchestration over N single-historian sessions (failover, ReSyncTags, partner watchdog) — build last | medium |
| R4.4 | Multi-historian / redundancy | **SHIPPED (2026-06-21) — client-side orchestration.** `AVEVA.Historian.Client.Redundancy`: `HistorianRedundantClient` fronts N `IHistorianMember`s (default `HistorianClientMember` over `HistorianClient`) as one logical client. Reads fail over to the next member in priority order — streaming reads only fail over *before the first row* (mid-stream failures propagate to avoid dup/gap); writes fan out (`AllMembers`/`PreferredOnly`) with `All`/`Any` ack policy returning a per-member `HistorianRedundantWriteResult`. Per-member health (`FailureThreshold` demotion) + background watchdog (`CheckHealthAsync`/`PeriodicTimer`) restores recovered members; `GetStatus()` snapshot. Composes with R4.1: back a member's writes with a `HistorianStoreForwardWriter` for the pragmatic ReSyncTags equivalent (down member buffers + replays). 14 unit tests (failover order, mid-stream no-failover, ack policies, fanout modes, watchdog recovery, all-fail aggregation). Pure client-side, no server-side redundancy protocol, no RE. | medium |
---
@@ -331,4 +331,4 @@ event-send). M3/M4 as demand dictates.
| M1 cheap surface | TRIVIAL/BOUNDED | ML | most remaining read/config | ✅ **done** (reachable surface; rest bounded out) |
| M2 event send | CAPTURE | SM | headline write capability | ✅ **done** |
| M3 historical writes | BOUNDED | M | backfill | ✅ **SHIPPED + LIVE-VALIDATED (2026-06-21)**`AddHistoricalValuesAsync` over gRPC = `HistoryService.AddStreamValues` ("ON" buffer) + tag-GUID resolve. Pure-managed SDK write read back live. All 5 analog types (Float/Double/Int2/Int4/UInt4). WCF still blocked (D2) |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | **R4.1 store-and-forward SHIPPED** (pragmatic durable outbox, 2026-06-21); R4.2 revisions deferred (same pipe wall), R4.3 SF status + R4.4 redundancy deferred |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | **R4.1 store-and-forward + R4.4 redundancy SHIPPED** (pragmatic, client-side, 2026-06-21); R4.2 revisions deferred (storage-engine-pipe wall), R4.3 real SF status deferred (RE-heavy, needs SF-active server) |