Closes the last open code-review finding. The unreachable
IngestCachedTelemetryAsync path now carries production cached-call
lifecycle traffic, delivering the design's "AuditLog + SiteCalls in one
MS SQL transaction" guarantee. Before this commit, the SiteCalls
operational half had NO production transport at all — central's
SiteCallAuditActor.OnUpsertAsync had zero producers, so cached-call
operational state never reached the central mirror.
Site-side partition (so neither path double-emits):
- ISiteAuditQueue.ReadPendingCachedTelemetryAsync — new method returning
rows where Kind ∈ {CachedSubmit, ApiCallCached, DbWriteCached,
CachedResolve} AND ForwardState = Pending.
- ISiteAuditQueue.ReadPendingAsync — XML doc updated, SQLite impl now
filters Kind NOT IN the cached set so cached rows no longer ride the
audit-only drain.
New cached-drain in SiteAuditTelemetryActor:
- Optional IOperationTrackingStore? ctor param (null on central
composition roots — the cached scheduler is never armed there).
- Independent CachedDrain message + scheduler tick parallel to the
existing Drain — a stall on one path can't block the other; shared
lifecycle CTS gates both.
- OnCachedDrainAsync: reads cached audit rows, joins each with its
matching SiteCallOperational snapshot via CorrelationId →
TrackedOperationId from the tracking store, builds CachedTelemetryBatch,
pushes via IngestCachedTelemetryAsync, marks ack'd rows Forwarded.
- Orphan rows (no tracking snapshot, thrown tracking-store call,
missing CorrelationId) logged at Warning + skipped — they stay
Pending so reconciliation/retry picks them up later. Best-effort
contract preserved.
Central side: AuditLogIngestActor.OnCachedTelemetryAsync was already
implemented (M3 Bundle G dead code today, alive after this commit) —
performs InsertIfNotExists for AuditLog + UpsertAsync for SiteCalls
inside a BeginTransactionAsync. The handler is idempotent on EventId,
so any duplicate arrivals from concurrent push + reconciliation are
silent no-ops.
Composition root: AkkaHostedService now resolves IOperationTrackingStore
via GetService<>() (site-only) and threads it through the actor's
Props.Create.
Tests added (+3 in SiteAuditTelemetryActorTests):
- Cached rows route through the new transport, not the audit-only drain.
- Orphan cached row (no tracking match) is logged + skipped, drain
doesn't crash.
- Ordinary audit rows still flow through the audit-only drain unchanged.
- ParentExecutionIdCorrelationTests now unions both queues to assert
all expected Kinds remain covered after the partition.
Build clean; AuditLog.Tests 250/251 (the 1 fail is the pre-existing
date-sensitive PartitionPurgeTests integration flake explicitly accepted
across the session); SiteRuntime.Tests 302/302.
README regenerated: 0 pending of 481 total.
Session-final totals: 136 of 136 originally-open Theme findings closed
across 11 commits (10 themed batches + this architectural close).
Caller-provided SourceNode wins (preserves reconciled rows from other nodes);
otherwise the writer fills it from the local INodeIdentityProvider.NodeName.
Reads from the provider on every write — singleton lifetime means zero overhead.
The headline ParentExecutionIdCorrelationTests intermittently failed under
full-suite parallel load, seeing 6 of 7 routed-run rows (NotifySend missing).
Root cause: WaitForSiteRowsPersistedAsync checked only a row *count*, which a
partial snapshot could satisfy before the last-emitted NotifySend row settled,
letting the SiteAuditTelemetryActor drain a partial batch. Fix is test-only:
wait on the specific audit Kinds (guaranteeing NotifySend is durably in SQLite
before the assertion) and widen the assertion ceiling 30s -> 90s for drain
headroom under load. Also drops leftover // DIAG sampler debug scaffolding.