feat(audit): close AuditLog-001 — wire combined-telemetry dual-write transport
Closes the last open code-review finding. The unreachable
IngestCachedTelemetryAsync path now carries production cached-call
lifecycle traffic, delivering the design's "AuditLog + SiteCalls in one
MS SQL transaction" guarantee. Before this commit, the SiteCalls
operational half had NO production transport at all — central's
SiteCallAuditActor.OnUpsertAsync had zero producers, so cached-call
operational state never reached the central mirror.
Site-side partition (so neither path double-emits):
- ISiteAuditQueue.ReadPendingCachedTelemetryAsync — new method returning
rows where Kind ∈ {CachedSubmit, ApiCallCached, DbWriteCached,
CachedResolve} AND ForwardState = Pending.
- ISiteAuditQueue.ReadPendingAsync — XML doc updated, SQLite impl now
filters Kind NOT IN the cached set so cached rows no longer ride the
audit-only drain.
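The partition can be modeled as two complementary filters over the same pending set. This is a minimal sketch in Python (not the project's C# or its SQLite query); `AuditRow`, the two function names, and the non-cached kind `ApiCall` are hypothetical stand-ins. Only the four cached kinds and the Pending filter come from the description above.

```python
from dataclasses import dataclass

# The four kinds the commit message assigns to the cached drain.
CACHED_KINDS = frozenset(
    {"CachedSubmit", "ApiCallCached", "DbWriteCached", "CachedResolve"}
)


@dataclass(frozen=True)
class AuditRow:
    """Hypothetical stand-in for one row of the site audit queue."""
    event_id: str
    kind: str
    forward_state: str  # "Pending" or "Forwarded"


def read_pending(rows, limit):
    """Audit-only drain: pending rows whose kind is NOT in the cached set."""
    picked = [r for r in rows
              if r.forward_state == "Pending" and r.kind not in CACHED_KINDS]
    return picked[:limit]


def read_pending_cached_telemetry(rows, limit):
    """Cached drain: pending rows whose kind IS in the cached set."""
    picked = [r for r in rows
              if r.forward_state == "Pending" and r.kind in CACHED_KINDS]
    return picked[:limit]


rows = [
    AuditRow("e1", "CachedSubmit", "Pending"),
    AuditRow("e2", "ApiCall", "Pending"),          # invented non-cached kind
    AuditRow("e3", "DbWriteCached", "Forwarded"),  # already forwarded
    AuditRow("e4", "CachedResolve", "Pending"),
]
audit_only = read_pending(rows, 256)
cached = read_pending_cached_telemetry(rows, 256)
```

Because one filter is the negation of the other, every pending row is read by exactly one drain, which is the property that prevents double emission.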
New cached-drain in SiteAuditTelemetryActor:
- Optional IOperationTrackingStore? ctor param (null on central
composition roots — the cached scheduler is never armed there).
- Independent CachedDrain message + scheduler tick parallel to the
existing Drain — a stall on one path can't block the other; shared
lifecycle CTS gates both.
- OnCachedDrainAsync: reads cached audit rows, joins each with its
matching SiteCallOperational snapshot via CorrelationId →
TrackedOperationId from the tracking store, builds CachedTelemetryBatch,
pushes via IngestCachedTelemetryAsync, marks ack'd rows Forwarded.
- Orphan rows (no tracking snapshot, thrown tracking-store call,
missing CorrelationId) logged at Warning + skipped — they stay
Pending so reconciliation/retry picks them up later. Best-effort
contract preserved.
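The join-and-skip behaviour of the cached drain could look roughly like the following. This is an illustrative Python sketch, not the actor's C# code: `CachedAuditRow`, `build_cached_batch`, and the `lookup` callback are hypothetical, and the CorrelationId is used directly as the tracking key for simplicity.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("cached_drain_sketch")


@dataclass(frozen=True)
class CachedAuditRow:
    """Hypothetical cached audit row; only the fields the join needs."""
    event_id: str
    correlation_id: Optional[str]


def build_cached_batch(rows, lookup_snapshot):
    """Pair each row with its tracking snapshot; log and skip orphans.

    lookup_snapshot(correlation_id) may return a snapshot, return None,
    or raise. A failure on one row never aborts the rest of the batch.
    """
    batch, skipped = [], []
    for row in rows:
        if row.correlation_id is None:
            log.warning("row %s has no CorrelationId; skipping", row.event_id)
            skipped.append(row.event_id)
            continue
        try:
            snapshot = lookup_snapshot(row.correlation_id)
        except Exception:
            log.warning("tracking lookup failed for %s; skipping",
                        row.event_id, exc_info=True)
            skipped.append(row.event_id)
            continue
        if snapshot is None:
            log.warning("no tracking snapshot for %s; skipping", row.event_id)
            skipped.append(row.event_id)
            continue
        batch.append((row, snapshot))
    return batch, skipped


snapshots = {"op-1": {"status": "Delivered"}}  # invented tracking data


def lookup(correlation_id):
    if correlation_id == "op-boom":
        raise RuntimeError("tracking store unavailable")
    return snapshots.get(correlation_id)


rows = [
    CachedAuditRow("e1", "op-1"),
    CachedAuditRow("e2", "op-missing"),
    CachedAuditRow("e3", None),
    CachedAuditRow("e4", "op-boom"),
]
states = {r.event_id: "Pending" for r in rows}
batch, skipped = build_cached_batch(rows, lookup)
# Pretend central acknowledged every row that was actually pushed.
acked = [row.event_id for row, _ in batch]
for event_id in acked:
    states[event_id] = "Forwarded"
```

Only acknowledged rows change state; the three orphans remain Pending, so a later pass can pick them up.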
Central side: AuditLogIngestActor.OnCachedTelemetryAsync was already
implemented (M3 Bundle G dead code today, alive after this commit) —
performs InsertIfNotExists for AuditLog + UpsertAsync for SiteCalls
inside a BeginTransactionAsync. The handler is idempotent on EventId,
so any duplicate arrivals from concurrent push + reconciliation are
silent no-ops.
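The dual-write shape can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` in place of MS SQL, with invented table layouts and a hypothetical `ingest_cached_telemetry` function; it shows only the pattern (insert-if-not-exists plus upsert inside one transaction), not the actual handler.

```python
import sqlite3


def open_central_db():
    """In-memory stand-in for the central store (schema is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AuditLog (EventId TEXT PRIMARY KEY, Kind TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE SiteCalls ("
        "TrackedOperationId TEXT PRIMARY KEY, Status TEXT NOT NULL)")
    return conn


def ingest_cached_telemetry(conn, event_id, kind, op_id, status):
    """Write the audit row and the operational row atomically."""
    with conn:  # commits on success, rolls back if either statement raises
        # Insert-if-not-exists: a duplicate EventId is silently ignored.
        conn.execute(
            "INSERT OR IGNORE INTO AuditLog (EventId, Kind) VALUES (?, ?)",
            (event_id, kind))
        # Upsert: create the operational row or overwrite its status.
        conn.execute(
            "INSERT INTO SiteCalls (TrackedOperationId, Status) VALUES (?, ?) "
            "ON CONFLICT(TrackedOperationId) "
            "DO UPDATE SET Status = excluded.Status",
            (op_id, status))


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = open_central_db()
ingest_cached_telemetry(conn, "e1", "CachedSubmit", "op-1", "Submitted")
# Same packet again, as if push and reconciliation both delivered it.
ingest_cached_telemetry(conn, "e1", "CachedSubmit", "op-1", "Submitted")
counts_after_duplicate = (count(conn, "AuditLog"), count(conn, "SiteCalls"))

# A failure in the second statement must also undo the first.
try:
    ingest_cached_telemetry(conn, "e2", "CachedResolve", "op-2", None)
    rolled_back = False
except sqlite3.IntegrityError:
    rolled_back = True
audit_rows_after_failure = count(conn, "AuditLog")
```

Replaying the same packet leaves one row in each table, and the failed second write leaves no half-written audit row behind.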
Composition root: AkkaHostedService now resolves IOperationTrackingStore
via GetService<>() (site-only) and threads it through the actor's
Props.Create.
Tests (+3 in SiteAuditTelemetryActorTests, 1 existing test updated):
- Cached rows route through the new transport, not the audit-only drain.
- Orphan cached row (no tracking match) is logged + skipped, drain
doesn't crash.
- Ordinary audit rows still flow through the audit-only drain unchanged.
- ParentExecutionIdCorrelationTests now unions both queues to assert
all expected Kinds remain covered after the partition.
Build clean; AuditLog.Tests 250/251 (the 1 fail is the pre-existing
date-sensitive PartitionPurgeTests integration flake explicitly accepted
across the session); SiteRuntime.Tests 302/302.
README regenerated: 0 pending of 481 total.
Session-final totals: 136 of 136 originally-open Theme findings closed
across 11 commits (10 themed batches + this architectural close).
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-28 |
 | Reviewer | claude-agent |
 | Commit reviewed | `1eb6e97` |
-| Open findings | 1 |
+| Open findings | 0 |
 
 ## Summary
 
@@ -65,7 +65,7 @@ chain doesn't reject a central composition root that mistakenly calls the site b
 |--|--|
 | Severity | Medium |
 | Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.AuditLog/Site/Telemetry/ISiteStreamAuditClient.cs:45`, `src/ScadaLink.AuditLog/Site/Telemetry/ClusterClientSiteAuditClient.cs:86`, `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs:198` |
 
 **Description**
@@ -101,9 +101,74 @@ unreachable `OnCachedTelemetryAsync` dual-write code (after confirming the
 `AuditLogIngestActorCombinedTelemetryTests` integration tests exercise it via direct
 actor injection only).
 
-**Resolution**
+**Resolution (2026-05-28):**
 
-_Unresolved._
+Wired the combined-telemetry transport end-to-end via recommendation (a). The
+previously-unreachable `IngestCachedTelemetryAsync` client path now carries
+cached-call lifecycle rows from the site SQLite hot-path through to the central
+`AuditLogIngestActor.OnCachedTelemetryAsync` dual-write transaction. Changes:
+
+- **`ISiteAuditQueue`** (`src/ScadaLink.Commons/Interfaces/Services/ISiteAuditQueue.cs`):
+added `ReadPendingCachedTelemetryAsync(int, CancellationToken)` returning
+rows in `AuditForwardState.Pending` whose `Kind` is one of `CachedSubmit`,
+`ApiCallCached`, `DbWriteCached`, `CachedResolve`. Updated `ReadPendingAsync`
+XML doc to call out the partition.
+- **`SqliteAuditWriter`** (`src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs`):
+implemented `ReadPendingCachedTelemetryAsync` with a `Kind IN (...)` filter
+reusing the existing `_readConnection` / `_readLock` decoupling; modified
+`ReadPendingAsync` to add the symmetric `Kind NOT IN (...)` predicate so the
+audit-only drain no longer double-emits cached rows.
+- **`SiteAuditTelemetryActor`** (`src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs`):
+added an optional `IOperationTrackingStore?` constructor parameter, a sibling
+`CachedDrain` self-tick message, and an `OnCachedDrainAsync` handler running
+in parallel with the existing audit-only drain. The cached-drain reads the
+partitioned audit rows, joins each with the matching tracking-store
+snapshot (looked up by `TrackedOperationId` via `CorrelationId`), builds a
+`CachedTelemetryBatch`, pushes via `IngestCachedTelemetryAsync`, and marks
+ack'd EventIds Forwarded. Orphan rows (no matching tracking snapshot, or a
+thrown tracking-store call) are logged + skipped so the bad row never
+blocks the rest of the batch; rows stay Pending and reconciliation /
+retention handles them. The lifecycle CTS (AuditLog-010) gates both drains
+uniformly.
+- **`AkkaHostedService`** (`src/ScadaLink.Host/Actors/AkkaHostedService.cs`):
+resolves `IOperationTrackingStore` via `GetService` (site-only registration)
+and threads it through the actor's `Props.Create`. Central composition
+roots and tests that don't register the tracking store get the legacy
+audit-only behaviour — the cached scheduler is never armed.
+- **Tests** (`tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs`):
+added three regression tests asserting (1) cached rows route through
+`IngestCachedTelemetryAsync` and NOT `IngestAuditEventsAsync`, (2) an
+orphan row with no tracking snapshot is logged + skipped without crashing
+the drain, (3) the audit-only drain still flows when the cached drain is
+disabled (null tracking store). Updated `WaitForSiteRowsPersistedAsync` in
+`ParentExecutionIdCorrelationTests` to union `ReadPendingCachedTelemetryAsync`
+into the durability check — its `ReadPendingAsync(256) ∪ ReadForwardedAsync(256)`
+assertion previously missed the cached kinds after the partition change.
+
+**Design notes / caveats.**
+
+- *Operational state at emission time is the latest tracking row, not the
+per-event status.* The original spec described one combined packet per
+lifecycle event, but the production wiring keeps the existing
+`CachedCallTelemetryForwarder` dual-write (audit + tracking) and uses the
+drain as a join. Central's `SiteCalls` upsert is monotonic so this is
+consistent with the broader design — the audit row preserves per-event
+granularity, the SiteCalls mirror reflects "most recent known" state.
+- *Test-only `CombinedTelemetryDispatcher` wire push is now redundant but
+harmless.* The dispatcher's manual `IngestCachedTelemetryAsync` call in
+`CombinedTelemetryHarness` / `ParentExecutionIdCorrelationTests` still
+executes; central's idempotent `InsertIfNotExistsAsync` swallows the
+duplicate so it's a no-op. Removing it is a separate clean-up.
+- *Per-actor cancellation gates both drains.* The lifecycle CTS (AuditLog-010)
+is shared so `PostStop` cancels in-flight cached lookups + pushes at the
+same instant as audit-only drains.
+
+Build: `dotnet build ScadaLink.slnx` — 0 warnings, 0 errors.
+Tests: `dotnet test tests/ScadaLink.AuditLog.Tests` — 250 passed, 1 failed
+(`PartitionPurgeTests.EndToEnd_OldestPartition_PurgedViaActor_NewerKept` —
+pre-existing MS-SQL date-sensitive flake, called out in the prompt as
+acceptable). `dotnet test tests/ScadaLink.SiteRuntime.Tests` — all 302
+passed.
 
 ### AuditLog-002 — `SupervisorStrategy` comments claim Resume semantics but code returns the default Restart decider
 
@@ -41,15 +41,15 @@ module file and counted in **Total**.
 |----------|---------------|
 | Critical | 0 |
 | High | 0 |
-| Medium | 1 |
+| Medium | 0 |
 | Low | 0 |
-| **Total** | **1** |
+| **Total** | **0** |
 
 ## Module Status
 
 | Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
 |--------|---------------|--------|----------------|------|-------|
-| [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 11 |
+| [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 11 |
 | [CLI](CLI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
 | [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 33 |
 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 14 |
@@ -88,11 +88,9 @@ _None open._
 
 _None open._
 
-### Medium (1)
+### Medium (0)
 
-| ID | Module | Title |
-|----|--------|-------|
-| AuditLog-001 | [AuditLog](AuditLog/findings.md) | Combined-telemetry transport is plumbed end-to-end but never invoked in production |
+_None open._
 
 ### Low (0)
 