Commit Graph

748 Commits

Author SHA1 Message Date
Joseph Doherty
66f6724c5d test(auditlog): outage + reconciliation recovery end-to-end (#23 M6) 2026-05-20 19:32:01 -04:00
Joseph Doherty
ef49b55cf6 fix(health): decouple AuditCentralHealthSnapshot from ActorSystem (#23 M6)
The snapshot's per-site stalled latch now lives on the snapshot itself
and is fed by SiteAuditTelemetryStalledTracker via ApplyStalled, removing
the chain that required ActorSystem at DI composition time. The tracker
is now constructed by AkkaHostedService once ActorSystem.Create returns,
with a lock-guarded auxiliary-disposable list so concurrent host
start/stop in tests cannot race the enumeration.
2026-05-20 19:25:28 -04:00
Joseph Doherty
2744011ce9 feat(health): surface AuditRedactionFailure in central snapshot (#23 M6) 2026-05-20 19:13:19 -04:00
Joseph Doherty
70ed8d4557 feat(health): CentralAuditWriteFailures + AuditCentralHealthSnapshot (#23 M6) 2026-05-20 19:11:52 -04:00
Joseph Doherty
42333a72ed feat(health): SiteAuditTelemetryStalledTracker subscribes to EventStream (#23 M6) 2026-05-20 19:07:44 -04:00
Joseph Doherty
e93f655ce4 feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6) 2026-05-20 19:02:01 -04:00
Joseph Doherty
75b060e0a8 feat(auditlog): AuditLogPartitionMaintenanceService monthly roll-forward (#23 M6) 2026-05-20 18:51:43 -04:00
Joseph Doherty
cc2d6e91f1 fix(auditlog): SiteAuditReconciliationActor captures EventStream before await (#23 M6) 2026-05-20 18:39:19 -04:00
Joseph Doherty
660fdc4e93 feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)
Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
  1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
     monthly boundaries whose latest OccurredAtUtc is older than
     DateTime.UtcNow - AuditLogOptions.RetentionDays.
  2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
     the drop-and-rebuild dance around UX_AuditLog_EventId.
  3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
     the actor-system EventStream so the Bundle E central health collector
     and ops surfaces can subscribe without coupling to this actor.

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
  switch via COUNT_BIG over the per-partition filter so the count
  reflects what the switch removed, not a post-purge scan of a table that
  no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
  tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
  abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
  service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
  exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
  await in async receive handlers (same pattern as Sender-capture in
  AuditLogIngestActor.OnIngestAsync).

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
  RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
  computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
  MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
  Apr-2026 row kept, AuditLogPurgedEvent observed via probe.
2026-05-20 18:36:31 -04:00
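The tick behavior described in 660fdc4e93 (threshold from UtcNow minus RetentionDays, continue-on-error per boundary, one purged event per successful switch) can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# actor; `run_purge_tick`, `PurgedEvent`, and the callable parameters are hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import time


@dataclass(frozen=True)
class PurgedEvent:
    # Field names follow the commit message: MonthBoundary, RowsDeleted, DurationMs.
    month_boundary: datetime
    rows_deleted: int
    duration_ms: int


def run_purge_tick(now, retention_days, boundaries_older_than, switch_out, publish):
    """Run one purge tick and return the (boundary, exception) pairs that failed."""
    threshold = now - timedelta(days=retention_days)
    failures = []
    for boundary in boundaries_older_than(threshold):
        started = time.monotonic()
        try:
            rows_deleted = switch_out(boundary)
        except Exception as exc:
            # Continue-on-error: one failing partition does not abandon the tick.
            failures.append((boundary, exc))
            continue
        duration_ms = int((time.monotonic() - started) * 1000)
        publish(PurgedEvent(boundary, rows_deleted, duration_ms))
    return failures
```

Returning the failures instead of raising keeps the caller (a timer-driven singleton in the commit) alive for the next tick.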
Joseph Doherty
6069a20e0f fix(configdb): replace SwitchOutPartitionAsync stub with drop-and-rebuild dance (#23 M6)
Replaces M1's NotSupportedException stub with the production DROP-INDEX
→ CREATE-staging → SWITCH PARTITION → DROP-staging → CREATE-INDEX dance
documented in alog.md §4. UX_AuditLog_EventId is intentionally non-aligned
with ps_AuditLog_Month so single-column EventId uniqueness can be enforced
cheaply for InsertIfNotExistsAsync; SQL Server rejects ALTER TABLE SWITCH
while a non-aligned unique index is present, so the implementation drops
it, switches the partition data into a GUID-suffixed staging table on
[PRIMARY], drops staging (discarding the rows), and rebuilds the unique
index — all inside an explicit transaction with a CATCH that guarantees
the unique index is rebuilt regardless of failure point.

Also adds GetPartitionBoundariesOlderThanAsync to IAuditLogRepository: a
CROSS APPLY over sys.partition_range_values + per-partition MAX(OccurredAtUtc)
to enumerate retention-eligible months for the M6 purge actor (next commit).

Tests verify:
* Old partition's rows are removed; other months untouched
* UX_AuditLog_EventId is rebuilt after a successful switch
* InsertIfNotExistsAsync's first-write-wins idempotency still holds after switch
* On engineered SWITCH failure (inbound FK from a probe table), SqlException
  propagates AND UX_AuditLog_EventId is still present (CATCH branch ran)
* GetPartitionBoundariesOlderThanAsync returns only boundaries whose partition's
  MAX(OccurredAtUtc) is strictly older than the threshold; empty partitions
  excluded
2026-05-20 18:20:55 -04:00
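The step ordering in 6069a20e0f, and the guarantee that the unique index exists again whatever step fails, can be illustrated with a small simulation. This is a Python sketch under stated assumptions, not the project's T-SQL or C#: `FakeDb` is a hypothetical recorder whose rollback deliberately does not restore the index, so the catch-branch rebuild is what gets exercised.

```python
class FakeDb:
    """Hypothetical stand-in that records steps; not the real repository."""

    def __init__(self, fail_on_switch=False):
        self.fail_on_switch = fail_on_switch
        self.unique_index_present = True
        self.steps = []

    def run(self, step):
        if step == "switch_partition" and self.fail_on_switch:
            raise RuntimeError("SWITCH rejected")
        if step == "drop_unique_index":
            self.unique_index_present = False
        if step == "create_unique_index":
            self.unique_index_present = True
        self.steps.append(step)


DANCE = (
    "drop_unique_index",    # SWITCH is rejected while the non-aligned index exists
    "create_staging",
    "switch_partition",
    "drop_staging",         # discards the switched-out rows
    "create_unique_index",
)


def switch_out_partition(db):
    db.run("begin")
    try:
        for step in DANCE:
            db.run(step)
        db.run("commit")
    except Exception:
        db.run("rollback")
        # Catch branch: make sure the unique index is back before re-raising.
        if not db.unique_index_present:
            db.run("create_unique_index")
        raise
```

The original exception still propagates to the caller, which matches the test expectation in the commit that SqlException surfaces while the index remains present.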
Joseph Doherty
c763bd9a04 feat(auditlog): SiteAuditReconciliationActor central singleton (#23 M6) 2026-05-20 18:10:42 -04:00
Joseph Doherty
640fd07454 feat(comms): site-side PullAuditEvents handler (#23 M6) 2026-05-20 17:58:43 -04:00
Joseph Doherty
25d9acbce3 feat(comms): PullAuditEvents RPC for audit reconciliation (#23 M6) 2026-05-20 17:48:30 -04:00
Joseph Doherty
b0584f7a08 docs(audit): add M6 reconciliation+purge+partition+health plan (#23)
6 bundles: proto+site handler, reconciliation actor, purge actor with
drop-and-rebuild around UX index, partition maintenance, four health
metrics, integration tests. M5 realities baked in.
2026-05-20 17:44:12 -04:00
Joseph Doherty
db05af897e docs(audit): roadmap corrections after M5
M6 head records M5 realities:
- IOptionsMonitor hot-reload pattern verified; M6 retention config can
  reuse.
- AuditRedactionFailure counter site-only in M5; M6 wires central side.
- Filter integration is at 3 writer entry points; purge actor doesn't
  emit so no filter integration needed.
- SwitchOutPartitionAsync drop-and-rebuild dance required (M1 reality
  + M6-T4 already documents it).
- M6 should land the real ISiteStreamAuditClient (Option A) so push
  telemetry leaves NoOp behind.
2026-05-20 17:43:44 -04:00
Joseph Doherty
adc490b690 Merge branch 'feature/audit-log-m5-payload-redaction': Audit Log #23 M5 Payload + Redaction
M5 ships the payload filter pipeline. IAuditPayloadFilter runs between
event construction and writer call:
- Stage 1: HTTP header redaction (Authorization/Cookie/Set-Cookie/X-Api-Key
  default list from M1-T9; case-insensitive name match against JSON
  {headers,body} shape).
- Stage 2: Body regex redaction (global + per-target). Patterns compiled
  at startup with 100ms budget; runtime 50ms timeout guard against
  catastrophic backtracking. Over-redact on exception + increment counter.
- Stage 3: SQL parameter redaction (Channel=DbOutbound, per-connection
  opt-in via PerTargetOverrides[connection].RedactSqlParamsMatching).
- Stage 4: UTF-8 boundary-safe truncation. Default cap 8 KB; error cap
  64 KB on Status NOT IN (Delivered/Submitted/Forwarded). PayloadTruncated
  set to true when applied.

Filter wired into all three writer entry points:
- FallbackAuditWriter (site chain) — filter before SqliteAuditWriter.
- CentralAuditWriter (central direct-write) — filter before
  IAuditLogRepository.InsertIfNotExistsAsync (NotificationOutbox dispatcher,
  AuditWriteMiddleware).
- AuditLogIngestActor — filter before dual-write transaction.

Health metric SiteAuditRedactionFailureCounter wired through the existing
M2 Bundle G + M4 Bundle B health-bridge pattern; central-side counter
deferred to M6 (the milestone that ships the full central health surface).

Hot-reload via IOptionsMonitor + per-call CurrentValue read. Regex cache
keyed by pattern string so changing the config naturally invalidates old
patterns.

Shipped: 11 commits, ~49 net new tests across AuditLog.Tests,
HealthMonitoring.Tests, PerformanceTests. Full solution 24/24 test projects
green. infra/* untouched on any branch commit.
2026-05-20 17:43:19 -04:00
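Stage 4 of the pipeline described in adc490b690 (UTF-8 boundary-safe truncation with a larger cap for error rows) could look roughly like this. It is a Python sketch, not the project's DefaultAuditPayloadFilter; the function names are hypothetical, and 1 KB is assumed to mean 1024 bytes.

```python
DEFAULT_CAP_BYTES = 8 * 1024    # assumes KB = 1024 bytes
ERROR_CAP_BYTES = 64 * 1024
NON_ERROR_STATUSES = {"Delivered", "Submitted", "Forwarded"}


def cap_for(status):
    """Error rows (any status outside the non-error set) get the larger cap."""
    return DEFAULT_CAP_BYTES if status in NON_ERROR_STATUSES else ERROR_CAP_BYTES


def truncate_utf8(text, max_bytes):
    """Return (possibly shortened text, truncated flag) without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text, False
    cut = max_bytes
    # data[cut] is the first dropped byte. If it is a continuation byte
    # (bit pattern 10xxxxxx) the cut is inside a character, so back up.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8"), True
```

The second element of the returned tuple plays the role of the PayloadTruncated flag mentioned in the merge message.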
Joseph Doherty
1856b63f0c test(auditlog): redaction safety net edge cases (#23 M5) 2026-05-20 17:38:59 -04:00
Joseph Doherty
4eeda45f0e test(auditlog): hot-path latency budget for IAuditPayloadFilter (#23 M5) 2026-05-20 17:36:29 -04:00
Joseph Doherty
b409afda2e feat(auditlog): hot-reloadable AuditLogOptions + regex cache invalidation (#23 M5) 2026-05-20 17:35:15 -04:00
Joseph Doherty
23c0fd417e feat(health): AuditRedactionFailure counter + bridge (#23 M5)
Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor
over-redactions as a Site Health metric so a misconfigured /
catastrophic regex shows up on /monitoring/health rather than
disappearing into a NoOp sink.

  - SiteHealthReport: new 'AuditRedactionFailure' int field
    (defaulted to 0 for back-compat with existing producers/tests).
  - ISiteHealthCollector / SiteHealthCollector:
    new IncrementAuditRedactionFailure() — per-interval atomic
    counter with Interlocked, reset on CollectReport, mirroring
    the M2 Bundle G SiteAuditWriteFailures pattern.
  - HealthMetricsAuditRedactionFailureCounter: new bridge in
    ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter
    increments to ISiteHealthCollector — mirrors
    HealthMetricsAuditWriteFailureCounter one-for-one.
  - AddAuditLogHealthMetricsBridge: now ALSO Replaces the
    NoOpAuditRedactionFailureCounter binding with the health-metrics
    bridge, so a single AddAuditLogHealthMetricsBridge() call wires
    both the M2 Bundle G write-failure counter and the M5 Bundle C
    redaction-failure counter into the health report.

Site-side only for M5 — the filter also runs on CentralAuditWriter
and AuditLogIngestActor (where it just keeps the NoOp default), but
a central-side health-metric surface for AuditRedactionFailure is
deferred to M6 alongside the rest of the central health collector
work.

Tests:
  - AuditRedactionFailureMetricTests (HealthMonitoring) covers the
    SiteHealthCollector increment/report/reset shape (3 tests).
  - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers
    the AuditLog → HealthMonitoring bridge (3 tests).
  - Existing CountCapturingHealthCollector stub in
    DeploymentManagerRedeployTests extended with the new no-op
    interface method.

Verified: dotnet build clean, all 24 test projects green
(the only failure on the first ScadaLink.SiteRuntime.Tests run was the
known-flaky InstanceActorChildAttributeRaceTests, which passes on re-run
in isolation and in the full suite and is unrelated to these changes).
2026-05-20 17:28:33 -04:00
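The per-interval counter pattern from 23c0fd417e (increment atomically, report, reset on collection) is sketched below. The commit uses Interlocked in C#; this Python sketch substitutes a lock for the same effect, and `IntervalCounter` is a hypothetical name rather than the project's SiteHealthCollector.

```python
import threading


class IntervalCounter:
    """Counts events between reports; each report resets the count to zero."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self):
        with self._lock:
            self._count += 1

    def collect_report(self):
        # Read and reset under one lock so no increment is lost or double-counted.
        with self._lock:
            count, self._count = self._count, 0
        return {"AuditRedactionFailure": count}
```

Because the read and the reset happen atomically, every increment lands in exactly one report.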
Joseph Doherty
9b1379ed9b feat(auditlog): wire IAuditPayloadFilter into all writer paths (#23 M5)
Bundle C task M5-T6 — plugs the IAuditPayloadFilter singleton into the
three audit writer entry points so every event is truncated + redacted
before persistence, regardless of which path it took to disk:

  - FallbackAuditWriter (site hot path): filter runs before the primary
    SQLite write AND the ring-buffer enqueue, so a recovery drain replays
    rows that are already capped/redacted.
  - CentralAuditWriter (central direct-write): filter runs before the
    per-call IAuditLogRepository.InsertIfNotExistsAsync.
  - AuditLogIngestActor (site→central telemetry):
      - OnIngestAsync resolves the filter from the per-message scope and
        applies it to each row before IngestedAtUtc stamping.
      - OnCachedTelemetryAsync (M3 dual-write) applies the filter to the
        audit half of every CachedTelemetryEntry before the audit-insert
        + site-call-upsert transaction.

Filter parameter is optional (nullable) on each constructor so the
existing test composition roots that don't pass one keep working unchanged
— production DI wiring in AddAuditLog always passes the real filter
through. ICentralAuditWriter registration switched from the open-ctor
form to a factory so the filter flows through it.

Tests: FilterIntegrationTests covers all three writer paths end-to-end
(4 tests). Full ScadaLink.AuditLog.Tests suite: 146 passed, 0 failed,
0 skipped.
2026-05-20 17:21:57 -04:00
Joseph Doherty
5a7f3e8bf6 feat(auditlog): per-connection SQL parameter redaction opt-in (#23 M5) 2026-05-20 17:11:53 -04:00
Joseph Doherty
37f17dc4a8 feat(auditlog): body regex redaction with over-redaction safety net (#23 M5) 2026-05-20 17:09:36 -04:00
Joseph Doherty
ad7b330f43 feat(auditlog): HTTP header redaction stage (#23 M5) 2026-05-20 17:07:01 -04:00
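The header stage is described in the M5 merge message as a case-insensitive name match against a default list, applied to a JSON {headers,body} shape. A minimal Python sketch of that idea follows; it is not the project's C# code, and the `[REDACTED]` mask and the function name are assumptions made for illustration.

```python
import json

DEFAULT_REDACTED_HEADERS = ("Authorization", "Cookie", "Set-Cookie", "X-Api-Key")


def redact_headers(summary_json, names=DEFAULT_REDACTED_HEADERS, mask="[REDACTED]"):
    """Mask the values of listed headers; header names match case-insensitively."""
    doc = json.loads(summary_json)
    blocked = {name.lower() for name in names}
    headers = doc.get("headers") if isinstance(doc, dict) else None
    if isinstance(headers, dict):
        doc["headers"] = {
            key: (mask if key.lower() in blocked else value)
            for key, value in headers.items()
        }
    return json.dumps(doc)
```

Only header values are replaced; the body is left for the later regex stage.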
Joseph Doherty
bba2ef1b4d feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety (#23 M5) 2026-05-20 17:01:13 -04:00
Joseph Doherty
25cdf857c9 feat(auditlog): IAuditPayloadFilter contract (#23 M5) 2026-05-20 16:59:10 -04:00
Joseph Doherty
e7b40c1c50 docs(audit): add M5 payload+redaction implementation plan (#23)
4 bundles: filter+truncation, redactors (header/body/SQL-param), wire
into all emission paths + health metric, config+perf+safety-net.

Vocabulary translation locked: error-row cap (64 KB) on Status NOT IN
(Delivered, Submitted, Forwarded). Filter integration point in each
writer (FallbackAuditWriter, CentralAuditWriter, AuditLogIngestActor)
BEFORE storage call.
2026-05-20 16:56:56 -04:00
Joseph Doherty
dae6de2c48 docs(audit): roadmap corrections after M4
M5 head records M4 realities:
- AuditingDbConnection/Command/DataReader decorators need filter plug-in
  at WriteAsync emission point.
- CentralAuditWriter + FallbackAuditWriter are both filter integration
  points for the direct-write + chained-write paths.
- InboundAPI middleware RequestSummary populated, ResponseSummary=null
  pending response-body buffering decision in M5.
- UseWhen(/api/) path-scoped middleware gives natural per-target
  redaction hook.
- Error-row cap raised on Status IN (Failed, Parked, Discarded,
  Attempted, Skipped) per M1 vocab reconciliation.
2026-05-20 16:56:18 -04:00
Joseph Doherty
ac7fc9ce4d Merge branch 'feature/audit-log-m4-remaining-boundaries': Audit Log #23 M4 Remaining Boundary Emission
M4 closes the script-trust-boundary emission gaps:
- Sync DB writes/reads via AuditingDbConnection decorator (Channel=DbOutbound,
  Kind=DbWrite; Extra carries op + rowsAffected/rowsReturned).
- Notification Outbox dispatcher: NotifyDeliver(Attempted) per attempt;
  NotifyDeliver(Delivered/Parked/Discarded) on terminal. Direct-write via
  new ICentralAuditWriter (CentralAuditWriter implementation wraps
  IAuditLogRepository.InsertIfNotExistsAsync with scope-per-call).
- Site Notify.To().Send() emits NotifySend(Submitted) via the existing
  IAuditWriter site path; correlation via NotificationId.
- Inbound API AuditWriteMiddleware emits InboundRequest on success,
  InboundAuthFailure on 401/403; Actor = API key NAME (never material);
  registered via UseWhen(/api/) AFTER UseAuthentication/UseAuthorization;
  audit failure NEVER changes HTTP response.

Audit-write-failure-never-aborts-action proven end-to-end across all five
new code paths via AuditWriteFailureSafetyTests (broken ICentralAuditWriter
+ broken IAuditWriter scenarios all green).

Shipped: 12 commits, ~62 net new tests across SiteRuntime / NotificationOutbox
/ AuditLog / InboundAPI tests. Full solution 2763 tests passing. No
regressions. infra/* untouched on any branch commit.
2026-05-20 16:55:45 -04:00
Joseph Doherty
065c8259ae test(auditlog): audit failures never abort user-facing actions (#23 M4) 2026-05-20 16:50:48 -04:00
Joseph Doherty
a7eea0a795 test(auditlog): Inbound API request audit end-to-end (#23 M4) 2026-05-20 16:48:27 -04:00
Joseph Doherty
02727b3a66 test(auditlog): Notify dispatcher audit trail end-to-end (#23 M4) 2026-05-20 16:47:09 -04:00
Joseph Doherty
56b26339ca test(auditlog): DB sync emission end-to-end (#23 M4) 2026-05-20 16:43:55 -04:00
Joseph Doherty
1c862989b4 feat(inbound): register AuditWriteMiddleware in pipeline (#23 M4) 2026-05-20 16:35:13 -04:00
Joseph Doherty
3c3f7770c1 feat(inbound): AuditWriteMiddleware emitting InboundRequest/InboundAuthFailure (#23 M4) 2026-05-20 16:35:03 -04:00
Joseph Doherty
855df759b5 feat(siteruntime): emit NotifySend(Submitted) on site-side Notify.To().Send (#23 M4)
Audit Log #23 M4 Bundle C — Task C1: every script-initiated
Notify.To(list).Send(...) now emits exactly one
Notification/NotifySend audit row via the IAuditWriter wired through
ScriptRuntimeContext. The row carries Status=Submitted,
Target=list name, RequestSummary={subject,body} JSON (M5 will redact),
CorrelationId=NotificationId (parsed as Guid), provenance from context,
ForwardState=Pending.

Emission is best-effort per alog.md §7: a thrown audit writer is logged
and swallowed inside the helper; the original NotificationId still flows
back to the script and the underlying S&F enqueue still happened.

Mirrors the M2 Bundle F ExternalSystem.Call wrapper pattern.

Tests: 7 new tests in NotifySendAuditEmissionTests covering submitted-
status, list-name target, request-summary JSON shape, writer-throws
fail-safe, provenance, NotificationId/CorrelationId round-trip, and the
null-writer degrade path.
2026-05-20 16:18:46 -04:00
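The best-effort emission described in 855df759b5 (the enqueue happens first, a throwing audit writer is logged and swallowed, and a null writer degrades to no emission) might be sketched like this. It is an illustrative Python sketch, not the project's helper; `send_notification` and the `enqueue` and `audit_writer` callables are hypothetical.

```python
import json
import logging

log = logging.getLogger("audit-sketch")


def send_notification(list_name, subject, body, enqueue, audit_writer=None):
    """Enqueue the notification, then try to emit one NotifySend audit row."""
    notification_id = enqueue(list_name, subject, body)
    if audit_writer is None:
        return notification_id  # null-writer degrade path: no audit row
    try:
        audit_writer({
            "Channel": "Notification",
            "Kind": "NotifySend",
            "Status": "Submitted",
            "Target": list_name,
            "RequestSummary": json.dumps({"subject": subject, "body": body}),
            "CorrelationId": notification_id,
            "ForwardState": "Pending",
        })
    except Exception:
        # An audit failure never aborts the user-facing action.
        log.exception("audit emission failed; notification was still enqueued")
    return notification_id
```

The script always receives the NotificationId, whether or not the audit row was written.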
Joseph Doherty
6de377a39e feat(notif): emit NotifyDeliver(terminal) on terminal transitions (#23 M4)
M4 Bundle B (B3) — NotificationOutboxActor emits a second NotifyDeliver
audit row carrying the terminal AuditStatus whenever a notification
transitions to a terminal state (Delivered, Parked, Discarded).

- Dispatcher: after the B2 Attempted row, a Delivered or Parked row is
  emitted when the post-outcome status is terminal. Discarded is never
  produced by the dispatcher — only by the manual discard path.
- Missing-adapter park: now emits both Attempted and terminal Parked,
  both carrying the same explanatory error.
- Manual discard (DiscardAsync): after the row update, emits a terminal
  Discarded NotifyDeliver row with no error message (operator-driven
  cancellation, not a delivery error).
- MapNotificationStatusToAuditStatus + IsTerminal helpers added; terminal
  emission shares BuildNotifyDeliverEvent with the B2 Attempted path so
  the two rows carry identical correlation/provenance fields.

Audit failure NEVER aborts the user-facing action: every emission is
wrapped in try/catch (defensive — the CentralAuditWriter itself swallows).
2026-05-20 16:12:44 -04:00
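The row sequence from 6de377a39e (an Attempted row for every attempt, plus a terminal row only when the post-outcome status is terminal) can be sketched as below. This is a Python illustration rather than the project's C#; mapping notification statuses to audit statuses by identical name is an assumption, and "Retrying" in the usage note is a hypothetical non-terminal status.

```python
TERMINAL_STATUSES = {"Delivered", "Parked", "Discarded"}


def is_terminal(status):
    return status in TERMINAL_STATUSES


def build_notify_deliver_event(base, status):
    """Shared builder so both rows carry identical correlation/provenance fields."""
    return dict(base, Channel="Notification", Kind="NotifyDeliver", Status=status)


def rows_for_attempt(base, post_outcome_status):
    rows = [build_notify_deliver_event(base, "Attempted")]
    if is_terminal(post_outcome_status):
        rows.append(build_notify_deliver_event(base, post_outcome_status))
    return rows
```

For example, `rows_for_attempt(base, "Retrying")` yields only the Attempted row, while a Delivered or Parked outcome yields two rows that share the same base fields.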
Joseph Doherty
1dfd67a90d feat(notif): emit NotifyDeliver(Attempted) per dispatcher attempt (#23 M4)
M4 Bundle B (B2) — NotificationOutboxActor's dispatcher loop emits a single
AuditChannel.Notification / AuditKind.NotifyDeliver row with AuditStatus.Attempted
for every delivery attempt (success, transient failure, permanent failure,
and the missing-adapter park).

- BuildNotifyDeliverEvent helper populates correlation id (parsed from the
  string NotificationId — sites generate Guid.NewGuid().ToString("N"),
  non-Guid ids fall through as null), list-name target, source site/instance/script
  provenance, and Actor=null (central dispatch has no authenticated end-user).
- Attempt duration is measured around the adapter call and recorded as
  DurationMs so KPIs can compute per-attempt latency.
- Emission is fire-and-forget (the writer swallows internally) and wrapped
  in try/catch — audit failure NEVER aborts the user-facing dispatch.

Terminal-state emission lands separately in B3.
2026-05-20 16:08:06 -04:00
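The correlation-id rule in 1dfd67a90d (parse the string NotificationId as a GUID, fall through to null otherwise) has a close Python analogue, shown here as a sketch. `uuid.uuid4().hex` produces the same 32-hex-digit form as .NET's "N" format; `parse_correlation_id` is a hypothetical name, and Python's parser may not accept exactly the same set of inputs as the C# one.

```python
import uuid


def parse_correlation_id(notification_id):
    """Return a UUID for GUID-shaped ids, or None for anything else."""
    try:
        return uuid.UUID(notification_id)
    except (ValueError, AttributeError, TypeError):
        return None
```

Returning None rather than raising keeps a malformed id from blocking the audit row itself.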
Joseph Doherty
b31747a632 feat(notif): NotificationOutboxActor + CentralAuditWriter wired (#23 M4)
M4 Bundle B (B1) — add the central-only ICentralAuditWriter implementation
and inject it into NotificationOutboxActor so subsequent tasks (B2/B3) can
route attempt + terminal lifecycle events through the direct-write audit path.

- CentralAuditWriter: thin wrapper around IAuditLogRepository.InsertIfNotExistsAsync;
  scope-per-call (matches AuditLogIngestActor / NotificationOutboxActor pattern);
  stamps IngestedAtUtc; swallows all internal failures (alog.md §13).
- Registered as a singleton in AddAuditLog.
- NotificationOutboxActor ctor takes ICentralAuditWriter (validated non-null).
- Host wiring resolves the writer once from the root provider and passes it
  into the singleton's Props.Create call.
- Existing TestKit fixtures updated with a NoOpCentralAuditWriter helper so
  tests that don't exercise audit emission still compile and pass.
2026-05-20 16:04:01 -04:00
Joseph Doherty
e4d902753b feat(siteruntime): emit DbOutbound.DbWrite on sync Database.Execute*/ExecuteReader (#23 M4)
Audit Log #23 — M4 Bundle A (Tasks A1+A2): every script-initiated
synchronous DB call routed through Database.Connection(name) now emits
exactly one DbOutbound/DbWrite audit row.

Implementation — three thin ADO.NET decorators in
src/ScadaLink.SiteRuntime/Scripts/:

  - AuditingDbConnection: wraps the gateway-returned DbConnection so
    CreateDbCommand() hands the script an AuditingDbCommand. All other
    ADO.NET surface forwards unchanged.
  - AuditingDbCommand: intercepts ExecuteNonQuery / ExecuteScalar /
    ExecuteReader (sync + async). On terminal:
      Channel = DbOutbound, Kind = DbWrite, Status = Delivered|Failed,
      Extra = {"op":"write","rowsAffected":N}   (Execute*),
              {"op":"read","rowsReturned":N}   (ExecuteReader),
      RequestSummary = JSON of SQL + parameter values (default capture;
                       redaction in M5),
      Target = "<connection>.<first 60 chars of SQL>",
      DurationMs captured via Stopwatch,
      Provenance from ScriptRuntimeContext (SourceSiteId,
                       SourceInstanceId, SourceScript).
  - AuditingDbDataReader: counts rows on Read/ReadAsync and fires the
    audit emission exactly once on Close/CloseAsync/Dispose.

DatabaseHelper now takes an IAuditWriter; ScriptRuntimeContext.Database
threads through _auditWriter. When the writer is null (tests / minimal
hosts) Connection() returns the raw inner DbConnection unchanged.

Best-effort emission (alog.md §7): mirrors M2 Bundle F's 3-layer
fail-safe — build, write, continuation. Audit-build, audit-write, and
audit-continuation faults are logged + swallowed; the original ADO.NET
result (or original exception) flows back to the script untouched. The
SiteAuditWriteFailures counter increments automatically through the
existing FallbackAuditWriter (Bundle G).

Tests — tests/ScadaLink.SiteRuntime.Tests/Scripts/DatabaseSyncEmissionTests.cs
(7 new, all passing):
  1. Execute / INSERT success — one DbWrite row, op=write, rowsAffected=1.
  2. ExecuteScalar success — one DbWrite row, op=write.
  3. Execute throws — Status=Failed, ErrorMessage + ErrorDetail set.
  4. ExecuteReader success — op=read, rowsReturned counts rows pulled.
  5. AuditWriter throws — original ADO.NET rowsAffected returned, no
     events captured, no exception propagates.
  6. Provenance populated from context.
  7. DurationMs recorded non-zero.

Tests use Microsoft.Data.Sqlite in-memory (already transitively
available via SiteRuntime). Total SiteRuntime test suite: 251 passing
(244 baseline + 7 new). Full solution test suite passes.
2026-05-20 15:54:54 -04:00
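The decorator idea in e4d902753b (wrap the command, emit exactly one audit row per call, count rows on read, never let an audit fault reach the script) can be approximated with Python's sqlite3 module. This is a sketch and not the project's ADO.NET decorators; all function and class names are hypothetical, and the Extra payload on the failure path is an assumption.

```python
import json
import sqlite3
import time


def _event(name, sql, status, extra, started, error=None):
    return {
        "Channel": "DbOutbound", "Kind": "DbWrite", "Status": status,
        "Target": f"{name}.{sql[:60]}",  # connection name + first 60 chars of SQL
        "Extra": json.dumps(extra),
        "DurationMs": int((time.perf_counter() - started) * 1000),
        "ErrorMessage": error,
    }


def _emit_safely(emit, event):
    try:
        emit(event)
    except Exception:
        pass  # best-effort: audit faults never reach the caller


def execute_non_query(conn, name, sql, params, emit):
    started = time.perf_counter()
    try:
        rows = conn.execute(sql, params).rowcount
    except Exception as exc:
        # Assumed failure payload; the commit does not spell it out.
        _emit_safely(emit, _event(name, sql, "Failed", {"op": "write"}, started, str(exc)))
        raise
    _emit_safely(emit, _event(name, sql, "Delivered",
                              {"op": "write", "rowsAffected": rows}, started))
    return rows


class AuditingCursor:
    """Counts rows the caller pulls and emits one audit row on the first close."""

    def __init__(self, cursor, name, sql, emit, started):
        self._cursor, self._name, self._sql = cursor, name, sql
        self._emit, self._started = emit, started
        self._rows, self._closed = 0, False

    def fetchone(self):
        row = self._cursor.fetchone()
        if row is not None:
            self._rows += 1
        return row

    def close(self):
        if self._closed:
            return  # emit exactly once
        self._closed = True
        self._cursor.close()
        _emit_safely(self._emit, _event(self._name, self._sql, "Delivered",
                                        {"op": "read", "rowsReturned": self._rows},
                                        self._started))


def execute_reader(conn, name, sql, params, emit):
    started = time.perf_counter()
    return AuditingCursor(conn.execute(sql, params), name, sql, emit, started)
```

Note that `rowsReturned` counts only the rows the caller actually pulled before closing, which mirrors the "counts rows pulled" wording of test 4 in the commit.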
Joseph Doherty
c410fc6d43 docs(audit): add M4 remaining-boundaries implementation plan (#23)
5 bundles: DB sync emissions, NotificationOutbox central, site Notify.Send,
Inbound API middleware, integration tests. M3-reality vocab baked in
(DbWrite/NotifyDeliver/NotifySend/InboundRequest/InboundAuthFailure).
2026-05-20 15:41:13 -04:00
Joseph Doherty
f48efa7ca8 docs(audit): roadmap corrections after M3
M4 head now records M3 realities:
- Vocabulary translation table from pre-M1 spec strings to M1-aligned
  enum values (DbWrite vs SyncWrite/SyncRead; NotifyDeliver vs
  Notification.Attempt/Terminal; InboundRequest/InboundAuthFailure vs
  ApiInbound.Completed; Failed vs PermanentFailure).
- Mapper consolidation: 4 DTO mappers exist; extract single helper
  before M4 adds more channels.
- OnCachedTelemetryWithoutDualWriteAsync test-mode fallback may be
  deprecated in M4.
- Site SQLite drain for OperationTrackingStore: only dual-write
  transaction writes central today; plan drain if M4 needs in-flight
  tracking visibility.
- SiteCallAuditActor wired but unused on M3 hot path; M4/M6 natural
  first direct caller.
2026-05-20 15:40:33 -04:00
Joseph Doherty
d20e8f4e9d Merge branch 'feature/audit-log-m3-cached-operations': Audit Log #23 M3 Cached Operations + Dual-Write
M3 ships the cached-call lifecycle: ExternalSystem.CachedCall and
Database.CachedWrite each produce 3-5 audit rows + 1 SiteCalls row
sharing the same TrackedOperationId. Site emits the combined packet
(AuditEvent + SiteCallOperational); central writes both rows in one
MS SQL transaction.

Inlines the minimum-viable Site Call Audit (#22) surface:
SiteCalls table + ISiteCallAuditRepository + SiteCallAuditActor.
Reconciliation, KPIs, central->site Retry/Discard relay deferred.

Shipped (23 commits, ~120 net new tests, 24/24 test projects green):
- TrackedOperationId strong type + OperationTrackingStore site-local
  SQLite + Tracking.Status script API.
- CachedCallTelemetry combined operational+audit packet (additive per
  Commons REQ-COM-5a — never renamed CachedOperationTelemetry).
- SiteCalls MS SQL table + monotonic upsert repository (operational
  state, no partitioning) + migration.
- ScadaLink.SiteCallAudit new project + SiteCallAuditActor cluster
  singleton.
- sitestream.proto extended with IngestCachedTelemetry RPC +
  SiteCallOperationalDto + CachedTelemetryPacket/Batch.
- AuditLogIngestActor combined-telemetry handler with per-entry
  BeginTransactionAsync; rollback on either-throw; per-entry try/catch
  isolates failures; central singleton stays alive (Resume).
- ScriptRuntimeContext.ExternalSystem.CachedCall + Database.CachedWrite
  wrappers emit CachedSubmit on enqueue + handle immediate-success path
  (no S&F retry) with direct Attempted+CachedResolve emission.
- StoreAndForward observer hook (ICachedCallLifecycleObserver) +
  CachedCallLifecycleBridge translates S&F outcomes to combined
  telemetry; per-attempt rows carry Kind=ApiCallCached/DbWriteCached,
  Status=Attempted (HttpStatus/ErrorMessage capture success/failure);
  terminal carries Kind=CachedResolve, Status=Delivered/Failed/Parked/
  Discarded.
- Component-level e2e via TestKit + MsSqlMigrationFixture +
  DirectActorSiteStreamAuditClient extracted to shared Integration/
  Infrastructure/ + CombinedTelemetryHarness/Dispatcher helpers.
- Health metric SiteAuditWriteFailures still wired (M2). Bridge from
  ICachedCallTelemetryForwarder to AuditWriter chain.

Invariants honored: append-only AuditLog (writer role DENY UPDATE/DELETE
from M1); audit-failure-never-aborts-script (three-layer fail-safe
preserved); central singleton supervisor=Resume; idempotent at central
on EventId (M2 race-fix from Bundle A) + monotonic at central on
TrackedOperationId. infra/* never touched on any branch commit
(verified empty via 'git log main..feature/audit-log-m3-cached-operations -- infra/').

Site->central gRPC client still NoOpSiteStreamAuditClient in production
until M6; cached telemetry rows accumulate at site as Pending in
production.
2026-05-20 15:39:41 -04:00
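Three of the central-side invariants listed in d20e8f4e9d (both rows written in one transaction per entry, idempotency on EventId, monotonic state on TrackedOperationId) can be demonstrated with sqlite3. This Python sketch is not the project's MS SQL code: the two-column tables and the status ranking are assumptions chosen only to make the behavior visible.

```python
import sqlite3

# Assumed ordering: a later lifecycle stage never regresses to an earlier one.
RANK = {"Submitted": 0, "Attempted": 1,
        "Delivered": 2, "Failed": 2, "Parked": 2, "Discarded": 2}


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AuditLog (EventId TEXT PRIMARY KEY, Status TEXT NOT NULL)")
    conn.execute("CREATE TABLE SiteCalls "
                 "(TrackedOperationId TEXT PRIMARY KEY, Status TEXT NOT NULL)")
    return conn


def ingest(conn, entries):
    """Apply (event_id, operation_id, status) entries; return how many committed."""
    committed = 0
    for event_id, op_id, status in entries:
        try:
            with conn:  # one transaction per entry: commit, or roll back on error
                # First write wins: a retried EventId is silently ignored.
                conn.execute("INSERT OR IGNORE INTO AuditLog VALUES (?, ?)",
                             (event_id, status))
                row = conn.execute(
                    "SELECT Status FROM SiteCalls WHERE TrackedOperationId = ?",
                    (op_id,)).fetchone()
                if row is None:
                    conn.execute("INSERT INTO SiteCalls VALUES (?, ?)", (op_id, status))
                elif RANK[status] > RANK[row[0]]:
                    conn.execute(
                        "UPDATE SiteCalls SET Status = ? WHERE TrackedOperationId = ?",
                        (status, op_id))
            committed += 1
        except Exception:
            continue  # per-entry isolation: one bad entry does not stop the batch
    return committed
```

An entry that fails after its audit insert leaves no audit row behind, because the whole per-entry transaction rolls back.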
Joseph Doherty
73a19c6f02 refactor(auditlog): remove dead-code ternary in CachedCallLifecycleBridge (#23 M3 final-review fix) 2026-05-20 15:39:10 -04:00
Joseph Doherty
c3d4e6b1e0 test(auditlog): combined telemetry idempotency on retried packets (#23 M3) 2026-05-20 15:27:14 -04:00
Joseph Doherty
f063b35633 test(auditlog): cached DB write combined telemetry end-to-end (#23 M3) 2026-05-20 15:26:04 -04:00
Joseph Doherty
f4a7be4929 test(auditlog): cached call combined telemetry end-to-end (#23 M3) 2026-05-20 15:25:10 -04:00
Joseph Doherty
a3b0fb7f08 refactor(auditlog-tests): extract DirectActorSiteStreamAuditClient + add IngestCachedTelemetry support (#23 M3) 2026-05-20 15:21:44 -04:00
Joseph Doherty
f81750b2aa fix(siteruntime): immediate-success CachedCall emits terminal telemetry (#23 M3)
Bundle E left a gap in ExternalSystem.CachedCall: when the underlying HTTP
call succeeds immediately (WasBuffered=false), the store-and-forward retry
loop is never engaged and the ICachedCallLifecycleObserver hook never
fires. As a result Tracking.Status(id) would stay in Submitted forever and
the audit log would be missing the Attempted + CachedResolve pair the M3
contract requires.

Fix: capture the ExternalCallResult returned by IExternalSystemClient.
CachedCallAsync. When WasBuffered=false, emit the two missing telemetry
packets from the helper itself:

- ApiCallCached / Attempted   (per-attempt mechanics row, HttpStatus +
                              ErrorMessage extracted via the same regex
                              the synchronous Call() audit row uses)
- CachedResolve / Delivered   on Success, or
- CachedResolve / Failed      on Success=false (immediate permanent
                              failure or transient failure without S&F).

The terminal CachedResolve row carries TerminalAtUtc so SiteCallAudit can
recognise the row as eligible for purge.

The WasBuffered=true path is unaffected — the S&F retry loop owns the
Attempted + Resolve emissions there via the CachedCallLifecycleBridge.
Database.CachedWrite is unaffected too because IDatabaseGateway.
CachedWriteAsync always enqueues into S&F (no immediate-success path).

Both new emissions are best-effort: a throwing forwarder is logged and
swallowed (alog.md §7) and each row is independently try/catch-wrapped so
a single fault cannot drop both halves of the terminal pair.

Tests in ExternalSystemCachedCallEmissionTests:
- CachedCall_ImmediateSuccess_EmitsAttemptedAndCachedResolve
- CachedCall_ImmediateFailure_EmitsAttemptedAndCachedResolveFailed
- CachedCall_BufferedPath_DoesNotEmitTerminalTelemetryFromHelper

Full suite: 244 SiteRuntime tests (3 new), 200 Host tests, all green.
2026-05-20 15:15:11 -04:00
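The fix in f81750b2aa (emit the Attempted and CachedResolve pair from the helper only when WasBuffered is false, with each row wrapped independently) is sketched below in Python. It is not the project's C# helper; `emit_immediate_outcome`, the dictionary-shaped result, and the `forward` callable are hypothetical.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("cached-call-sketch")


def emit_immediate_outcome(result, forward):
    """Emit the terminal pair for an unbuffered call; return how many rows went out."""
    if result["WasBuffered"]:
        return 0  # the store-and-forward retry loop owns these emissions
    terminal_status = "Delivered" if result["Success"] else "Failed"
    rows = [
        {"Kind": "ApiCallCached", "Status": "Attempted"},
        {"Kind": "CachedResolve", "Status": terminal_status,
         "TerminalAtUtc": datetime.now(timezone.utc)},
    ]
    emitted = 0
    for row in rows:
        try:
            forward(row)
            emitted += 1
        except Exception:
            # Each row is guarded separately so one fault cannot drop both halves.
            log.exception("telemetry emission failed for %s", row["Kind"])
    return emitted
```

Guarding each row on its own means a forwarder that fails on the Attempted row still gets a chance to deliver the terminal CachedResolve row.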
Joseph Doherty
6fe23a4d9b feat(host): register SiteCallAuditActor + CachedCallTelemetry forwarder/bridge (#22, #23 M3)
M3 Bundle F (Task F1) wires the cached-call audit pipeline through the
composition roots:

- Central: register SiteCallAuditActor as a cluster singleton + proxy
  (mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs
  calls .AddSiteCallAudit() on the central role.
- Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge
  in AddAuditLog (lazy factory — Central nodes degrade to audit-only
  emission because IOperationTrackingStore is site-only).
- Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so
  StoreAndForwardService picks it up via DI.
- Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the
  site id to StoreAndForwardService without creating a
  StoreAndForward -> HealthMonitoring project-reference cycle.
- ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script
  scope and threads it into ScriptRuntimeContext.

CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now
nullable so Central DI validation succeeds with the lazy registration; the
forwarder's tracking-half emission is a no-op when the store is absent.

Tests:
- AkkaHostedServiceAuditWiringTests: Central host builds with
  AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site
  resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext.
- Full solution: 194 Host tests green, 241 SiteRuntime tests green, every
  other suite unchanged.
2026-05-20 15:10:47 -04:00