Commit Graph

735 Commits

Author SHA1 Message Date
Joseph Doherty
b0584f7a08 docs(audit): add M6 reconciliation+purge+partition+health plan (#23)
6 bundles: proto+site handler, reconciliation actor, purge actor with
drop-and-rebuild around UX index, partition maintenance, four health
metrics, integration tests. M5 realities baked in.
2026-05-20 17:44:12 -04:00
Joseph Doherty
db05af897e docs(audit): roadmap corrections after M5
M6 head records M5 realities:
- IOptionsMonitor hot-reload pattern verified; M6 retention config can
  reuse.
- AuditRedactionFailure counter site-only in M5; M6 wires central side.
- Filter integration is at 3 writer entry points; purge actor doesn't
  emit so no filter integration needed.
- SwitchOutPartitionAsync drop-and-rebuild dance required (M1 reality
  + M6-T4 already documents it).
- M6 should land the real ISiteStreamAuditClient (Option A) so push
  telemetry leaves NoOp behind.
2026-05-20 17:43:44 -04:00
Joseph Doherty
adc490b690 Merge branch 'feature/audit-log-m5-payload-redaction': Audit Log #23 M5 Payload + Redaction
M5 ships the payload filter pipeline. IAuditPayloadFilter runs between
event construction and writer call:
- Stage 1: HTTP header redaction (Authorization/Cookie/Set-Cookie/X-Api-Key
  default list from M1-T9; case-insensitive name match against JSON
  {headers,body} shape).
- Stage 2: Body regex redaction (global + per-target). Patterns compiled
  at startup with 100ms budget; runtime 50ms timeout guard against
  catastrophic backtracking. Over-redact on exception + increment counter.
- Stage 3: SQL parameter redaction (Channel=DbOutbound, per-connection
  opt-in via PerTargetOverrides[connection].RedactSqlParamsMatching).
- Stage 4: UTF-8 boundary-safe truncation. Default cap 8 KB; error cap
  64 KB on Status NOT IN (Delivered/Submitted/Forwarded). PayloadTruncated
  set to true when applied.

Filter wired into all three writer entry points:
- FallbackAuditWriter (site chain) — filter before SqliteAuditWriter.
- CentralAuditWriter (central direct-write) — filter before
  IAuditLogRepository.InsertIfNotExistsAsync (NotificationOutbox dispatcher,
  AuditWriteMiddleware).
- AuditLogIngestActor — filter before dual-write transaction.

Health metric SiteAuditRedactionFailureCounter wired through the existing
M2 Bundle G + M4 Bundle B health-bridge pattern; central-side counter
deferred to M6 (the milestone that ships the full central health surface).

Hot-reload via IOptionsMonitor + per-call CurrentValue read. Regex cache
keyed by pattern string so changing the config naturally invalidates old
patterns.

Shipped: 11 commits, ~49 net new tests across AuditLog.Tests,
HealthMonitoring.Tests, PerformanceTests. Full solution 24/24 test projects
green. infra/* untouched on any branch commit.
2026-05-20 17:43:19 -04:00
Joseph Doherty
1856b63f0c test(auditlog): redaction safety net edge cases (#23 M5) 2026-05-20 17:38:59 -04:00
Joseph Doherty
4eeda45f0e test(auditlog): hot-path latency budget for IAuditPayloadFilter (#23 M5) 2026-05-20 17:36:29 -04:00
Joseph Doherty
b409afda2e feat(auditlog): hot-reloadable AuditLogOptions + regex cache invalidation (#23 M5) 2026-05-20 17:35:15 -04:00
Joseph Doherty
23c0fd417e feat(health): AuditRedactionFailure counter + bridge (#23 M5)
Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor
over-redactions as a Site Health metric so a misconfigured /
catastrophic regex shows up on /monitoring/health rather than
disappearing into a NoOp sink.

  - SiteHealthReport: new 'AuditRedactionFailure' int field
    (defaulted to 0 for back-compat with existing producers/tests).
  - ISiteHealthCollector / SiteHealthCollector:
    new IncrementAuditRedactionFailure() — per-interval atomic
    counter with Interlocked, reset on CollectReport, mirroring
    the M2 Bundle G SiteAuditWriteFailures pattern.
  - HealthMetricsAuditRedactionFailureCounter: new bridge in
    ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter
    increments to ISiteHealthCollector — mirrors
    HealthMetricsAuditWriteFailureCounter one-for-one.
  - AddAuditLogHealthMetricsBridge: now ALSO Replaces the
    NoOpAuditRedactionFailureCounter binding with the health-metrics
    bridge, so a single AddAuditLogHealthMetricsBridge() call wires
    both the M2 Bundle G write-failure counter and the M5 Bundle C
    redaction-failure counter into the health report.

Site-side only for M5 — the filter also runs on CentralAuditWriter
and AuditLogIngestActor (where it just keeps the NoOp default), but
a central-side health-metric surface for AuditRedactionFailure is
deferred to M6 alongside the rest of the central health collector
work.

Tests:
  - AuditRedactionFailureMetricTests (HealthMonitoring) covers the
    SiteHealthCollector increment/report/reset shape (3 tests).
  - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers
    the AuditLog → HealthMonitoring bridge (3 tests).
  - Existing CountCapturingHealthCollector stub in
    DeploymentManagerRedeployTests extended with the new no-op
    interface method.

Verified: dotnet build clean, all 24 test projects green
(the only Failed at first ScadaLink.SiteRuntime.Tests run was the
known-flaky InstanceActorChildAttributeRaceTests; passes on re-run
in isolation and full suite, unrelated to these changes).
2026-05-20 17:28:33 -04:00
Joseph Doherty
9b1379ed9b feat(auditlog): wire IAuditPayloadFilter into all writer paths (#23 M5)
Bundle C task M5-T6 — plugs the IAuditPayloadFilter singleton into the
three audit writer entry points so every event is truncated + redacted
before persistence, regardless of which path it took to disk:

  - FallbackAuditWriter (site hot path): filter runs before the primary
    SQLite write AND the ring-buffer enqueue, so a recovery drain replays
    rows that are already capped/redacted.
  - CentralAuditWriter (central direct-write): filter runs before the
    per-call IAuditLogRepository.InsertIfNotExistsAsync.
  - AuditLogIngestActor (site→central telemetry):
      - OnIngestAsync resolves the filter from the per-message scope and
        applies it to each row before IngestedAtUtc stamping.
      - OnCachedTelemetryAsync (M3 dual-write) applies the filter to the
        audit half of every CachedTelemetryEntry before the audit-insert
        + site-call-upsert transaction.

Filter parameter is optional (nullable) on each constructor so the
existing test composition roots that don't pass one keep working unchanged
— production DI wiring in AddAuditLog always passes the real filter
through. ICentralAuditWriter registration switched from the open-ctor
form to a factory so the filter flows through it.

Tests: FilterIntegrationTests covers all three writer paths end-to-end
(4 tests). Full ScadaLink.AuditLog.Tests suite: 146 passed, 0 failed,
0 skipped.
2026-05-20 17:21:57 -04:00
Joseph Doherty
5a7f3e8bf6 feat(auditlog): per-connection SQL parameter redaction opt-in (#23 M5) 2026-05-20 17:11:53 -04:00
Joseph Doherty
37f17dc4a8 feat(auditlog): body regex redaction with over-redaction safety net (#23 M5) 2026-05-20 17:09:36 -04:00
Joseph Doherty
ad7b330f43 feat(auditlog): HTTP header redaction stage (#23 M5) 2026-05-20 17:07:01 -04:00
Joseph Doherty
bba2ef1b4d feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety (#23 M5) 2026-05-20 17:01:13 -04:00
Joseph Doherty
25cdf857c9 feat(auditlog): IAuditPayloadFilter contract (#23 M5) 2026-05-20 16:59:10 -04:00
Joseph Doherty
e7b40c1c50 docs(audit): add M5 payload+redaction implementation plan (#23)
4 bundles: filter+truncation, redactors (header/body/SQL-param), wire
into all emission paths + health metric, config+perf+safety-net.

Vocabulary translation locked: error-row cap (64 KB) on Status NOT IN
(Delivered, Submitted, Forwarded). Filter integration point in each
writer (FallbackAuditWriter, CentralAuditWriter, AuditLogIngestActor)
BEFORE storage call.
2026-05-20 16:56:56 -04:00
Joseph Doherty
dae6de2c48 docs(audit): roadmap corrections after M4
M5 head records M4 realities:
- AuditingDbConnection/Command/DataReader decorators need filter plug-in
  at WriteAsync emission point.
- CentralAuditWriter + FallbackAuditWriter are both filter integration
  points for the direct-write + chained-write paths.
- InboundAPI middleware RequestSummary populated, ResponseSummary=null
  pending response-body buffering decision in M5.
- UseWhen(/api/) path-scoped middleware gives natural per-target
  redaction hook.
- Error-row cap raised on Status IN (Failed, Parked, Discarded,
  Attempted, Skipped) per M1 vocab reconciliation.
2026-05-20 16:56:18 -04:00
Joseph Doherty
ac7fc9ce4d Merge branch 'feature/audit-log-m4-remaining-boundaries': Audit Log #23 M4 Remaining Boundary Emission
M4 closes the script-trust-boundary emission gaps:
- Sync DB writes/reads via AuditingDbConnection decorator (Channel=DbOutbound,
  Kind=DbWrite; Extra carries op + rowsAffected/rowsReturned).
- Notification Outbox dispatcher: NotifyDeliver(Attempted) per attempt;
  NotifyDeliver(Delivered/Parked/Discarded) on terminal. Direct-write via
  new ICentralAuditWriter (CentralAuditWriter implementation wraps
  IAuditLogRepository.InsertIfNotExistsAsync with scope-per-call).
- Site Notify.To().Send() emits NotifySend(Submitted) via the existing
  IAuditWriter site path; correlation via NotificationId.
- Inbound API AuditWriteMiddleware emits InboundRequest on success,
  InboundAuthFailure on 401/403; Actor = API key NAME (never material);
  registered via UseWhen(/api/) AFTER UseAuthentication/UseAuthorization;
  audit failure NEVER changes HTTP response.

Audit-write-failure-never-aborts-action proven end-to-end across all five
new code paths via AuditWriteFailureSafetyTests (broken ICentralAuditWriter
+ broken IAuditWriter scenarios all green).

Shipped: 12 commits, ~62 net new tests across SiteRuntime / NotificationOutbox
/ AuditLog / InboundAPI tests. Full solution 2763 tests passing. No
regressions. infra/* untouched on any branch commit.
2026-05-20 16:55:45 -04:00
Joseph Doherty
065c8259ae test(auditlog): audit failures never abort user-facing actions (#23 M4) 2026-05-20 16:50:48 -04:00
Joseph Doherty
a7eea0a795 test(auditlog): Inbound API request audit end-to-end (#23 M4) 2026-05-20 16:48:27 -04:00
Joseph Doherty
02727b3a66 test(auditlog): Notify dispatcher audit trail end-to-end (#23 M4) 2026-05-20 16:47:09 -04:00
Joseph Doherty
56b26339ca test(auditlog): DB sync emission end-to-end (#23 M4) 2026-05-20 16:43:55 -04:00
Joseph Doherty
1c862989b4 feat(inbound): register AuditWriteMiddleware in pipeline (#23 M4) 2026-05-20 16:35:13 -04:00
Joseph Doherty
3c3f7770c1 feat(inbound): AuditWriteMiddleware emitting InboundRequest/InboundAuthFailure (#23 M4) 2026-05-20 16:35:03 -04:00
Joseph Doherty
855df759b5 feat(siteruntime): emit NotifySend(Submitted) on site-side Notify.To().Send (#23 M4)
Audit Log #23 M4 Bundle C — Task C1: every script-initiated
Notify.To(list).Send(...) now emits exactly one
Notification/NotifySend audit row via the IAuditWriter wired through
ScriptRuntimeContext. The row carries Status=Submitted,
Target=list name, RequestSummary={subject,body} JSON (M5 will redact),
CorrelationId=NotificationId (parsed as Guid), provenance from context,
ForwardState=Pending.

Emission is best-effort per alog.md §7: a thrown audit writer is logged
and swallowed inside the helper; the original NotificationId still flows
back to the script and the underlying S&F enqueue still happened.

Mirrors the M2 Bundle F ExternalSystem.Call wrapper pattern.

Tests: 7 new tests in NotifySendAuditEmissionTests covering submitted-
status, list-name target, request-summary JSON shape, writer-throws
fail-safe, provenance, NotificationId/CorrelationId round-trip, and the
null-writer degrade path.
2026-05-20 16:18:46 -04:00
Joseph Doherty
6de377a39e feat(notif): emit NotifyDeliver(terminal) on terminal transitions (#23 M4)
M4 Bundle B (B3) — NotificationOutboxActor emits a second NotifyDeliver
audit row carrying the terminal AuditStatus whenever a notification
transitions to a terminal state (Delivered, Parked, Discarded).

- Dispatcher: after the B2 Attempted row, a Delivered or Parked row is
  emitted when the post-outcome status is terminal. Discarded is never
  produced by the dispatcher — only by the manual discard path.
- Missing-adapter park: now emits both Attempted and terminal Parked,
  both carrying the same explanatory error.
- Manual discard (DiscardAsync): after the row update, emits a terminal
  Discarded NotifyDeliver row with no error message (operator-driven
  cancellation, not a delivery error).
- MapNotificationStatusToAuditStatus + IsTerminal helpers added; terminal
  emission shares BuildNotifyDeliverEvent with the B2 Attempted path so
  the two rows carry identical correlation/provenance fields.

Audit failure NEVER aborts the user-facing action: every emission is
wrapped in try/catch (defensive — the CentralAuditWriter itself swallows).
2026-05-20 16:12:44 -04:00
Joseph Doherty
1dfd67a90d feat(notif): emit NotifyDeliver(Attempted) per dispatcher attempt (#23 M4)
M4 Bundle B (B2) — NotificationOutboxActor's dispatcher loop emits a single
AuditChannel.Notification / AuditKind.NotifyDeliver row with AuditStatus.Attempted
for every delivery attempt (success, transient failure, permanent failure,
and the missing-adapter park).

- BuildNotifyDeliverEvent helper populates correlation id (parsed from the
  string NotificationId — sites generate Guid.NewGuid().ToString("N"),
  non-Guid ids fall through as null), list-name target, source site/instance/script
  provenance, and Actor=null (central dispatch has no authenticated end-user).
- Attempt duration is measured around the adapter call and recorded as
  DurationMs so KPIs can compute per-attempt latency.
- Emission is fire-and-forget (the writer swallows internally) and wrapped
  in try/catch — audit failure NEVER aborts the user-facing dispatch.

Terminal-state emission lands separately in B3.
2026-05-20 16:08:06 -04:00
Joseph Doherty
b31747a632 feat(notif): NotificationOutboxActor + CentralAuditWriter wired (#23 M4)
M4 Bundle B (B1) — add the central-only ICentralAuditWriter implementation
and inject it into NotificationOutboxActor so subsequent tasks (B2/B3) can
route attempt + terminal lifecycle events through the direct-write audit path.

- CentralAuditWriter: thin wrapper around IAuditLogRepository.InsertIfNotExistsAsync;
  scope-per-call (matches AuditLogIngestActor / NotificationOutboxActor pattern);
  stamps IngestedAtUtc; swallows all internal failures (alog.md §13).
- Registered as a singleton in AddAuditLog.
- NotificationOutboxActor ctor takes ICentralAuditWriter (validated non-null).
- Host wiring resolves the writer once from the root provider and passes it
  into the singleton's Props.Create call.
- Existing TestKit fixtures updated with a NoOpCentralAuditWriter helper so
  tests that don't exercise audit emission still compile and pass.
2026-05-20 16:04:01 -04:00
Joseph Doherty
e4d902753b feat(siteruntime): emit DbOutbound.DbWrite on sync Database.Execute*/ExecuteReader (#23 M4)
Audit Log #23 — M4 Bundle A (Tasks A1+A2): every script-initiated
synchronous DB call routed through Database.Connection(name) now emits
exactly one DbOutbound/DbWrite audit row.

Implementation — three thin ADO.NET decorators in
src/ScadaLink.SiteRuntime/Scripts/:

  - AuditingDbConnection: wraps the gateway-returned DbConnection so
    CreateDbCommand() hands the script an AuditingDbCommand. All other
    ADO.NET surface forwards unchanged.
  - AuditingDbCommand: intercepts ExecuteNonQuery / ExecuteScalar /
    ExecuteReader (sync + async). On terminal:
      Channel = DbOutbound, Kind = DbWrite, Status = Delivered|Failed,
      Extra = {"op":"write","rowsAffected":N}   (Execute*),
              {"op":"read","rowsReturned":N}   (ExecuteReader),
      RequestSummary = JSON of SQL + parameter values (default capture;
                       redaction in M5),
      Target = "<connection>.<first 60 chars of SQL>",
      DurationMs captured via Stopwatch,
      Provenance from ScriptRuntimeContext (SourceSiteId,
                       SourceInstanceId, SourceScript).
  - AuditingDbDataReader: counts rows on Read/ReadAsync and fires the
    audit emission exactly once on Close/CloseAsync/Dispose.

DatabaseHelper now takes an IAuditWriter; ScriptRuntimeContext.Database
threads through _auditWriter. When the writer is null (tests / minimal
hosts) Connection() returns the raw inner DbConnection unchanged.

Best-effort emission (alog.md §7): mirrors M2 Bundle F's 3-layer
fail-safe — build, write, continuation. Audit-build, audit-write, and
audit-continuation faults are logged + swallowed; the original ADO.NET
result (or original exception) flows back to the script untouched. The
SiteAuditWriteFailures counter increments automatically through the
existing FallbackAuditWriter (Bundle G).

Tests — tests/ScadaLink.SiteRuntime.Tests/Scripts/DatabaseSyncEmissionTests.cs
(7 new, all passing):
  1. Execute / INSERT success — one DbWrite row, op=write, rowsAffected=1.
  2. ExecuteScalar success — one DbWrite row, op=write.
  3. Execute throws — Status=Failed, ErrorMessage + ErrorDetail set.
  4. ExecuteReader success — op=read, rowsReturned counts rows pulled.
  5. AuditWriter throws — original ADO.NET rowsAffected returned, no
     events captured, no exception propagates.
  6. Provenance populated from context.
  7. DurationMs recorded non-zero.

Tests use Microsoft.Data.Sqlite in-memory (already transitively
available via SiteRuntime). Total SiteRuntime test suite: 251 passing
(244 baseline + 7 new). Full solution test suite passes.
2026-05-20 15:54:54 -04:00
Joseph Doherty
c410fc6d43 docs(audit): add M4 remaining-boundaries implementation plan (#23)
5 bundles: DB sync emissions, NotificationOutbox central, site Notify.Send,
Inbound API middleware, integration tests. M3-reality vocab baked in
(DbWrite/NotifyDeliver/NotifySend/InboundRequest/InboundAuthFailure).
2026-05-20 15:41:13 -04:00
Joseph Doherty
f48efa7ca8 docs(audit): roadmap corrections after M3
M4 head now records M3 realities:
- Vocabulary translation table from pre-M1 spec strings to M1-aligned
  enum values (DbWrite vs SyncWrite/SyncRead; NotifyDeliver vs
  Notification.Attempt/Terminal; InboundRequest/InboundAuthFailure vs
  ApiInbound.Completed; Failed vs PermanentFailure).
- Mapper consolidation: 4 DTO mappers exist; extract single helper
  before M4 adds more channels.
- OnCachedTelemetryWithoutDualWriteAsync test-mode fallback may be
  deprecated in M4.
- Site SQLite drain for OperationTrackingStore: only dual-write
  transaction writes central today; plan drain if M4 needs in-flight
  tracking visibility.
- SiteCallAuditActor wired but unused on M3 hot path; M4/M6 natural
  first direct caller.
2026-05-20 15:40:33 -04:00
Joseph Doherty
d20e8f4e9d Merge branch 'feature/audit-log-m3-cached-operations': Audit Log #23 M3 Cached Operations + Dual-Write
M3 ships the cached-call lifecycle: ExternalSystem.CachedCall and
Database.CachedWrite each produce 3-5 audit rows + 1 SiteCalls row
sharing the same TrackedOperationId. Site emits the combined packet
(AuditEvent + SiteCallOperational); central writes both rows in one
MS SQL transaction.

Inlines the minimum-viable Site Call Audit (#22) surface:
SiteCalls table + ISiteCallAuditRepository + SiteCallAuditActor.
Reconciliation, KPIs, central->site Retry/Discard relay deferred.

Shipped (23 commits, ~120 net new tests, 24/24 test projects green):
- TrackedOperationId strong type + OperationTrackingStore site-local
  SQLite + Tracking.Status script API.
- CachedCallTelemetry combined operational+audit packet (additive per
  Commons REQ-COM-5a — never renamed CachedOperationTelemetry).
- SiteCalls MS SQL table + monotonic upsert repository (operational
  state, no partitioning) + migration.
- ScadaLink.SiteCallAudit new project + SiteCallAuditActor cluster
  singleton.
- sitestream.proto extended with IngestCachedTelemetry RPC +
  SiteCallOperationalDto + CachedTelemetryPacket/Batch.
- AuditLogIngestActor combined-telemetry handler with per-entry
  BeginTransactionAsync; rollback on either-throw; per-entry try/catch
  isolates failures; central singleton stays alive (Resume).
- ScriptRuntimeContext.ExternalSystem.CachedCall + Database.CachedWrite
  wrappers emit CachedSubmit on enqueue + handle immediate-success path
  (no S&F retry) with direct Attempted+CachedResolve emission.
- StoreAndForward observer hook (ICachedCallLifecycleObserver) +
  CachedCallLifecycleBridge translates S&F outcomes to combined
  telemetry; per-attempt rows carry Kind=ApiCallCached/DbWriteCached,
  Status=Attempted (HttpStatus/ErrorMessage capture success/failure);
  terminal carries Kind=CachedResolve, Status=Delivered/Failed/Parked/
  Discarded.
- Component-level e2e via TestKit + MsSqlMigrationFixture +
  DirectActorSiteStreamAuditClient extracted to shared Integration/
  Infrastructure/ + CombinedTelemetryHarness/Dispatcher helpers.
- Health metric SiteAuditWriteFailures still wired (M2). Bridge from
  ICachedCallTelemetryForwarder to AuditWriter chain.

Invariants honored: append-only AuditLog (writer role DENY UPDATE/DELETE
from M1); audit-failure-never-aborts-script (three-layer fail-safe
preserved); central singleton supervisor=Resume; idempotent at central
on EventId (M2 race-fix from Bundle A) + monotonic at central on
TrackedOperationId. infra/* never touched on any branch commit
(verified empty via 'git log main..feature/audit-log-m3-cached-operations -- infra/').

Site->central gRPC client still NoOpSiteStreamAuditClient in production
until M6; cached telemetry rows accumulate at site as Pending in
production.
2026-05-20 15:39:41 -04:00
Joseph Doherty
73a19c6f02 refactor(auditlog): remove dead-code ternary in CachedCallLifecycleBridge (#23 M3 final-review fix) 2026-05-20 15:39:10 -04:00
Joseph Doherty
c3d4e6b1e0 test(auditlog): combined telemetry idempotency on retried packets (#23 M3) 2026-05-20 15:27:14 -04:00
Joseph Doherty
f063b35633 test(auditlog): cached DB write combined telemetry end-to-end (#23 M3) 2026-05-20 15:26:04 -04:00
Joseph Doherty
f4a7be4929 test(auditlog): cached call combined telemetry end-to-end (#23 M3) 2026-05-20 15:25:10 -04:00
Joseph Doherty
a3b0fb7f08 refactor(auditlog-tests): extract DirectActorSiteStreamAuditClient + add IngestCachedTelemetry support (#23 M3) 2026-05-20 15:21:44 -04:00
Joseph Doherty
f81750b2aa fix(siteruntime): immediate-success CachedCall emits terminal telemetry (#23 M3)
Bundle E left a gap in ExternalSystem.CachedCall: when the underlying HTTP
call succeeds immediately (WasBuffered=false), the store-and-forward retry
loop is never engaged and the ICachedCallLifecycleObserver hook never
fires. As a result Tracking.Status(id) would stay in Submitted forever and
the audit log would be missing the Attempted + CachedResolve pair the M3
contract requires.

Fix: capture the ExternalCallResult returned by IExternalSystemClient.
CachedCallAsync. When WasBuffered=false, emit the two missing telemetry
packets from the helper itself:

- ApiCallCached / Attempted   (per-attempt mechanics row, HttpStatus +
                              ErrorMessage extracted via the same regex
                              the synchronous Call() audit row uses)
- CachedResolve / Delivered   on Success, or
- CachedResolve / Failed      on Success=false (immediate permanent
                              failure or transient failure without S&F).

The terminal CachedResolve row carries TerminalAtUtc so SiteCallAudit can
recognise the row as eligible for purge.

The WasBuffered=true path is unaffected — the S&F retry loop owns the
Attempted + Resolve emissions there via the CachedCallLifecycleBridge.
Database.CachedWrite is unaffected too because IDatabaseGateway.
CachedWriteAsync always enqueues into S&F (no immediate-success path).

Both new emissions are best-effort: a throwing forwarder is logged and
swallowed (alog.md §7) and each row is independently try/catch-wrapped so
a single fault cannot drop both halves of the terminal pair.

Tests in ExternalSystemCachedCallEmissionTests:
- CachedCall_ImmediateSuccess_EmitsAttemptedAndCachedResolve
- CachedCall_ImmediateFailure_EmitsAttemptedAndCachedResolveFailed
- CachedCall_BufferedPath_DoesNotEmitTerminalTelemetryFromHelper

Full suite: 244 SiteRuntime tests (3 new), 200 Host tests, all green.
2026-05-20 15:15:11 -04:00
Joseph Doherty
6fe23a4d9b feat(host): register SiteCallAuditActor + CachedCallTelemetry forwarder/bridge (#22, #23 M3)
M3 Bundle F (Task F1) wires the cached-call audit pipeline through the
composition roots:

- Central: register SiteCallAuditActor as a cluster singleton + proxy
  (mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs
  calls .AddSiteCallAudit() on the central role.
- Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge
  in AddAuditLog (lazy factory — Central nodes degrade to audit-only
  emission because IOperationTrackingStore is site-only).
- Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so
  StoreAndForwardService picks it up via DI.
- Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the
  site id to StoreAndForwardService without creating a
  StoreAndForward -> HealthMonitoring project-reference cycle.
- ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script
  scope and threads it into ScriptRuntimeContext.

CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now
nullable so Central DI validation succeeds with the lazy registration; the
forwarder's tracking-half emission is a no-op when the store is absent.

Tests:
- AkkaHostedServiceAuditWiringTests: Central host builds with
  AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site
  resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext.
- Full solution: 194 Host tests green, 241 SiteRuntime tests green, every
  other suite unchanged.
2026-05-20 15:10:47 -04:00
Joseph Doherty
047988e4c8 feat(siteruntime): Database.CachedWrite emits combined telemetry + S&F audit bridge (#23 M3)
Wire the M3 cached-call audit pipeline end-to-end for the database
channel and close the loop between the S&F lifecycle observer and the
site-side dual emitter.

* DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up
  in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit
  packet on DbOutbound, threads the id into IDatabaseGateway, and is
  best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall
  coverage from E3.

* CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver
  and lives alongside CachedCallTelemetryForwarder. The bridge ingests
  per-attempt notifications from the S&F retry loop and fans them out
  to the forwarder:
    - TransientFailure -> 1 Attempted row
    - Delivered        -> Attempted + CachedResolve(Delivered)
    - PermanentFailure -> Attempted + CachedResolve(Parked)
    - ParkedMaxRetries -> Attempted + CachedResolve(Parked)
  Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached,
  DbOutbound->DbWriteCached). Best-effort top-level catch swallows any
  unexpected throw so the S&F retry bookkeeping is never disturbed.

* Bridge tests (7) cover all four outcomes, channel mapping, provenance
  propagation, and the no-throw-on-forwarder-failure contract.

Bundle F (Host registration) will instantiate the bridge and inject it
into StoreAndForwardService.cachedCallObserver, closing the wiring path
end-to-end.

Bundle E task E6.
2026-05-20 14:55:17 -04:00
Joseph Doherty
63eb1f4225 feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3)
Hook the store-and-forward retry loop so the audit pipeline can emit
per-attempt + terminal telemetry under the original TrackedOperationId
(Bundle E Tasks E4 + E5).

New seam:

* ICachedCallLifecycleObserver + CachedCallAttemptContext in
  Commons.Interfaces.Services. Outcome enum
  (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries)
  is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F)
  will map it to the AuditKind/AuditStatus pair when building the
  CachedCallTelemetry packet.

* StoreAndForwardService gains an optional cachedCallObserver
  constructor parameter + siteId. RetryMessageAsync fires the observer
  exactly once per attempt with the appropriate outcome:
    - handler returns true               -> Delivered
    - handler returns false              -> PermanentFailure (and parks)
    - handler throws + retries remaining -> TransientFailure
    - handler throws + max retries hit   -> ParkedMaxRetries (and parks)

Hook is best-effort: a thrown observer is logged + swallowed so a
failing audit pipeline can never be misclassified as a transient
delivery failure or corrupt the retry-count bookkeeping (alog.md §7).

Only cached-call categories (ExternalSystem, CachedDbWrite) generate
notifications — Notification category has its own central-side
audit pipeline (Notification Outbox / #21).

Pre-M3 callers that didn't thread a TrackedOperationId into the S&F
message id are silently skipped — the observer requires a parseable id
by contract. New S&F callers stamp the id as messageId (Bundle E3).

Bundle E tasks E4 + E5.
2026-05-20 14:52:34 -04:00
Joseph Doherty
42430dd10a feat(siteruntime): ExternalSystem.CachedCall emits CachedSubmit telemetry (#23 M3)
Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3
combined-telemetry model:

* Mints a fresh TrackedOperationId and emits one CachedSubmit packet
  via ICachedCallTelemetryForwarder BEFORE handing the call off — the
  SiteCalls row is materialised before the first delivery attempt so
  Tracking.Status(id) can observe a Submitted row even if immediate
  delivery resolves before the helper returns.
* Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync
  as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync
  for the Database mirror set up here for E6). The gateway uses the id
  as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can
  recover it from StoreAndForwardMessage.Id.
* Returns the TrackedOperationId rather than ExternalCallResult — the
  script's contract is now "get a tracking handle, observe outcome via
  Tracking.Status". Best-effort emission: a thrown forwarder is logged
  + swallowed; the original call still runs and the id is still returned.

DatabaseHelper gets the matching siteId / sourceScript / forwarder
fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task
E6's Database.CachedWrite mirror plugs in without further runtime
wiring.

New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services
so SiteRuntime depends on Commons (existing arrow) rather than
ScadaLink.AuditLog (would have introduced a new dependency).

Bundle E task E3 (and helper-shape work for E6).
2026-05-20 14:48:05 -04:00
Joseph Doherty
2145b29d4d feat(auditlog): CachedCallTelemetryForwarder for site-side dual emission (#23 M3)
Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry
packet and fans it out to the two site-local stores.

* AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter
  + SqliteAuditWriter chain — same site SQLite hot-path as sync calls).
* SiteCallOperational half maps Audit.Kind to the matching
  IOperationTrackingStore method:
    - CachedSubmit  -> RecordEnqueueAsync (insert-if-not-exists)
    - ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic)
    - CachedResolve -> RecordTerminalAsync (first-write-wins)

Best-effort contract (alog.md §7): independent try/catch per half so a
thrown writer cannot starve the tracking row (and vice-versa); both
failures are logged at warning level and swallowed — the calling script
never sees them.

Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays
in effect; the forwarder writes only to the local stores in M3. The
existing SiteAuditTelemetryActor drain loop will sweep the audit rows
once a real gRPC client lands.

Bundle E task E2.
2026-05-20 14:41:15 -04:00
Joseph Doherty
73719ee066 feat(auditlog): extend ISiteStreamAuditClient with IngestCachedTelemetryAsync (#23 M3)
Add the second site→central RPC seam alongside the existing M2
IngestAuditEventsAsync. The Bundle D proto already lit up
IngestCachedTelemetry (CachedTelemetryBatch / IngestAck) so this commit
just plumbs the client-side abstraction:

* ISiteStreamAuditClient gains IngestCachedTelemetryAsync(batch, ct).
* NoOpSiteStreamAuditClient implements it returning an empty IngestAck
  (same shape as M2 — production gRPC client lands in M6).
* SyncCallEmissionEndToEndTests' DirectActorSiteStreamAuditClient stub
  throws NotSupportedException from the new method so a regression that
  accidentally routes a cached packet through the sync stub fails loudly.
* New NoOpSiteStreamAuditClientTests cover the null-guard + empty-ack
  contract for both batch shapes.

Bundle E task E1.
2026-05-20 14:39:24 -04:00
Joseph Doherty
0a97fff906 feat(auditlog): combined telemetry dual-write transaction (#23 M3) 2026-05-20 14:33:14 -04:00
Joseph Doherty
2b54290c7f feat(comms): IngestCachedTelemetry RPC + CachedTelemetryPacket proto (#23 M3) 2026-05-20 14:24:13 -04:00
Joseph Doherty
de110f8b42 feat(scaudit): SiteCallAuditActor minimum surface (#22, #23 M3)
Bundle C of Audit Log #23 M3.

Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring
the ScadaLink.AuditLog scaffolding pattern (net10.0, central package
management, InternalsVisibleTo to the tests assembly).

SiteCallAuditActor is the central singleton entry point for Site Call Audit
(#22): it receives UpsertSiteCallCommand and persists the SiteCall via
ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order
or duplicate updates are silent no-ops at the repo). Audit-write failures
NEVER abort the user-facing action (CLAUDE.md): repository throws are
caught + logged, the actor replies Accepted=false, and the singleton stays
alive (Resume supervisor strategy as defence in depth).

Two constructors mirror AuditLogIngestActor:
- IServiceProvider production constructor resolves the scoped EF repository
  from a fresh DI scope per message.
- ISiteCallAuditRepository test constructor injects a concrete repository so
  the TestKit tests exercise the real monotonic-upsert SQL end to end.

UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same
home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct
them without taking a project reference on the actor's host project.

AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog /
AddNotificationOutbox; Bundle F will populate it with the actor's Props
factory + options bindings.

Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to
ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2):
- Receive_UpsertSiteCallCommand_Persists_Replies_Accepted
- Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency)
- Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive

Reconciliation, KPIs, and the central->site Retry/Discard relay are
deferred per CLAUDE.md scope discipline.

ScadaLink.slnx updated to include both new projects.

All 3 new tests pass against the running infra/mssql container; full suite
(2683 tests across 27 projects) passes with no regressions.
2026-05-20 14:18:49 -04:00
Joseph Doherty
bedfa6b8f3 feat(configdb): ISiteCallAuditRepository + EF impl, monotonic upsert (#22, #23 M3)
Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls
table introduced in B1+B2. UpsertAsync is insert-if-not-exists then
monotonic-status update so out-of-order telemetry, duplicate gRPC packets,
and reconciliation pulls all converge on the same row without rolling
state backward.

- src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs:
  UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync.
- src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs +
  SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range)
  and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC),
  mirrored on M1's AuditLog* equivalents.
- src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs:
  raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank
  compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 —
  terminal statuses are mutually exclusive so e.g. Delivered cannot
  overwrite Parked). Duplicate-key violations (SQL 2601/2627) are
  swallowed at Debug, identical to AuditLogRepository's race-fix.
  QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate
  string.Compare against the value-converted TrackedOperationId column
  inside an expression tree.
- ServiceCollectionExtensions wires the repository (scoped, after
  IAuditLogRepository).
- 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/
  Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert,
  monotonic advance, older-status no-op, same-status no-op,
  terminal-over-terminal no-op, 50-way concurrent-insert race produces
  exactly one row, Get known/unknown, filter by site, keyset paging no
  overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.
2026-05-20 14:10:24 -04:00
Joseph Doherty
6667f345fa feat(configdb): add SiteCalls migration (#22, #23 M3)
Bundle B2 of Audit Log #23 M3: EF-generated migration that creates the
SiteCalls operational-state table on [PRIMARY], with the simple clustered
PK on TrackedOperationId and the two named indexes the entity config
declares.

No partition function / scheme / DB-role restriction — SiteCalls holds
mutable operational state (insert-once + monotonic-status update at the
repo layer), unlike the partitioned append-only AuditLog table from M1.

- Migration: 20260520180431_AddSiteCallsTable.cs (auto-generated;
  EF emitted CREATE TABLE + 2 indexes without customisation needed).
- Model snapshot updated alongside.
- Integration test: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/
  AddSiteCallsTableMigrationTests.cs. Uses the existing MsSqlMigrationFixture
  with [SkippableFact] + Skip.IfNot(fixture.Available). Asserts table +
  twelve columns + PK on TrackedOperationId + both named indexes.
2026-05-20 14:05:36 -04:00
Joseph Doherty
3162286ade feat(configdb): map SiteCall to SiteCalls table (#22, #23 M3)
Bundle B1 of Audit Log #23 M3: introduces the SiteCall entity + EF mapping
for the central SiteCalls operational-state table. One row per
TrackedOperationId, mirrored from sites via best-effort telemetry then
periodic reconciliation; eventually-consistent mirror, not a dispatcher.

- src/ScadaLink.Commons/Entities/Audit/SiteCall.cs: append-once record
  with required TrackedOperationId/Channel/Target/SourceSite/Status,
  monotonic status update at the repo layer.
- src/ScadaLink.ConfigurationDatabase/Configurations/SiteCallEntityTypeConfiguration.cs:
  table SiteCalls, PK on TrackedOperationId (stored as varchar(36) via
  value conversion through the canonical 'D'-format GUID string —
  matches the wire shape used by gRPC + SQLite columns), two named
  indexes (IX_SiteCalls_Source_Created, IX_SiteCalls_Status_Updated).
- ScadaLinkDbContext: DbSet<SiteCall> SiteCalls in the existing Audit
  section, after AuditLogs.
- Tests in tests/ScadaLink.ConfigurationDatabase.Tests/Configurations/:
  table name, PK, value-conversion shape, index presence + ordering.
2026-05-20 14:04:17 -04:00
Joseph Doherty
e416b21dad feat(commons): CachedCallTelemetry combined operational+audit packet (#23 M3) 2026-05-20 13:58:57 -04:00
Joseph Doherty
0f28d13da7 feat(siteruntime): Tracking.Status(id) script API (#23 M3) 2026-05-20 13:56:59 -04:00