Commit Graph

434 Commits

Author SHA1 Message Date
Joseph Doherty
9b1379ed9b feat(auditlog): wire IAuditPayloadFilter into all writer paths (#23 M5)
Bundle C task M5-T6 — plugs the IAuditPayloadFilter singleton into the
three audit writer entry points so every event is truncated + redacted
before persistence, regardless of which path it took to disk:

  - FallbackAuditWriter (site hot path): filter runs before the primary
    SQLite write AND the ring-buffer enqueue, so a recovery drain replays
    rows that are already capped/redacted.
  - CentralAuditWriter (central direct-write): filter runs before the
    per-call IAuditLogRepository.InsertIfNotExistsAsync.
  - AuditLogIngestActor (site→central telemetry):
      - OnIngestAsync resolves the filter from the per-message scope and
        applies it to each row before IngestedAtUtc stamping.
      - OnCachedTelemetryAsync (M3 dual-write) applies the filter to the
        audit half of every CachedTelemetryEntry before the audit-insert
        + site-call-upsert transaction.

Filter parameter is optional (nullable) on each constructor so the
existing test composition roots that don't pass one keep working unchanged
— production DI wiring in AddAuditLog always passes the real filter
through. ICentralAuditWriter registration switched from the open-ctor
form to a factory so the filter flows through it.

Tests: FilterIntegrationTests covers all three writer paths end-to-end
(4 tests). Full ScadaLink.AuditLog.Tests suite: 146 passed, 0 failed,
0 skipped.
2026-05-20 17:21:57 -04:00
Joseph Doherty
5a7f3e8bf6 feat(auditlog): per-connection SQL parameter redaction opt-in (#23 M5) 2026-05-20 17:11:53 -04:00
Joseph Doherty
37f17dc4a8 feat(auditlog): body regex redaction with over-redaction safety net (#23 M5) 2026-05-20 17:09:36 -04:00
Joseph Doherty
ad7b330f43 feat(auditlog): HTTP header redaction stage (#23 M5) 2026-05-20 17:07:01 -04:00
Joseph Doherty
bba2ef1b4d feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety (#23 M5) 2026-05-20 17:01:13 -04:00
Joseph Doherty
25cdf857c9 feat(auditlog): IAuditPayloadFilter contract (#23 M5) 2026-05-20 16:59:10 -04:00
Joseph Doherty
02727b3a66 test(auditlog): Notify dispatcher audit trail end-to-end (#23 M4) 2026-05-20 16:47:09 -04:00
Joseph Doherty
56b26339ca test(auditlog): DB sync emission end-to-end (#23 M4) 2026-05-20 16:43:55 -04:00
Joseph Doherty
1c862989b4 feat(inbound): register AuditWriteMiddleware in pipeline (#23 M4) 2026-05-20 16:35:13 -04:00
Joseph Doherty
3c3f7770c1 feat(inbound): AuditWriteMiddleware emitting InboundRequest/InboundAuthFailure (#23 M4) 2026-05-20 16:35:03 -04:00
Joseph Doherty
855df759b5 feat(siteruntime): emit NotifySend(Submitted) on site-side Notify.To().Send (#23 M4)
Audit Log #23 M4 Bundle C — Task C1: every script-initiated
Notify.To(list).Send(...) now emits exactly one
Notification/NotifySend audit row via the IAuditWriter wired through
ScriptRuntimeContext. The row carries Status=Submitted,
Target=list name, RequestSummary={subject,body} JSON (M5 will redact),
CorrelationId=NotificationId (parsed as Guid), provenance from context,
ForwardState=Pending.

Emission is best-effort per alog.md §7: a thrown audit writer is logged
and swallowed inside the helper; the original NotificationId still flows
back to the script and the underlying S&F enqueue still happened.

Mirrors the M2 Bundle F ExternalSystem.Call wrapper pattern.

Tests: 7 new tests in NotifySendAuditEmissionTests covering submitted-
status, list-name target, request-summary JSON shape, writer-throws
fail-safe, provenance, NotificationId/CorrelationId round-trip, and the
null-writer degrade path.
2026-05-20 16:18:46 -04:00
Joseph Doherty
6de377a39e feat(notif): emit NotifyDeliver(terminal) on terminal transitions (#23 M4)
M4 Bundle B (B3) — NotificationOutboxActor emits a second NotifyDeliver
audit row carrying the terminal AuditStatus whenever a notification
transitions to a terminal state (Delivered, Parked, Discarded).

- Dispatcher: after the B2 Attempted row, a Delivered or Parked row is
  emitted when the post-outcome status is terminal. Discarded is never
  produced by the dispatcher — only by the manual discard path.
- Missing-adapter park: now emits both Attempted and terminal Parked,
  both carrying the same explanatory error.
- Manual discard (DiscardAsync): after the row update, emits a terminal
  Discarded NotifyDeliver row with no error message (operator-driven
  cancellation, not a delivery error).
- MapNotificationStatusToAuditStatus + IsTerminal helpers added; terminal
  emission shares BuildNotifyDeliverEvent with the B2 Attempted path so
  the two rows carry identical correlation/provenance fields.

Audit failure NEVER aborts the user-facing action: every emission is
wrapped in try/catch (defensive — the CentralAuditWriter itself swallows).
2026-05-20 16:12:44 -04:00
Joseph Doherty
1dfd67a90d feat(notif): emit NotifyDeliver(Attempted) per dispatcher attempt (#23 M4)
M4 Bundle B (B2) — NotificationOutboxActor's dispatcher loop emits a single
AuditChannel.Notification / AuditKind.NotifyDeliver row with AuditStatus.Attempted
for every delivery attempt (success, transient failure, permanent failure,
and the missing-adapter park).

- BuildNotifyDeliverEvent helper populates correlation id (parsed from the
  string NotificationId — sites generate Guid.NewGuid().ToString("N"),
  non-Guid ids fall through as null), list-name target, source site/instance/script
  provenance, and Actor=null (central dispatch has no authenticated end-user).
- Attempt duration is measured around the adapter call and recorded as
  DurationMs so KPIs can compute per-attempt latency.
- Emission is fire-and-forget (the writer swallows internally) and wrapped
  in try/catch — audit failure NEVER aborts the user-facing dispatch.

Terminal-state emission lands separately in B3.
2026-05-20 16:08:06 -04:00
Joseph Doherty
b31747a632 feat(notif): NotificationOutboxActor + CentralAuditWriter wired (#23 M4)
M4 Bundle B (B1) — add the central-only ICentralAuditWriter implementation
and inject it into NotificationOutboxActor so subsequent tasks (B2/B3) can
route attempt + terminal lifecycle events through the direct-write audit path.

- CentralAuditWriter: thin wrapper around IAuditLogRepository.InsertIfNotExistsAsync;
  scope-per-call (matches AuditLogIngestActor / NotificationOutboxActor pattern);
  stamps IngestedAtUtc; swallows all internal failures (alog.md §13).
- Registered as a singleton in AddAuditLog.
- NotificationOutboxActor ctor takes ICentralAuditWriter (validated non-null).
- Host wiring resolves the writer once from the root provider and passes it
  into the singleton's Props.Create call.
- Existing TestKit fixtures updated with a NoOpCentralAuditWriter helper so
  tests that don't exercise audit emission still compile and pass.
2026-05-20 16:04:01 -04:00
Joseph Doherty
e4d902753b feat(siteruntime): emit DbOutbound.DbWrite on sync Database.Execute*/ExecuteReader (#23 M4)
Audit Log #23 — M4 Bundle A (Tasks A1+A2): every script-initiated
synchronous DB call routed through Database.Connection(name) now emits
exactly one DbOutbound/DbWrite audit row.

Implementation — three thin ADO.NET decorators in
src/ScadaLink.SiteRuntime/Scripts/:

  - AuditingDbConnection: wraps the gateway-returned DbConnection so
    CreateDbCommand() hands the script an AuditingDbCommand. All other
    ADO.NET surface forwards unchanged.
  - AuditingDbCommand: intercepts ExecuteNonQuery / ExecuteScalar /
    ExecuteReader (sync + async). On terminal:
      Channel = DbOutbound, Kind = DbWrite, Status = Delivered|Failed,
      Extra = {"op":"write","rowsAffected":N}   (Execute*),
              {"op":"read","rowsReturned":N}   (ExecuteReader),
      RequestSummary = JSON of SQL + parameter values (default capture;
                       redaction in M5),
      Target = "<connection>.<first 60 chars of SQL>",
      DurationMs captured via Stopwatch,
      Provenance from ScriptRuntimeContext (SourceSiteId,
                       SourceInstanceId, SourceScript).
  - AuditingDbDataReader: counts rows on Read/ReadAsync and fires the
    audit emission exactly once on Close/CloseAsync/Dispose.

DatabaseHelper now takes an IAuditWriter; ScriptRuntimeContext.Database
threads through _auditWriter. When the writer is null (tests / minimal
hosts) Connection() returns the raw inner DbConnection unchanged.

Best-effort emission (alog.md §7): mirrors M2 Bundle F's 3-layer
fail-safe — build, write, continuation. Audit-build, audit-write, and
audit-continuation faults are logged + swallowed; the original ADO.NET
result (or original exception) flows back to the script untouched. The
SiteAuditWriteFailures counter increments automatically through the
existing FallbackAuditWriter (Bundle G).

Tests — tests/ScadaLink.SiteRuntime.Tests/Scripts/DatabaseSyncEmissionTests.cs
(7 new, all passing):
  1. Execute / INSERT success — one DbWrite row, op=write, rowsAffected=1.
  2. ExecuteScalar success — one DbWrite row, op=write.
  3. Execute throws — Status=Failed, ErrorMessage + ErrorDetail set.
  4. ExecuteReader success — op=read, rowsReturned counts rows pulled.
  5. AuditWriter throws — original ADO.NET rowsAffected returned, no
     events captured, no exception propagates.
  6. Provenance populated from context.
  7. DurationMs recorded non-zero.

Tests use Microsoft.Data.Sqlite in-memory (already transitively
available via SiteRuntime). Total SiteRuntime test suite: 251 passing
(244 baseline + 7 new). Full solution test suite passes.
2026-05-20 15:54:54 -04:00
Joseph Doherty
73a19c6f02 refactor(auditlog): remove dead-code ternary in CachedCallLifecycleBridge (#23 M3 final-review fix) 2026-05-20 15:39:10 -04:00
Joseph Doherty
f81750b2aa fix(siteruntime): immediate-success CachedCall emits terminal telemetry (#23 M3)
Bundle E left a gap in ExternalSystem.CachedCall: when the underlying HTTP
call succeeds immediately (WasBuffered=false), the store-and-forward retry
loop is never engaged and the ICachedCallLifecycleObserver hook never
fires. As a result Tracking.Status(id) would stay in Submitted forever and
the audit log would be missing the Attempted + CachedResolve pair the M3
contract requires.

Fix: capture the ExternalCallResult returned by IExternalSystemClient.
CachedCallAsync. When WasBuffered=false, emit the two missing telemetry
packets from the helper itself:

- ApiCallCached / Attempted   (per-attempt mechanics row, HttpStatus +
                              ErrorMessage extracted via the same regex
                              the synchronous Call() audit row uses)
- CachedResolve / Delivered   on Success, or
- CachedResolve / Failed      on Success=false (immediate permanent
                              failure or transient failure without S&F).

The terminal CachedResolve row carries TerminalAtUtc so SiteCallAudit can
recognise the row as eligible for purge.

The WasBuffered=true path is unaffected — the S&F retry loop owns the
Attempted + Resolve emissions there via the CachedCallLifecycleBridge.
Database.CachedWrite is unaffected too because IDatabaseGateway.
CachedWriteAsync always enqueues into S&F (no immediate-success path).

Both new emissions are best-effort: a throwing forwarder is logged and
swallowed (alog.md §7) and each row is independently try/catch-wrapped so
a single fault cannot drop both halves of the terminal pair.

Tests in ExternalSystemCachedCallEmissionTests:
- CachedCall_ImmediateSuccess_EmitsAttemptedAndCachedResolve
- CachedCall_ImmediateFailure_EmitsAttemptedAndCachedResolveFailed
- CachedCall_BufferedPath_DoesNotEmitTerminalTelemetryFromHelper

Full suite: 244 SiteRuntime tests (3 new), 200 Host tests, all green.
2026-05-20 15:15:11 -04:00
Joseph Doherty
6fe23a4d9b feat(host): register SiteCallAuditActor + CachedCallTelemetry forwarder/bridge (#22, #23 M3)
M3 Bundle F (Task F1) wires the cached-call audit pipeline through the
composition roots:

- Central: register SiteCallAuditActor as a cluster singleton + proxy
  (mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs
  calls .AddSiteCallAudit() on the central role.
- Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge
  in AddAuditLog (lazy factory — Central nodes degrade to audit-only
  emission because IOperationTrackingStore is site-only).
- Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so
  StoreAndForwardService picks it up via DI.
- Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the
  site id to StoreAndForwardService without creating a
  StoreAndForward -> HealthMonitoring project-reference cycle.
- ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script
  scope and threads it into ScriptRuntimeContext.

CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now
nullable so Central DI validation succeeds with the lazy registration; the
forwarder's tracking-half emission is a no-op when the store is absent.

Tests:
- AkkaHostedServiceAuditWiringTests: Central host builds with
  AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site
  resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext.
- Full solution: 194 Host tests green, 241 SiteRuntime tests green, every
  other suite unchanged.
2026-05-20 15:10:47 -04:00
Joseph Doherty
047988e4c8 feat(siteruntime): Database.CachedWrite emits combined telemetry + S&F audit bridge (#23 M3)
Wire the M3 cached-call audit pipeline end-to-end for the database
channel and close the loop between the S&F lifecycle observer and the
site-side dual emitter.

* DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up
  in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit
  packet on DbOutbound, threads the id into IDatabaseGateway, and is
  best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall
  coverage from E3.

* CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver
  and lives alongside CachedCallTelemetryForwarder. The bridge ingests
  per-attempt notifications from the S&F retry loop and fans them out
  to the forwarder:
    - TransientFailure -> 1 Attempted row
    - Delivered        -> Attempted + CachedResolve(Delivered)
    - PermanentFailure -> Attempted + CachedResolve(Parked)
    - ParkedMaxRetries -> Attempted + CachedResolve(Parked)
  Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached,
  DbOutbound->DbWriteCached). Best-effort top-level catch swallows any
  unexpected throw so the S&F retry bookkeeping is never disturbed.

* Bridge tests (7) cover all four outcomes, channel mapping, provenance
  propagation, and the no-throw-on-forwarder-failure contract.

Bundle F (Host registration) will instantiate the bridge and inject it
into StoreAndForwardService.cachedCallObserver, closing the wiring path
end-to-end.

Bundle E task E6.
2026-05-20 14:55:17 -04:00
Joseph Doherty
63eb1f4225 feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3)
Hook the store-and-forward retry loop so the audit pipeline can emit
per-attempt + terminal telemetry under the original TrackedOperationId
(Bundle E Tasks E4 + E5).

New seam:

* ICachedCallLifecycleObserver + CachedCallAttemptContext in
  Commons.Interfaces.Services. Outcome enum
  (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries)
  is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F)
  will map it to the AuditKind/AuditStatus pair when building the
  CachedCallTelemetry packet.

* StoreAndForwardService gains an optional cachedCallObserver
  constructor parameter + siteId. RetryMessageAsync fires the observer
  exactly once per attempt with the appropriate outcome:
    - handler returns true               -> Delivered
    - handler returns false              -> PermanentFailure (and parks)
    - handler throws + retries remaining -> TransientFailure
    - handler throws + max retries hit   -> ParkedMaxRetries (and parks)

Hook is best-effort: a thrown observer is logged + swallowed so a
failing audit pipeline can never be misclassified as a transient
delivery failure or corrupt the retry-count bookkeeping (alog.md §7).

Only cached-call categories (ExternalSystem, CachedDbWrite) generate
notifications — Notification category has its own central-side
audit pipeline (Notification Outbox / #21).

Pre-M3 callers that didn't thread a TrackedOperationId into the S&F
message id are silently skipped — the observer requires a parseable id
by contract. New S&F callers stamp the id as messageId (Bundle E3).

Bundle E tasks E4 + E5.
2026-05-20 14:52:34 -04:00
Joseph Doherty
42430dd10a feat(siteruntime): ExternalSystem.CachedCall emits CachedSubmit telemetry (#23 M3)
Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3
combined-telemetry model:

* Mints a fresh TrackedOperationId and emits one CachedSubmit packet
  via ICachedCallTelemetryForwarder BEFORE handing the call off — the
  SiteCalls row is materialised before the first delivery attempt so
  Tracking.Status(id) can observe a Submitted row even if immediate
  delivery resolves before the helper returns.
* Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync
  as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync
  for the Database mirror set up here for E6). The gateway uses the id
  as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can
  recover it from StoreAndForwardMessage.Id.
* Returns the TrackedOperationId rather than ExternalCallResult — the
  script's contract is now "get a tracking handle, observe outcome via
  Tracking.Status". Best-effort emission: a thrown forwarder is logged
  + swallowed; the original call still runs and the id is still returned.

DatabaseHelper gets the matching siteId / sourceScript / forwarder
fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task
E6's Database.CachedWrite mirror plugs in without further runtime
wiring.

New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services
so SiteRuntime depends on Commons (existing arrow) rather than
ScadaLink.AuditLog (would have introduced a new dependency).

Bundle E task E3 (and helper-shape work for E6).
2026-05-20 14:48:05 -04:00
Joseph Doherty
2145b29d4d feat(auditlog): CachedCallTelemetryForwarder for site-side dual emission (#23 M3)
Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry
packet and fans it out to the two site-local stores.

* AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter
  + SqliteAuditWriter chain — same site SQLite hot-path as sync calls).
* SiteCallOperational half maps Audit.Kind to the matching
  IOperationTrackingStore method:
    - CachedSubmit  -> RecordEnqueueAsync (insert-if-not-exists)
    - ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic)
    - CachedResolve -> RecordTerminalAsync (first-write-wins)

Best-effort contract (alog.md §7): independent try/catch per half so a
thrown writer cannot starve the tracking row (and vice-versa); both
failures are logged at warning level and swallowed — the calling script
never sees them.

Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays
in effect; the forwarder writes only to the local stores in M3. The
existing SiteAuditTelemetryActor drain loop will sweep the audit rows
once a real gRPC client lands.

Bundle E task E2.
2026-05-20 14:41:15 -04:00
Joseph Doherty
73719ee066 feat(auditlog): extend ISiteStreamAuditClient with IngestCachedTelemetryAsync (#23 M3)
Add the second site→central RPC seam alongside the existing M2
IngestAuditEventsAsync. The Bundle D proto already lit up
IngestCachedTelemetry (CachedTelemetryBatch / IngestAck) so this commit
just plumbs the client-side abstraction:

* ISiteStreamAuditClient gains IngestCachedTelemetryAsync(batch, ct).
* NoOpSiteStreamAuditClient implements it returning an empty IngestAck
  (same shape as M2 — production gRPC client lands in M6).
* SyncCallEmissionEndToEndTests' DirectActorSiteStreamAuditClient stub
  throws NotSupportedException from the new method so a regression that
  accidentally routes a cached packet through the sync stub fails loudly.
* New NoOpSiteStreamAuditClientTests cover the null-guard + empty-ack
  contract for both batch shapes.

Bundle E task E1.
2026-05-20 14:39:24 -04:00
Joseph Doherty
0a97fff906 feat(auditlog): combined telemetry dual-write transaction (#23 M3) 2026-05-20 14:33:14 -04:00
Joseph Doherty
2b54290c7f feat(comms): IngestCachedTelemetry RPC + CachedTelemetryPacket proto (#23 M3) 2026-05-20 14:24:13 -04:00
Joseph Doherty
de110f8b42 feat(scaudit): SiteCallAuditActor minimum surface (#22, #23 M3)
Bundle C of Audit Log #23 M3.

Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring
the ScadaLink.AuditLog scaffolding pattern (net10.0, central package
management, InternalsVisibleTo to the tests assembly).

SiteCallAuditActor is the central singleton entry point for Site Call Audit
(#22): it receives UpsertSiteCallCommand and persists the SiteCall via
ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order
or duplicate updates are silent no-ops at the repo). Audit-write failures
NEVER abort the user-facing action (CLAUDE.md): repository throws are
caught + logged, the actor replies Accepted=false, and the singleton stays
alive (Resume supervisor strategy as defence in depth).

Two constructors mirror AuditLogIngestActor:
- IServiceProvider production constructor resolves the scoped EF repository
  from a fresh DI scope per message.
- ISiteCallAuditRepository test constructor injects a concrete repository so
  the TestKit tests exercise the real monotonic-upsert SQL end to end.

UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same
home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct
them without taking a project reference on the actor's host project.

AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog /
AddNotificationOutbox; Bundle F will populate it with the actor's Props
factory + options bindings.

Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to
ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2):
- Receive_UpsertSiteCallCommand_Persists_Replies_Accepted
- Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency)
- Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive

Reconciliation, KPIs, and the central->site Retry/Discard relay are
deferred per CLAUDE.md scope discipline.

ScadaLink.slnx updated to include both new projects.

All 3 new tests pass against the running infra/mssql container; full suite
(2683 tests across 27 projects) passes with no regressions.
2026-05-20 14:18:49 -04:00
Joseph Doherty
bedfa6b8f3 feat(configdb): ISiteCallAuditRepository + EF impl, monotonic upsert (#22, #23 M3)
Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls
table introduced in B1+B2. UpsertAsync is insert-if-not-exists then
monotonic-status update so out-of-order telemetry, duplicate gRPC packets,
and reconciliation pulls all converge on the same row without rolling
state backward.

- src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs:
  UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync.
- src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs +
  SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range)
  and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC),
  mirrored on M1's AuditLog* equivalents.
- src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs:
  raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank
  compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 —
  terminal statuses are mutually exclusive so e.g. Delivered cannot
  overwrite Parked). Duplicate-key violations (SQL 2601/2627) are
  swallowed at Debug, identical to AuditLogRepository's race-fix.
  QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate
  string.Compare against the value-converted TrackedOperationId column
  inside an expression tree.
- ServiceCollectionExtensions wires the repository (scoped, after
  IAuditLogRepository).
- 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/
  Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert,
  monotonic advance, older-status no-op, same-status no-op,
  terminal-over-terminal no-op, 50-way concurrent-insert race produces
  exactly one row, Get known/unknown, filter by site, keyset paging no
  overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.
2026-05-20 14:10:24 -04:00
Joseph Doherty
6667f345fa feat(configdb): add SiteCalls migration (#22, #23 M3)
Bundle B2 of Audit Log #23 M3: EF-generated migration that creates the
SiteCalls operational-state table on [PRIMARY], with the simple clustered
PK on TrackedOperationId and the two named indexes the entity config
declares.

No partition function / scheme / DB-role restriction — SiteCalls holds
mutable operational state (insert-once + monotonic-status update at the
repo layer), unlike the partitioned append-only AuditLog table from M1.

- Migration: 20260520180431_AddSiteCallsTable.cs (auto-generated;
  EF emitted CREATE TABLE + 2 indexes without customisation needed).
- Model snapshot updated alongside.
- Integration test: tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/
  AddSiteCallsTableMigrationTests.cs. Uses the existing MsSqlMigrationFixture
  with [SkippableFact] + Skip.IfNot(fixture.Available). Asserts table +
  twelve columns + PK on TrackedOperationId + both named indexes.
2026-05-20 14:05:36 -04:00
Joseph Doherty
3162286ade feat(configdb): map SiteCall to SiteCalls table (#22, #23 M3)
Bundle B1 of Audit Log #23 M3: introduces the SiteCall entity + EF mapping
for the central SiteCalls operational-state table. One row per
TrackedOperationId, mirrored from sites via best-effort telemetry then
periodic reconciliation; eventually-consistent mirror, not a dispatcher.

- src/ScadaLink.Commons/Entities/Audit/SiteCall.cs: append-once record
  with required TrackedOperationId/Channel/Target/SourceSite/Status,
  monotonic status update at the repo layer.
- src/ScadaLink.ConfigurationDatabase/Configurations/SiteCallEntityTypeConfiguration.cs:
  table SiteCalls, PK on TrackedOperationId (stored as varchar(36) via
  value conversion through the canonical 'D'-format GUID string —
  matches the wire shape used by gRPC + SQLite columns), two named
  indexes (IX_SiteCalls_Source_Created, IX_SiteCalls_Status_Updated).
- ScadaLinkDbContext: DbSet<SiteCall> SiteCalls in the existing Audit
  section, after AuditLogs.
- Tests in tests/ScadaLink.ConfigurationDatabase.Tests/Configurations/:
  table name, PK, value-conversion shape, index presence + ordering.
2026-05-20 14:04:17 -04:00
Joseph Doherty
e416b21dad feat(commons): CachedCallTelemetry combined operational+audit packet (#23 M3) 2026-05-20 13:58:57 -04:00
Joseph Doherty
0f28d13da7 feat(siteruntime): Tracking.Status(id) script API (#23 M3) 2026-05-20 13:56:59 -04:00
Joseph Doherty
b86d7c61ab feat(siteruntime): OperationTrackingStore site-local SQLite (#23 M3) 2026-05-20 13:51:09 -04:00
Joseph Doherty
1c38dd540f feat(commons): TrackedOperationId strong type (#23 M3) 2026-05-20 13:47:40 -04:00
Joseph Doherty
dd3351da93 feat(health): SiteAuditWriteFailures counter + AuditLog bridge (#23)
Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-
failure counter into the Site Health Monitoring report payload so a
sustained audit-write outage surfaces on /monitoring/health instead of
disappearing into a NoOp sink.

- SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive).
- ISiteHealthCollector + SiteHealthCollector: new
  IncrementSiteAuditWriteFailures() counter, per-interval reset
  semantics matching ScriptErrorCount / DeadLetterCount.
- HealthMetricsAuditWriteFailureCounter: adapter forwarding
  IAuditWriteFailureCounter.Increment() to the collector.
- AddAuditLogHealthMetricsBridge(): swaps the NoOp default
  registration for the real bridge; called from
  SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog.
- Existing host-wiring test updated: site composition now resolves
  HealthMetricsAuditWriteFailureCounter (not NoOp).

Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new),
full solution green.
2026-05-20 13:22:25 -04:00
Joseph Doherty
82a8bbf225 feat(siteruntime): ExternalSystem.Call emits Audit Log #23 event on every sync call
Wraps IExternalSystemClient.CallAsync inside ScriptRuntimeContext's
ExternalSystemHelper so every script-initiated ExternalSystem.Call
produces exactly one ApiOutbound/ApiCall AuditEvent via IAuditWriter.

- Captures duration with Stopwatch.GetTimestamp() around the call.
- Builds the audit event with full provenance (SiteId, InstanceId,
  SourceScript) and a fresh EventId; ForwardState=Pending.
- Maps Success → AuditStatus.Delivered, Failure (or thrown) → Failed;
  parses HTTP {code} out of the ExternalSystemClient's error message
  to populate HttpStatus.
- Audit emission is fully best-effort: event-build failures, sync
  WriteAsync throws, AND async WriteAsync faults are all logged at
  Warning and swallowed so the script's call path is never aborted
  by an audit-write failure (alog.md §7).
- Original ExternalCallResult or original exception flows back to the
  caller unchanged.

ScriptExecutionActor resolves IAuditWriter from DI and threads it
into ScriptRuntimeContext alongside the existing site identity.

Adds ExternalSystemCallAuditEmissionTests covering: success →
Delivered, HTTP 500 → Failed+httpStatus, HTTP 400 → Failed+httpStatus,
client-thrown network exception → Failed with original exception
re-thrown, audit-writer throw → original result returned, provenance
populated from context, DurationMs recorded.

Refs Audit Log #23 M2 Bundle F.
2026-05-20 13:11:19 -04:00
Joseph Doherty
9bf1497f03 feat(host): register Audit Log #23 singletons with dedicated dispatcher (#23)
Wires Bundle E of the M2 site-sync pipeline:

- AddAuditLog extended to register the site writer chain (SqliteAuditWriter
  singleton + ISiteAuditQueue forward + RingBufferFallback + FallbackAuditWriter
  composing them) and the telemetry collaborators (SiteAuditTelemetryOptions,
  SqliteAuditWriterOptions, IAuditWriteFailureCounter NoOp default,
  ISiteStreamAuditClient NoOp default).
- AkkaHostedService central role: AuditLogIngestActor as ClusterSingletonManager
  (singleton name 'audit-log-ingest') + ClusterSingletonProxy, mirroring the
  Notification Outbox pattern. Proxy is offered to SiteStreamGrpcServer if it
  resolves (Site path only today; M6 reconciliation will host gRPC on central).
- AkkaHostedService site role: SiteAuditTelemetryActor (per-site, NOT a
  singleton because each site is its own cluster), bound to a dedicated
  audit-telemetry-dispatcher (ForkJoinDispatcher, 2 dedicated threads).
- Program.cs + SiteServiceRegistration.Configure call AddAuditLog on both roles.
- AuditLogIngestActor gains a second constructor that takes IServiceProvider so
  the cluster singleton can create a fresh scope per message — IAuditLogRepository
  is a scoped EF Core service and cannot be pre-resolved from the root. The
  IAuditLogRepository constructor remains for Bundle D's MSSQL-fixture tests.

NoOp ISiteStreamAuditClient is deliberate: no site→central gRPC channel exists
in M2 (sites talk to central via Akka ClusterClient; gRPC SiteStreamService is
hosted on sites for central→site streaming). M6 reconciliation introduces the
real gRPC site→central client + central-hosted gRPC server. Bundle H's
integration test substitutes a stub client directly via the actor's Props.

Tests:
- tests/ScadaLink.AuditLog.Tests/AddAuditLogTests.cs — 11 tests (was 3): writer
  singleton, IAuditWriter as FallbackAuditWriter, ISiteAuditQueue same-instance
  as SqliteAuditWriter, options bind round-trip, NoOp default assertions.
- tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs (new) — 13
  tests: BuildHocon emits audit-telemetry-dispatcher block with the expected
  type/throughput/thread-count; Central composition root resolves the writer
  chain + options; Site composition root resolves the writer chain + options +
  NoOp client.

Verified: dotnet build clean, 23 test suites green (Host 194 + AuditLog 54).
2026-05-20 13:04:05 -04:00
Joseph Doherty
87cae88f92 feat(auditlog): AuditLogIngestActor + gRPC handler (#23) 2026-05-20 12:48:26 -04:00
Joseph Doherty
b679430d13 feat(auditlog): SiteAuditTelemetryActor + ISiteStreamAuditClient seam (#23) 2026-05-20 12:40:49 -04:00
Joseph Doherty
126956eee6 feat(auditlog): AuditEvent ↔ proto mapper (#23) 2026-05-20 12:32:16 -04:00
Joseph Doherty
5c3d601198 feat(comms): IngestAuditEvents RPC + AuditEventDto proto (#23) 2026-05-20 12:29:58 -04:00
Joseph Doherty
ff8766ec8b feat(auditlog): FallbackAuditWriter compose SQLite + ring + failure counter (#23)
Adds the IAuditWriter composer that sits between the script-side
ScriptRuntimeContext audit emission (Bundle F) and the primary
SqliteAuditWriter. Honours the alog.md §7 guarantee that audit-write
failures NEVER abort the user-facing action:

- Primary throw -> log Warning, increment IAuditWriteFailureCounter
  (Bundle G's health-metric sink), stash the event in the drop-oldest
  RingBufferFallback, return success to the caller.
- Primary success -> opportunistically drain the ring back through the
  primary in FIFO order, behind the triggering event. Drain is
  serialised via a SemaphoreSlim gate so concurrent recoveries don't
  double-replay; a drain-side re-throw re-enqueues at the tail and
  breaks out (the next successful write retries).

Adds IAuditWriteFailureCounter as the lightweight DI seam (one void
Increment()), and a TryDequeue helper on RingBufferFallback that the
recovery path uses to pop one item without blocking.

Tests (4 new, total 26 -> 30):
- WriteAsync_PrimaryThrows_EventLandsInRing_CallReturnsSuccess
- WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite
  (order: trigger first, then ring backlog in submission FIFO)
- WriteAsync_PrimaryAlwaysSucceeds_Ring_StaysEmpty
- WriteAsync_FailureCounter_Incremented_Per_PrimaryFailure
2026-05-20 12:23:50 -04:00
Joseph Doherty
55fbcce7a8 feat(auditlog): RingBufferFallback with drop-oldest overflow (#23)
Adds RingBufferFallback — an in-memory drop-oldest ring buffer used by
the upcoming FallbackAuditWriter (Bundle B-T4) when the primary SQLite
writer is throwing. Backed by Channel<AuditEvent> with
BoundedChannelFullMode.DropOldest, fixed capacity (default 1024).

Channel.CreateBounded(DropOldest) does NOT natively signal a drop on
TryWrite, so overflow is detected by comparing Reader.Count before and
after the enqueue: when the buffer is already at capacity and a new
TryWrite succeeds while keeping the count at capacity, exactly one
event was displaced and RingBufferOverflowed is raised (one event per
drop).

Public surface:
- bool TryEnqueue(AuditEvent) — always succeeds unless completed.
- IAsyncEnumerable<AuditEvent> DrainAsync(CancellationToken) — FIFO.
- void Complete() — closes the channel so DrainAsync can finish.
- event Action? RingBufferOverflowed — health counter hook.

Tests (3 new, total 23 -> 26):
- Enqueue_1025_Into_1024Cap_Ring_DropsOldest_AndRaisesOverflowOnce
- DrainAsync_Yields_FIFO_Then_Completes_When_Empty
- TryEnqueue_AllSucceeds_ReturnsTrue
2026-05-20 12:20:55 -04:00
Joseph Doherty
01480c6ea2 feat(auditlog): SqliteAuditWriter Channel-based hot-path + ReadPendingAsync/MarkForwardedAsync (#23)
Replaces the B-T1 stub WriteAsync with the production hot-path:

- Bounded Channel<PendingAuditEvent> (BoundedChannelFullMode.Wait, capacity
  from options) feeds a background ProcessWriteQueueAsync loop that drains
  up to BatchSize events per transaction.
- The loop INSERTs each event with explicit parameter binding (enums and
  DateTime stored as text); duplicate EventIds (SqliteException with
  ErrorCode 19 SQLITE_CONSTRAINT) are swallowed as first-write-wins per
  alog.md §11, and the pending TCS is still completed successfully so
  callers see idempotent semantics.
- Site rows force ForwardState = Pending on enqueue when the inbound
  event leaves it null — site-side default per the M2 design.
- ReadPendingAsync(limit) returns oldest-first pending rows for the
  Bundle D telemetry actor; EventId is the deterministic tiebreaker on
  identical OccurredAtUtc timestamps. MarkForwardedAsync(ids) flips a
  batch to Forwarded in one UPDATE with a parameterised IN list.
- IAsyncDisposable graceful shutdown: TryComplete the writer, await the
  drain (5s budget), then dispose the connection.

Tests (7 new, total 16 -> 23):
- WriteAsync_FreshEvent_PersistsWithForwardStatePending
- WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions
- WriteAsync_DuplicateEventId_FirstWriteWins_NoException
- WriteAsync_ForcesForwardStatePending_IfNull
- ReadPendingAsync_Returns_OldestFirst_LimitedToN
- MarkForwardedAsync_FlipsRowsToForwarded
- MarkForwardedAsync_NonExistentId_NoThrow
2026-05-20 12:20:02 -04:00
Joseph Doherty
7173a79ad7 feat(auditlog): SqliteAuditWriter schema bootstrap (#23)
Adds the site-side SqliteAuditWriter skeleton with schema bootstrap —
20-column AuditLog table + IX_SiteAuditLog_ForwardState_Occurred index +
PRAGMA auto_vacuum = INCREMENTAL — and the SqliteAuditWriterOptions
companion type. Mirrors the SiteEventLogger pattern: single owned
SqliteConnection serialised behind a write lock; the Channel-based
hot-path lands in Bundle B-T2.

Adds Microsoft.Data.Sqlite + Microsoft.Extensions.Logging.Abstractions
project refs to ScadaLink.AuditLog; adds Microsoft.Data.Sqlite +
Microsoft.Extensions.Logging.Abstractions + NSubstitute test refs.

Tests (3 new, total 13 -> 16):
- Opens_Creates_AuditLog_Table_With_20Columns_And_PK_On_EventId
- Opens_Creates_IX_ForwardState_Occurred_Index
- PRAGMA_auto_vacuum_Is_INCREMENTAL
2026-05-20 12:17:02 -04:00
Joseph Doherty
d745ef0715 fix(configdb): InsertIfNotExistsAsync swallows duplicate-key races + add keyset tiebreaker test (#23)
Two concurrent sessions can both pass the IF NOT EXISTS check and then both attempt the INSERT against UX_AuditLog_EventId; the loser surfaced as SqlException 2601 (or 2627 for PK violations) and aborted the audit write. First-write-wins idempotency is the documented contract, so the race outcome is semantically a no-op — catch the two duplicate-key error numbers and log at Debug, let any other SqlException bubble.

Tests:

- InsertIfNotExistsAsync_ConcurrentDuplicateInserts_ProduceExactlyOneRow: 50 parallel inserters with the same EventId end with exactly one row and no escaped exceptions.

- QueryAsync_Keyset_SameOccurredAtUtc_TiebreaksOnEventId: four rows sharing the same OccurredAtUtc page deterministically through the (OccurredAtUtc, EventId) keyset cursor — exercises the e.OccurredAtUtc == after && e.EventId.CompareTo(afterId) < 0 branch and verifies EF Core 10's Guid.CompareTo translation against SQL Server uniqueidentifier order (deferred Bundle D reviewer recommendation).

AuditLogRepository now takes an optional ILogger<AuditLogRepository> (NullLogger default, mirrors InboundApiRepository); DI registration unchanged.
2026-05-20 12:12:50 -04:00
Joseph Doherty
7723bfb712 feat(auditlog): add AuditLogOptions + validator (#23) 2026-05-20 11:17:46 -04:00
Joseph Doherty
a15ceb3ec9 feat(auditlog): scaffold ScadaLink.AuditLog project + tests project (#23) 2026-05-20 11:14:03 -04:00
Joseph Doherty
de839627ed feat(configdb): AuditLogRepository (EF) + DI registration, append-only with M6-deferred SwitchOutPartition (#23)
EF Core implementation of IAuditLogRepository:
- InsertIfNotExistsAsync: single IF NOT EXISTS ... INSERT via
  ExecuteSqlInterpolatedAsync, bypasses the change tracker. Enum
  values converted to string in C# (columns are varchar(32) via
  HasConversion<string>).
- QueryAsync: AsNoTracking, predicate-per-non-null-filter, keyset
  paging on (OccurredAtUtc DESC, EventId DESC) — EF Core 10
  translates Guid.CompareTo to a uniqueidentifier < comparison
  natively (verified against MSSQL 2022).
- SwitchOutPartitionAsync: throws NotSupportedException naming M6;
  the non-aligned UX_AuditLog_EventId unique index blocks
  ALTER TABLE SWITCH PARTITION until the drop-and-rebuild dance
  ships with the purge actor.

DI: AddScoped<IAuditLogRepository, AuditLogRepository>() added after
the NotificationOutboxRepository registration; existing DI smoke test
extended with an IAuditLogRepository assertion.

Integration tests (8 new) use the Bundle C MsSqlMigrationFixture and
scope by a per-test SourceSiteId guid so they neither collide nor
require cleanup.

Bundle D of the Audit Log #23 M1 Foundation plan.
2026-05-20 11:05:18 -04:00
Joseph Doherty
db32a149d3 feat(commons): add IAuditLogRepository + AuditLogQueryFilter + AuditLogPaging (#23)
Append-only data-access surface for the central AuditLog table — three
methods: InsertIfNotExistsAsync (first-write-wins on EventId), QueryAsync
(filter + keyset paging on (OccurredAtUtc desc, EventId desc)), and
SwitchOutPartitionAsync (M1 honest contract — throws NotSupported until
M6 lands the non-aligned-index drop/rebuild dance for the partition
switch). No Update, no row-delete; bulk purge is partition-only.

Bundle D of the Audit Log #23 M1 Foundation plan.
2026-05-20 11:04:59 -04:00
Joseph Doherty
d9c99242a3 feat(configdb): add AuditLog migration with monthly partitioning and DB roles (#23)
Bundle C of the #23 M1 foundation. Creates the centralized AuditLog table
with the partition function, partition scheme, partition-aligned
non-clustered indexes, and the two access-control roles documented in
alog.md §4.

Schema:
- pf_AuditLog_Month: RANGE RIGHT, 24 monthly boundaries (Jan 2026 – Dec 2027).
- ps_AuditLog_Month: ALL TO ([PRIMARY]) — dev/test parity.
- dbo.AuditLog: created via raw SQL ON ps_AuditLog_Month(OccurredAtUtc).
  Composite clustered PK {EventId, OccurredAtUtc} (partition column must be
  part of the clustered key). 22 columns matching the EF AuditEvent model.
- 5 reconciliation/query non-clustered indexes from alog.md §4
  (Channel_Status_Occurred, CorrelationId filtered, OccurredAtUtc,
  Site_Occurred, Target_Occurred filtered) — all partition-aligned.
- UX_AuditLog_EventId: non-aligned UNIQUE on EventId alone (preserves
  InsertIfNotExistsAsync idempotency from M1-T8). Non-aligned because
  partition-aligned unique indexes require the partition column in the key,
  which would weaken to composite uniqueness; the purge story (M2/M3)
  rebuilds this index around partition switches.

Access control:
- scadalink_audit_writer: GRANT INSERT + GRANT SELECT, DENY UPDATE + DENY DELETE
  on AuditLog. The explicit DENY guarantees later db_datawriter membership
  cannot quietly re-enable mutation.
- scadalink_audit_purger: GRANT SELECT on AuditLog, GRANT ALTER on SCHEMA::dbo
  (enables ALTER PARTITION FUNCTION SWITCH and SWITCH PARTITION).

Both role definitions are idempotent (IF DATABASE_PRINCIPAL_ID IS NULL).
Down() drops in reverse dependency order with IF EXISTS guards.

Integration tests (tests/ScadaLink.ConfigurationDatabase.Tests/Migrations/):
- MsSqlMigrationFixture: connects to the running infra/mssql container (or
  the SCADALINK_MSSQL_TEST_CONN override), creates a unique per-fixture
  database, applies the migrations, drops the DB on dispose. Marks itself
  Available=false when MSSQL is unreachable so tests early-return cleanly
  on CI without the dev container.
- AddAuditLogTableMigrationTests: 8 tests covering table existence,
  partition function/scheme, partition-aligned PK, the 5 named indexes,
  both roles' grants, and a smoke test that a writer-role user receives
  SqlException with "permission" on UPDATE AuditLog.

ConfigurationDatabase tests: 142 passing -> 150 passing (8 new integration
tests). Full solution builds clean.

Package: tests project locally overrides Microsoft.Data.SqlClient to 6.1.1
(EF SqlServer 10.0.7 needs >= 6.1.1; central package version is pinned at
6.0.2 for the production ExternalSystemGateway).
2026-05-20 10:25:25 -04:00