Commit Graph

58 Commits

Author SHA1 Message Date
Joseph Doherty
466e1454fe test(sitecall-audit): symmetric SourceNode coverage on DbOutbound emitter + clarify DI comments
Two follow-ups from the T13/T14 code review:

- M1: Add CachedWrite_StampsSourceNode_OnSubmitTelemetryRow and
  CachedWrite_NoSourceNodeWired_LeavesSourceNodeNull to DatabaseCachedWriteEmissionTests,
  mirroring the existing ApiOutbound SourceNode tests in
  ExternalSystemCachedCallEmissionTests. Site-emitter coverage now symmetric
  across both cached-call channels.
- M2: Clarify the GetService(INodeIdentityProvider) DI comments on the
  CachedCallTelemetryForwarder and CachedCallLifecycleBridge factories:
  it's test composition roots that may not register the provider, not
  central production. Both site and central hosts always register it via
  SiteServiceRegistration.BindSharedOptions.
2026-05-23 17:50:14 -04:00
Joseph Doherty
06ed0acead feat(sitecall-audit): carry + persist SourceNode end-to-end via cached telemetry
Site: site emitters of SiteCallOperational (ExternalSystemClient, the script-API
cached call path in ScriptRuntimeContext, CachedCallLifecycleBridge) inject
INodeIdentityProvider and stamp SourceNode = NodeName at construction.

OperationTrackingStore call site in CachedCallTelemetryForwarder now stamps
SourceNode too.

Central: SiteCallAuditRepository.UpsertAsync INSERT includes SourceNode in the
column list; conditional monotonic UPDATE uses
COALESCE(@SourceNode, SourceNode) so later packets cannot blank a previously-
stamped value. After this commit every SiteCalls row carries node-a/node-b in
SourceNode (subject to monotonic preservation).
2026-05-23 17:41:22 -04:00
Joseph Doherty
e6341580b3 test(audit): lock null-provider passthrough on CentralAuditWriter
Two follow-ups flagged by code review on Tasks 11/12:

- Lock the back-compat contract for CentralAuditWriter's optional
  `nodeIdentity = null` ctor parameter with two explicit tests
  (`WriteAsync_PassesThroughCallerSourceNode_WhenNoProviderInjected` and
  `WriteAsync_LeavesSourceNodeNull_WhenNoProviderInjected`). The previous
  null-provider path was only exercised incidentally via legacy
  CentralAuditWriterTests setups; the new tests make the contract explicit
  and distinct from the "provider supplied, returns null" path.

- Document why the catch-block log references `evt` rather than the
  post-stamp record: the three logged fields (EventId, Kind, Status) are
  immutable across the filter+stamp chain, so referencing either name is
  equivalent — but the comment warns future maintainers to switch names if
  they ever add a field the chain mutates (e.g. SourceNode).
2026-05-23 17:18:45 -04:00
Joseph Doherty
974a36826a feat(audit): stamp SourceNode at CentralAuditWriter + persist via AuditLogRepository
CentralAuditWriter injects INodeIdentityProvider and stamps the event before
handing to the repository. AuditLogRepository.InsertIfNotExistsAsync now
includes SourceNode in the INSERT column list. Caller-provided value wins
(supports any future direct-write callsite that already has its own node id).
2026-05-23 17:11:23 -04:00
Joseph Doherty
479870e40c feat(audit): stamp SourceNode at site SqliteAuditWriter from INodeIdentityProvider
Caller-provided SourceNode wins (preserves reconciled rows from other nodes);
otherwise the writer fills it from the local INodeIdentityProvider.NodeName.
Reads from the provider on every write — singleton lifetime means zero overhead.
2026-05-23 17:08:21 -04:00
Joseph Doherty
277882d230 feat(site-runtime): add SourceNode column to OperationTracking + thread through RecordEnqueueAsync 2026-05-23 16:54:48 -04:00
Joseph Doherty
f3cb8c0791 feat(audit): add SourceNode column to site SQLite AuditLog (idempotent upgrade) 2026-05-23 16:50:16 -04:00
Joseph Doherty
990eb02fe0 feat(sitecall-audit): add SourceNode to SiteCallOperational + SiteCall entity 2026-05-23 15:53:44 -04:00
Joseph Doherty
7b619d711d feat(auditlog): payload filter uses InboundMaxBytes for ApiInbound rows 2026-05-23 05:55:03 -04:00
Joseph Doherty
c5b27361c0 feat(auditlog): add AuditLog:InboundMaxBytes option (default 1 MiB, [8 KiB, 16 MiB]) 2026-05-23 05:39:50 -04:00
Joseph Doherty
9ec83d5070 docs(auditlog): generalize two stale XML-doc comments
- AddColumnIfMissing is now shared by ExecutionId and ParentExecutionId;
  drop the ExecutionId-specific tag.
- AuditLogRepository.GetExecutionTreeAsync doc no longer hardcodes the
  MAXRECURSION literal; reference the ExecutionChainMaxDepth const instead.
2026-05-21 20:14:31 -04:00
Joseph Doherty
c00603e2a4 feat(auditlog): thread ParentExecutionId through S&F for retry-loop cached rows
The store-and-forward retry loop emits the per-attempt and terminal cached
audit rows (ApiCallCached/DbWriteCached Attempted, CachedResolve) via
CachedCallLifecycleBridge from a CachedCallAttemptContext, not from the
script context. The ExecutionId rollout (Task 4) already threaded ExecutionId
and SourceScript through this path; ParentExecutionId — the spawning
inbound-API request's ExecutionId — was not, so those retry-loop rows had
ParentExecutionId = null even for an inbound-API-routed run.

Thread it additively as a sibling at every carry point ExecutionId passes
through:

- StoreAndForwardMessage gains ParentExecutionId (Guid?).
- StoreAndForwardStorage adds a nullable parent_execution_id column via the
  same idempotent PRAGMA-probed ALTER TABLE migration; rows persisted by an
  older build read back null (back-compat). The defensive Guid.TryParse read
  helper (ParseExecutionId) is renamed ParseGuidColumn and reused for both
  columns so a corrupt value cannot abort the retry sweep.
- StoreAndForwardService.EnqueueAsync gains an optional parentExecutionId
  param, stamped onto the buffered message and surfaced on the
  CachedCallAttemptContext built in the retry loop.
- CachedCallAttemptContext gains ParentExecutionId.
- CachedCallLifecycleBridge.BuildPacket sets AuditEvent.ParentExecutionId
  from the context, beside the existing ExecutionId.
- IExternalSystemClient.CachedCallAsync / IDatabaseGateway.CachedWriteAsync
  gain an optional parentExecutionId param; ScriptRuntimeContext's CachedCall
  / CachedWrite helpers pass _parentExecutionId.

All threading is additive — ParentExecutionId is Guid? everywhere, null for
non-routed runs, and old buffered S&F rows still deserialize with the new
field null.
2026-05-21 17:58:11 -04:00
Joseph Doherty
50430b9daa feat(auditlog): ParentExecutionId on site SQLite schema + gRPC AuditEventDto 2026-05-21 17:12:34 -04:00
Joseph Doherty
5198b114b4 fix(auditlog): evolve existing site auditlog.db schema for ExecutionId 2026-05-21 16:18:17 -04:00
Joseph Doherty
6f5a35f222 feat(auditlog): thread ExecutionId through S&F for retry-loop cached rows
The store-and-forward retry loop emits the per-attempt and terminal cached
audit rows (ApiCallCached/DbWriteCached Attempted, CachedResolve) via
CachedCallLifecycleBridge from a CachedCallAttemptContext, not from the
script context. ExecutionId (and SourceScript) were not threaded through the
S&F buffer, so those rows had ExecutionId = null and SourceScript = null.

Thread both, additively, from the cached-call enqueue path:

- StoreAndForwardMessage gains ExecutionId (Guid?) / SourceScript (string?).
- StoreAndForwardStorage adds nullable execution_id / source_script columns
  via an idempotent PRAGMA-probed ALTER TABLE migration; rows persisted by
  an older build read back null (back-compat).
- StoreAndForwardService.EnqueueAsync gains optional executionId /
  sourceScript params, stamped onto the buffered message and surfaced on the
  CachedCallAttemptContext built in the retry loop.
- CachedCallAttemptContext gains ExecutionId / SourceScript.
- CachedCallLifecycleBridge.BuildPacket sets AuditEvent.ExecutionId and
  AuditEvent.SourceScript from the context (replacing the hard-coded
  SourceScript = null and its now-stale comment).
- IExternalSystemClient.CachedCallAsync / IDatabaseGateway.CachedWriteAsync
  gain optional executionId / sourceScript params; ScriptRuntimeContext's
  CachedCall / CachedWrite helpers pass _executionId / _sourceScript.

Script-side cached rows (CachedSubmit, immediate Attempted+Resolve) are
unchanged. All threading is additive — old buffered S&F rows still
deserialize and process with the new fields null.
2026-05-21 15:18:35 -04:00
Joseph Doherty
6b16a48886 feat(auditlog): ExecutionId on site SQLite schema + gRPC AuditEventDto 2026-05-21 14:53:08 -04:00
Joseph Doherty
7a386a80ce docs(auditlog): mark follow-ups complete in roadmap; refresh stale comments 2026-05-21 06:39:49 -04:00
Joseph Doherty
6f0d2ca499 refactor(auditlog): consolidate SiteCall DTO mapper into Communication
Extract the verbatim-duplicated SiteCallOperationalDto -> SiteCall mapper
into a single public SiteCallDtoMapper static class in
ScadaLink.Communication.Grpc, mirroring AuditEventDtoMapper. Replaces three
identical private copies (SiteStreamGrpcServer.MapSiteCallFromDto,
ClusterClientSiteAuditClient.MapSiteCall, and the test-infra
DirectActorSiteStreamAuditClient.MapSiteCallFromDto), removes the now-stale
doc comment that justified the duplication, and drops the using directives
that became unused. Adds SiteCallDtoMapperTests for field-by-field coverage.

Only the FromDto direction is provided: nothing maps SiteCall back onto the
wire, so a ToDto would be dead code.
2026-05-21 04:00:20 -04:00
Joseph Doherty
fdd1a4b886 refactor(auditlog): consolidate AuditEvent DTO mappers into Communication 2026-05-21 03:51:51 -04:00
Joseph Doherty
6f59a1b546 fix(auditlog): assert Forwarded state in push integration test; tidy docs and Host wiring 2026-05-21 03:46:40 -04:00
Joseph Doherty
de5280d1c7 feat(auditlog): real ClusterClient-based site audit push client 2026-05-21 03:39:17 -04:00
Joseph Doherty
ef49b55cf6 fix(health): decouple AuditCentralHealthSnapshot from ActorSystem (#23 M6)
The snapshot's per-site stalled latch now lives on the snapshot itself
and is fed by SiteAuditTelemetryStalledTracker via ApplyStalled, removing
the chain that required ActorSystem at DI composition time. The tracker
is now constructed by AkkaHostedService once ActorSystem.Create returns,
with a lock-guarded auxiliary-disposable list so concurrent host
start/stop in tests cannot race the enumeration.
2026-05-20 19:25:28 -04:00
Joseph Doherty
2744011ce9 feat(health): surface AuditRedactionFailure in central snapshot (#23 M6) 2026-05-20 19:13:19 -04:00
Joseph Doherty
70ed8d4557 feat(health): CentralAuditWriteFailures + AuditCentralHealthSnapshot (#23 M6) 2026-05-20 19:11:52 -04:00
Joseph Doherty
42333a72ed feat(health): SiteAuditTelemetryStalledTracker subscribes to EventStream (#23 M6) 2026-05-20 19:07:44 -04:00
Joseph Doherty
e93f655ce4 feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6) 2026-05-20 19:02:01 -04:00
Joseph Doherty
75b060e0a8 feat(auditlog): AuditLogPartitionMaintenanceService monthly roll-forward (#23 M6) 2026-05-20 18:51:43 -04:00
Joseph Doherty
cc2d6e91f1 fix(auditlog): SiteAuditReconciliationActor captures EventStream before await (#23 M6) 2026-05-20 18:39:19 -04:00
Joseph Doherty
660fdc4e93 feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)
Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
  1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
     monthly boundaries whose latest OccurredAtUtc is older than
     DateTime.UtcNow - AuditLogOptions.RetentionDays.
  2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
     the drop-and-rebuild dance around UX_AuditLog_EventId.
  3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
     the actor-system EventStream so the Bundle E central health collector
     and ops surfaces can subscribe without coupling to this actor.

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
  switch via COUNT_BIG over the per-partition  filter so the count
  reflects what the switch removed, not a post-purge scan of a table that
  no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
  tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
  abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
  service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
  exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
  await in async receive handlers (same pattern as Sender-capture in
  AuditLogIngestActor.OnIngestAsync).

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
  RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
  computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
  MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
  Apr-2026 row kept, AuditLogPurgedEvent observed via probe.
2026-05-20 18:36:31 -04:00
Joseph Doherty
c763bd9a04 feat(auditlog): SiteAuditReconciliationActor central singleton (#23 M6) 2026-05-20 18:10:42 -04:00
Joseph Doherty
640fd07454 feat(comms): site-side PullAuditEvents handler (#23 M6) 2026-05-20 17:58:43 -04:00
Joseph Doherty
b409afda2e feat(auditlog): hot-reloadable AuditLogOptions + regex cache invalidation (#23 M5) 2026-05-20 17:35:15 -04:00
Joseph Doherty
23c0fd417e feat(health): AuditRedactionFailure counter + bridge (#23 M5)
Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor
over-redactions as a Site Health metric so a misconfigured /
catastrophic regex shows up on /monitoring/health rather than
disappearing into a NoOp sink.

  - SiteHealthReport: new 'AuditRedactionFailure' int field
    (defaulted to 0 for back-compat with existing producers/tests).
  - ISiteHealthCollector / SiteHealthCollector:
    new IncrementAuditRedactionFailure() — per-interval atomic
    counter with Interlocked, reset on CollectReport, mirroring
    the M2 Bundle G SiteAuditWriteFailures pattern.
  - HealthMetricsAuditRedactionFailureCounter: new bridge in
    ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter
    increments to ISiteHealthCollector — mirrors
    HealthMetricsAuditWriteFailureCounter one-for-one.
  - AddAuditLogHealthMetricsBridge: now ALSO Replaces the
    NoOpAuditRedactionFailureCounter binding with the health-metrics
    bridge, so a single AddAuditLogHealthMetricsBridge() call wires
    both the M2 Bundle G write-failure counter and the M5 Bundle C
    redaction-failure counter into the health report.

Site-side only for M5 — the filter also runs on CentralAuditWriter
and AuditLogIngestActor (where it just keeps the NoOp default), but
a central-side health-metric surface for AuditRedactionFailure is
deferred to M6 alongside the rest of the central health collector
work.

Tests:
  - AuditRedactionFailureMetricTests (HealthMonitoring) covers the
    SiteHealthCollector increment/report/reset shape (3 tests).
  - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers
    the AuditLog → HealthMonitoring bridge (3 tests).
  - Existing CountCapturingHealthCollector stub in
    DeploymentManagerRedeployTests extended with the new no-op
    interface method.

Verified: dotnet build clean, all 24 test projects green
(the only Failed at first ScadaLink.SiteRuntime.Tests run was the
known-flaky InstanceActorChildAttributeRaceTests; passes on re-run
in isolation and full suite, unrelated to these changes).
2026-05-20 17:28:33 -04:00
Joseph Doherty
9b1379ed9b feat(auditlog): wire IAuditPayloadFilter into all writer paths (#23 M5)
Bundle C task M5-T6 — plugs the IAuditPayloadFilter singleton into the
three audit writer entry points so every event is truncated + redacted
before persistence, regardless of which path it took to disk:

  - FallbackAuditWriter (site hot path): filter runs before the primary
    SQLite write AND the ring-buffer enqueue, so a recovery drain replays
    rows that are already capped/redacted.
  - CentralAuditWriter (central direct-write): filter runs before the
    per-call IAuditLogRepository.InsertIfNotExistsAsync.
  - AuditLogIngestActor (site→central telemetry):
      - OnIngestAsync resolves the filter from the per-message scope and
        applies it to each row before IngestedAtUtc stamping.
      - OnCachedTelemetryAsync (M3 dual-write) applies the filter to the
        audit half of every CachedTelemetryEntry before the audit-insert
        + site-call-upsert transaction.

Filter parameter is optional (nullable) on each constructor so the
existing test composition roots that don't pass one keep working unchanged
— production DI wiring in AddAuditLog always passes the real filter
through. ICentralAuditWriter registration switched from the open-ctor
form to a factory so the filter flows through it.

Tests: FilterIntegrationTests covers all three writer paths end-to-end
(4 tests). Full ScadaLink.AuditLog.Tests suite: 146 passed, 0 failed,
0 skipped.
2026-05-20 17:21:57 -04:00
Joseph Doherty
5a7f3e8bf6 feat(auditlog): per-connection SQL parameter redaction opt-in (#23 M5) 2026-05-20 17:11:53 -04:00
Joseph Doherty
37f17dc4a8 feat(auditlog): body regex redaction with over-redaction safety net (#23 M5) 2026-05-20 17:09:36 -04:00
Joseph Doherty
ad7b330f43 feat(auditlog): HTTP header redaction stage (#23 M5) 2026-05-20 17:07:01 -04:00
Joseph Doherty
bba2ef1b4d feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety (#23 M5) 2026-05-20 17:01:13 -04:00
Joseph Doherty
25cdf857c9 feat(auditlog): IAuditPayloadFilter contract (#23 M5) 2026-05-20 16:59:10 -04:00
Joseph Doherty
b31747a632 feat(notif): NotificationOutboxActor + CentralAuditWriter wired (#23 M4)
M4 Bundle B (B1) — add the central-only ICentralAuditWriter implementation
and inject it into NotificationOutboxActor so subsequent tasks (B2/B3) can
route attempt + terminal lifecycle events through the direct-write audit path.

- CentralAuditWriter: thin wrapper around IAuditLogRepository.InsertIfNotExistsAsync;
  scope-per-call (matches AuditLogIngestActor / NotificationOutboxActor pattern);
  stamps IngestedAtUtc; swallows all internal failures (alog.md §13).
- Registered as a singleton in AddAuditLog.
- NotificationOutboxActor ctor takes ICentralAuditWriter (validated non-null).
- Host wiring resolves the writer once from the root provider and passes it
  into the singleton's Props.Create call.
- Existing TestKit fixtures updated with a NoOpCentralAuditWriter helper so
  tests that don't exercise audit emission still compile and pass.
2026-05-20 16:04:01 -04:00
Joseph Doherty
73a19c6f02 refactor(auditlog): remove dead-code ternary in CachedCallLifecycleBridge (#23 M3 final-review fix) 2026-05-20 15:39:10 -04:00
Joseph Doherty
6fe23a4d9b feat(host): register SiteCallAuditActor + CachedCallTelemetry forwarder/bridge (#22, #23 M3)
M3 Bundle F (Task F1) wires the cached-call audit pipeline through the
composition roots:

- Central: register SiteCallAuditActor as a cluster singleton + proxy
  (mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs
  calls .AddSiteCallAudit() on the central role.
- Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge
  in AddAuditLog (lazy factory — Central nodes degrade to audit-only
  emission because IOperationTrackingStore is site-only).
- Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so
  StoreAndForwardService picks it up via DI.
- Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the
  site id to StoreAndForwardService without creating a
  StoreAndForward -> HealthMonitoring project-reference cycle.
- ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script
  scope and threads it into ScriptRuntimeContext.

CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now
nullable so Central DI validation succeeds with the lazy registration; the
forwarder's tracking-half emission is a no-op when the store is absent.

Tests:
- AkkaHostedServiceAuditWiringTests: Central host builds with
  AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site
  resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext.
- Full solution: 194 Host tests green, 241 SiteRuntime tests green, every
  other suite unchanged.
2026-05-20 15:10:47 -04:00
Joseph Doherty
047988e4c8 feat(siteruntime): Database.CachedWrite emits combined telemetry + S&F audit bridge (#23 M3)
Wire the M3 cached-call audit pipeline end-to-end for the database
channel and close the loop between the S&F lifecycle observer and the
site-side dual emitter.

* DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up
  in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit
  packet on DbOutbound, threads the id into IDatabaseGateway, and is
  best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall
  coverage from E3.

* CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver
  and lives alongside CachedCallTelemetryForwarder. The bridge ingests
  per-attempt notifications from the S&F retry loop and fans them out
  to the forwarder:
    - TransientFailure -> 1 Attempted row
    - Delivered        -> Attempted + CachedResolve(Delivered)
    - PermanentFailure -> Attempted + CachedResolve(Parked)
    - ParkedMaxRetries -> Attempted + CachedResolve(Parked)
  Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached,
  DbOutbound->DbWriteCached). Best-effort top-level catch swallows any
  unexpected throw so the S&F retry bookkeeping is never disturbed.

* Bridge tests (7) cover all four outcomes, channel mapping, provenance
  propagation, and the no-throw-on-forwarder-failure contract.

Bundle F (Host registration) will instantiate the bridge and inject it
into StoreAndForwardService.cachedCallObserver, closing the wiring path
end-to-end.

Bundle E task E6.
2026-05-20 14:55:17 -04:00
Joseph Doherty
42430dd10a feat(siteruntime): ExternalSystem.CachedCall emits CachedSubmit telemetry (#23 M3)
Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3
combined-telemetry model:

* Mints a fresh TrackedOperationId and emits one CachedSubmit packet
  via ICachedCallTelemetryForwarder BEFORE handing the call off — the
  SiteCalls row is materialised before the first delivery attempt so
  Tracking.Status(id) can observe a Submitted row even if immediate
  delivery resolves before the helper returns.
* Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync
  as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync
  for the Database mirror set up here for E6). The gateway uses the id
  as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can
  recover it from StoreAndForwardMessage.Id.
* Returns the TrackedOperationId rather than ExternalCallResult — the
  script's contract is now "get a tracking handle, observe outcome via
  Tracking.Status". Best-effort emission: a thrown forwarder is logged
  + swallowed; the original call still runs and the id is still returned.

DatabaseHelper gets the matching siteId / sourceScript / forwarder
fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task
E6's Database.CachedWrite mirror plugs in without further runtime
wiring.

New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services
so SiteRuntime depends on Commons (existing arrow) rather than
ScadaLink.AuditLog (would have introduced a new dependency).

Bundle E task E3 (and helper-shape work for E6).
2026-05-20 14:48:05 -04:00
Joseph Doherty
2145b29d4d feat(auditlog): CachedCallTelemetryForwarder for site-side dual emission (#23 M3)
Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry
packet and fans it out to the two site-local stores.

* AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter
  + SqliteAuditWriter chain — same site SQLite hot-path as sync calls).
* SiteCallOperational half maps Audit.Kind to the matching
  IOperationTrackingStore method:
    - CachedSubmit  -> RecordEnqueueAsync (insert-if-not-exists)
    - ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic)
    - CachedResolve -> RecordTerminalAsync (first-write-wins)

Best-effort contract (alog.md §7): independent try/catch per half so a
thrown writer cannot starve the tracking row (and vice-versa); both
failures are logged at warning level and swallowed — the calling script
never sees them.

Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays
in effect; the forwarder writes only to the local stores in M3. The
existing SiteAuditTelemetryActor drain loop will sweep the audit rows
once a real gRPC client lands.

Bundle E task E2.
2026-05-20 14:41:15 -04:00
Joseph Doherty
73719ee066 feat(auditlog): extend ISiteStreamAuditClient with IngestCachedTelemetryAsync (#23 M3)
Add the second site→central RPC seam alongside the existing M2
IngestAuditEventsAsync. The Bundle D proto already lit up
IngestCachedTelemetry (CachedTelemetryBatch / IngestAck) so this commit
just plumbs the client-side abstraction:

* ISiteStreamAuditClient gains IngestCachedTelemetryAsync(batch, ct).
* NoOpSiteStreamAuditClient implements it returning an empty IngestAck
  (same shape as M2 — production gRPC client lands in M6).
* SyncCallEmissionEndToEndTests' DirectActorSiteStreamAuditClient stub
  throws NotSupportedException from the new method so a regression that
  accidentally routes a cached packet through the sync stub fails loudly.
* New NoOpSiteStreamAuditClientTests cover the null-guard + empty-ack
  contract for both batch shapes.

Bundle E task E1.
2026-05-20 14:39:24 -04:00
Joseph Doherty
0a97fff906 feat(auditlog): combined telemetry dual-write transaction (#23 M3) 2026-05-20 14:33:14 -04:00
Joseph Doherty
dd3351da93 feat(health): SiteAuditWriteFailures counter + AuditLog bridge (#23)
Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-
failure counter into the Site Health Monitoring report payload so a
sustained audit-write outage surfaces on /monitoring/health instead of
disappearing into a NoOp sink.

- SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive).
- ISiteHealthCollector + SiteHealthCollector: new
  IncrementSiteAuditWriteFailures() counter, per-interval reset
  semantics matching ScriptErrorCount / DeadLetterCount.
- HealthMetricsAuditWriteFailureCounter: adapter forwarding
  IAuditWriteFailureCounter.Increment() to the collector.
- AddAuditLogHealthMetricsBridge(): swaps the NoOp default
  registration for the real bridge; called from
  SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog.
- Existing host-wiring test updated: site composition now resolves
  HealthMetricsAuditWriteFailureCounter (not NoOp).

Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new),
full solution green.
2026-05-20 13:22:25 -04:00
Joseph Doherty
9bf1497f03 feat(host): register Audit Log #23 singletons with dedicated dispatcher (#23)
Wires Bundle E of the M2 site-sync pipeline:

- AddAuditLog extended to register the site writer chain (SqliteAuditWriter
  singleton + ISiteAuditQueue forward + RingBufferFallback + FallbackAuditWriter
  composing them) and the telemetry collaborators (SiteAuditTelemetryOptions,
  SqliteAuditWriterOptions, IAuditWriteFailureCounter NoOp default,
  ISiteStreamAuditClient NoOp default).
- AkkaHostedService central role: AuditLogIngestActor as ClusterSingletonManager
  (singleton name 'audit-log-ingest') + ClusterSingletonProxy, mirroring the
  Notification Outbox pattern. Proxy is offered to SiteStreamGrpcServer if it
  resolves (Site path only today; M6 reconciliation will host gRPC on central).
- AkkaHostedService site role: SiteAuditTelemetryActor (per-site, NOT a
  singleton because each site is its own cluster), bound to a dedicated
  audit-telemetry-dispatcher (ForkJoinDispatcher, 2 dedicated threads).
- Program.cs + SiteServiceRegistration.Configure call AddAuditLog on both roles.
- AuditLogIngestActor gains a second constructor that takes IServiceProvider so
  the cluster singleton can create a fresh scope per message — IAuditLogRepository
  is a scoped EF Core service and cannot be pre-resolved from the root. The
  IAuditLogRepository constructor remains for Bundle D's MSSQL-fixture tests.

NoOp ISiteStreamAuditClient is deliberate: no site→central gRPC channel exists
in M2 (sites talk to central via Akka ClusterClient; gRPC SiteStreamService is
hosted on sites for central→site streaming). M6 reconciliation introduces the
real gRPC site→central client + central-hosted gRPC server. Bundle H's
integration test substitutes a stub client directly via the actor's Props.

Tests:
- tests/ScadaLink.AuditLog.Tests/AddAuditLogTests.cs — 11 tests (was 3): writer
  singleton, IAuditWriter as FallbackAuditWriter, ISiteAuditQueue same-instance
  as SqliteAuditWriter, options bind round-trip, NoOp default assertions.
- tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs (new) — 13
  tests: BuildHocon emits audit-telemetry-dispatcher block with the expected
  type/throughput/thread-count; Central composition root resolves the writer
  chain + options; Site composition root resolves the writer chain + options +
  NoOp client.

Verified: dotnet build clean, 23 test suites green (Host 194 + AuditLog 54).
2026-05-20 13:04:05 -04:00
Joseph Doherty
87cae88f92 feat(auditlog): AuditLogIngestActor + gRPC handler (#23) 2026-05-20 12:48:26 -04:00