feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3)

Hook the store-and-forward retry loop so the audit pipeline can emit
per-attempt + terminal telemetry under the original TrackedOperationId
(Bundle E Tasks E4 + E5).

New seam:

* ICachedCallLifecycleObserver + CachedCallAttemptContext in
  Commons.Interfaces.Services. Outcome enum
  (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries)
  is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F)
  will map it to the AuditKind/AuditStatus pair when building the
  CachedCallTelemetry packet.

* StoreAndForwardService gains an optional cachedCallObserver
  constructor parameter + siteId. RetryMessageAsync fires the observer
  exactly once per attempt with the appropriate outcome:
    - handler returns true               -> Delivered
    - handler returns false              -> PermanentFailure (and parks)
    - handler throws + retries remaining -> TransientFailure
    - handler throws + max retries hit   -> ParkedMaxRetries (and parks)

Hook is best-effort: an exception thrown by the observer is logged and
swallowed, so a failing audit pipeline can never be misclassified as a
transient delivery failure or corrupt the retry-count bookkeeping
(alog.md §7).
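
As a rough illustration of the outcome mapping and the best-effort hook
described above, here is a self-contained sketch. Outcome, RetrySketch,
ClassifyAsync, NotifyAsync, the delegate-based observer and the
"retryCount + 1 >= maxRetries" budget check are stand-ins invented for
this example; they are not the actual StoreAndForwardService code.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for CachedCallAttemptOutcome.
public enum Outcome { Delivered, TransientFailure, PermanentFailure, ParkedMaxRetries }

public static class RetrySketch
{
    // Classifies one delivery attempt following the four rules listed above.
    // retryCount = retries already performed; maxRetries = assumed retry budget.
    public static async Task<Outcome> ClassifyAsync(
        Func<Task<bool>> handler, int retryCount, int maxRetries)
    {
        try
        {
            // true -> Delivered, false -> PermanentFailure (caller parks the message).
            return await handler() ? Outcome.Delivered : Outcome.PermanentFailure;
        }
        catch (Exception)
        {
            // Assumption: the budget is exhausted once this attempt is counted.
            return retryCount + 1 >= maxRetries
                ? Outcome.ParkedMaxRetries
                : Outcome.TransientFailure;
        }
    }

    // Best-effort notification: an observer failure is logged and swallowed,
    // so it cannot change the outcome or the retry count.
    public static async Task NotifyAsync(
        Func<Outcome, Task>? observer, Outcome outcome, Action<string> log)
    {
        if (observer is null) return; // the observer is optional
        try
        {
            await observer(outcome);
        }
        catch (Exception ex)
        {
            log($"observer failed: {ex.Message}");
        }
    }
}
```

Keeping classification separate from notification means the outcome is
fixed before the observer runs, so an observer fault has nothing left
to influence.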

Only cached-call categories (ExternalSystem, CachedDbWrite) trigger the
observer; the Notification category has its own central-side audit
pipeline (Notification Outbox / #21).

Messages from pre-M3 callers that didn't thread a TrackedOperationId into
the S&F message id are silently skipped, because the observer requires a
parseable id by contract. New S&F callers stamp the id as messageId
(Bundle E3).

Bundle E tasks E4 + E5.
Joseph Doherty
2026-05-20 14:52:34 -04:00
parent 42430dd10a
commit 63eb1f4225
3 changed files with 553 additions and 1 deletions

@@ -0,0 +1,93 @@
using ScadaLink.Commons.Types;
namespace ScadaLink.Commons.Interfaces.Services;
/// <summary>
/// Audit Log #23 (M3 Bundle E — Tasks E4/E5): site-side hook the
/// store-and-forward retry loop invokes after every cached-call attempt and
/// at terminal-state transitions, so the audit pipeline can emit
/// <c>ApiCallCached</c>/<c>DbWriteCached</c> per-attempt rows and the
/// <c>CachedResolve</c> terminal row under the original
/// <see cref="TrackedOperationId"/>.
/// </summary>
/// <remarks>
/// <para>
/// The interface deliberately uses <see cref="CachedCallAttemptOutcome"/>
/// rather than <see cref="ScadaLink.Commons.Types.Enums.AuditStatus"/> so the
/// S&amp;F project does not need to depend on the audit vocabulary — the
/// bridge living in <c>ScadaLink.AuditLog</c> maps the outcome to the right
/// audit kind + status when materialising the <c>CachedCallTelemetry</c>
/// packet.
/// </para>
/// <para>
/// <b>Best-effort contract (alog.md §7):</b> implementations MUST swallow
/// internal failures rather than propagating them to the S&amp;F service. An
/// exception thrown by the observer must not be misclassified as a transient
/// delivery failure and must not corrupt the retry-count bookkeeping.
/// </para>
/// </remarks>
public interface ICachedCallLifecycleObserver
{
    /// <summary>
    /// Called by the store-and-forward retry loop after every cached-call
    /// delivery attempt. The context carries the TrackedOperationId parsed
    /// from the message id, the per-category channel discriminator,
    /// retry-count and last-error details, and the attempt outcome, which
    /// also tells whether a terminal state was reached.
    /// </summary>
    Task OnAttemptCompletedAsync(CachedCallAttemptContext context, CancellationToken ct = default);
}
/// <summary>
/// Per-attempt context handed to <see cref="ICachedCallLifecycleObserver"/>.
/// </summary>
/// <param name="TrackedOperationId">
/// Tracking id parsed from the underlying <c>StoreAndForwardMessage.Id</c>.
/// </param>
/// <param name="Channel">
/// Trust-boundary channel string — <c>"ApiOutbound"</c> for ExternalSystem
/// cached calls, <c>"DbOutbound"</c> for cached DB writes.
/// </param>
/// <param name="Target">Human-readable target (system name / DB connection).</param>
/// <param name="SourceSite">Site id that submitted the cached call.</param>
/// <param name="Outcome">Per-attempt outcome.</param>
/// <param name="RetryCount">Number of retries performed so far (S&amp;F bookkeeping).</param>
/// <param name="LastError">Most recent error message (null on success).</param>
/// <param name="HttpStatus">Most recent HTTP status (null when not applicable).</param>
/// <param name="CreatedAtUtc">When the underlying S&amp;F message was first enqueued.</param>
/// <param name="OccurredAtUtc">When this attempt completed.</param>
/// <param name="DurationMs">Duration of the attempt in milliseconds (null when not measured).</param>
/// <param name="SourceInstanceId">Originating instance, when known.</param>
public sealed record CachedCallAttemptContext(
    TrackedOperationId TrackedOperationId,
    string Channel,
    string Target,
    string SourceSite,
    CachedCallAttemptOutcome Outcome,
    int RetryCount,
    string? LastError,
    int? HttpStatus,
    DateTime CreatedAtUtc,
    DateTime OccurredAtUtc,
    int? DurationMs,
    string? SourceInstanceId);
/// <summary>
/// Coarse outcome of one cached-call delivery attempt, observed from inside
/// the store-and-forward retry loop. The audit bridge maps this to the
/// <c>ApiCallCached</c>/<c>DbWriteCached</c> Attempted row and, when terminal,
/// the corresponding <c>CachedResolve</c> row.
/// </summary>
public enum CachedCallAttemptOutcome
{
    /// <summary>Attempt delivered successfully — terminal Delivered state.</summary>
    Delivered,
    /// <summary>Attempt failed transiently; another retry will follow.</summary>
    TransientFailure,
    /// <summary>Attempt returned permanent failure — terminal Parked state (S&amp;F semantics).</summary>
    PermanentFailure,
    /// <summary>Retry budget exhausted — terminal Parked state.</summary>
    ParkedMaxRetries,
}
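// Illustrative sketch only, NOT part of this commit: one way an implementation
// could honour the best-effort contract documented on
// ICachedCallLifecycleObserver. SwallowingCachedCallObserver and its onError
// callback are hypothetical names. Treating cancellation as just another
// failure to swallow is an assumption. Like the rest of this file, the sketch
// relies on implicit usings for Action, Exception, Task and CancellationToken.
public sealed class SwallowingCachedCallObserver : ICachedCallLifecycleObserver
{
    private readonly ICachedCallLifecycleObserver _inner;
    private readonly Action<Exception> _onError;

    public SwallowingCachedCallObserver(ICachedCallLifecycleObserver inner, Action<Exception> onError)
    {
        _inner = inner;
        _onError = onError;
    }

    public async Task OnAttemptCompletedAsync(CachedCallAttemptContext context, CancellationToken ct = default)
    {
        try
        {
            await _inner.OnAttemptCompletedAsync(context, ct);
        }
        catch (Exception ex)
        {
            // Report and swallow so the S&F retry loop never sees the failure.
            _onError(ex);
        }
    }
}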