fix(siteruntime): immediate-success CachedCall emits terminal telemetry (#23 M3)

Bundle E left a gap in ExternalSystem.CachedCall: when the underlying HTTP
call succeeds immediately (WasBuffered=false), the store-and-forward retry
loop is never engaged and the ICachedCallLifecycleObserver hook never
fires. As a result Tracking.Status(id) would stay in Submitted forever and
the audit log would be missing the Attempted + CachedResolve pair the M3
contract requires.

Fix: capture the ExternalCallResult returned by IExternalSystemClient.
CachedCallAsync. When WasBuffered=false, emit the two missing telemetry
packets from the helper itself:

- ApiCallCached / Attempted   (per-attempt mechanics row, HttpStatus +
                              ErrorMessage extracted via the same regex
                              the synchronous Call() audit row uses)
- CachedResolve / Delivered   on Success, or
- CachedResolve / Failed      on Success=false (immediate permanent
                              failure or transient failure without S&F).

The terminal CachedResolve row carries TerminalAtUtc so SiteCallAudit can
recognise the row as eligible for purge.

The WasBuffered=true path is unaffected — the S&F retry loop owns the
Attempted + Resolve emissions there via the CachedCallLifecycleBridge.
Database.CachedWrite is unaffected too because IDatabaseGateway.
CachedWriteAsync always enqueues into S&F (no immediate-success path).

Both new emissions are best-effort: a throwing forwarder is logged and
swallowed (alog.md §7) and each row is independently try/catch-wrapped so
a single fault cannot drop both halves of the terminal pair.

Tests in ExternalSystemCachedCallEmissionTests:
- CachedCall_ImmediateSuccess_EmitsAttemptedAndCachedResolve
- CachedCall_ImmediateFailure_EmitsAttemptedAndCachedResolveFailed
- CachedCall_BufferedPath_DoesNotEmitTerminalTelemetryFromHelper

Full suite: 244 SiteRuntime tests (3 new), 200 Host tests, all green.
This commit is contained in:
Joseph Doherty
2026-05-20 15:15:11 -04:00
parent 6fe23a4d9b
commit f81750b2aa
2 changed files with 330 additions and 5 deletions

View File

@@ -446,13 +446,20 @@ public class ScriptRuntimeContext
// Hand off to the existing cached-call path. The TrackedOperationId
// becomes the S&F message id so the retry loop (Bundle E Tasks
// E4/E5) can read it back via StoreAndForwardMessage.Id. The
// underlying ExternalCallResult is intentionally discarded — the
// script's contract is now "return the tracking id, observe outcome
// via Tracking.Status".
// E4/E5) can read it back via StoreAndForwardMessage.Id.
//
// M3 Bundle F (F2): the result is now retained because the
// immediate-success path (WasBuffered=false) bypasses S&F entirely
// — no retry loop, no ICachedCallLifecycleObserver fire. The
// helper must emit the Attempted + CachedResolve terminal rows
// itself, otherwise Tracking.Status(id) would stay in Submitted
// forever and the audit log would be missing the M3 lifecycle.
// The WasBuffered=true path is unaffected — the S&F retry loop
// owns the Attempted + Resolve emissions in that case.
ExternalCallResult? result;
try
{
await _client.CachedCallAsync(
result = await _client.CachedCallAsync(
systemName,
methodName,
parameters,
@@ -473,6 +480,16 @@ public class ScriptRuntimeContext
throw;
}
// M3 Bundle F (F2): immediate-completion lifecycle — emit the
// missing Attempted + CachedResolve rows when the underlying call
// resolved without engaging the store-and-forward retry loop.
if (result is { WasBuffered: false })
{
await EmitImmediateTerminalTelemetryAsync(
systemName, methodName, target, trackedId, result, cancellationToken)
.ConfigureAwait(false);
}
return trackedId;
}
@@ -546,6 +563,177 @@ public class ScriptRuntimeContext
}
}
/// <summary>
/// M3 Bundle F (F2): emit the Attempted + CachedResolve lifecycle
/// rows for an immediate-completion <c>CachedCall</c> (WasBuffered=false).
/// The S&amp;F retry loop never engaged, so the
/// <c>ICachedCallLifecycleObserver</c> never fires — the helper must
/// produce both rows itself to keep the M3 audit contract whole
/// (Submit → Attempted → Resolve under one TrackedOperationId).
/// </summary>
/// <remarks>
/// Best-effort emission: a throwing forwarder is logged and swallowed
/// per alog.md §7. The two rows are emitted INDEPENDENTLY so a single
/// forwarder fault doesn't drop both halves of the terminal pair.
/// </remarks>
private async Task EmitImmediateTerminalTelemetryAsync(
string systemName,
string methodName,
string target,
TrackedOperationId trackedId,
ExternalCallResult result,
CancellationToken cancellationToken)
{
if (_cachedForwarder == null)
{
return;
}
var occurredAtUtc = DateTime.UtcNow;
// Extract an HTTP status from the error message when present
// (mirrors EmitCallAudit's existing HttpStatusRegex behaviour so
// the immediate-failure row carries the same HttpStatus value the
// synchronous Call() audit row would have stamped).
int? httpStatus = null;
if (!result.Success && !string.IsNullOrEmpty(result.ErrorMessage))
{
var match = HttpStatusRegex.Match(result.ErrorMessage);
if (match.Success && int.TryParse(match.Groups["code"].Value, out var code))
{
httpStatus = code;
}
}
// Status mapping for immediate completion:
// Success=true -> Delivered (audit) / "Delivered" (operational)
// Success=false -> Failed (audit) / "Failed" (operational)
// Permanent vs transient is not relevant here: a permanent failure
// returns Success=false WasBuffered=false (parked-equivalent); a
// transient failure with NO S&F engine wired likewise lands here
// with Success=false. Either way the terminal state is "the
// immediate attempt failed and the operation is done".
var auditTerminalStatus = result.Success
? AuditStatus.Delivered
: AuditStatus.Failed;
var operationalTerminalStatus = result.Success ? "Delivered" : "Failed";
// --- Attempted row -------------------------------------------------
CachedCallTelemetry attempted;
try
{
attempted = new CachedCallTelemetry(
Audit: new AuditEvent
{
EventId = Guid.NewGuid(),
OccurredAtUtc = DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc),
Channel = AuditChannel.ApiOutbound,
Kind = AuditKind.ApiCallCached,
CorrelationId = trackedId.Value,
SourceSiteId = string.IsNullOrEmpty(_siteId) ? null : _siteId,
SourceInstanceId = _instanceName,
SourceScript = _sourceScript,
Target = target,
Status = AuditStatus.Attempted,
HttpStatus = httpStatus,
ErrorMessage = result.Success ? null : result.ErrorMessage,
ForwardState = AuditForwardState.Pending,
},
Operational: new SiteCallOperational(
TrackedOperationId: trackedId,
Channel: "ApiOutbound",
Target: target,
SourceSite: _siteId,
Status: "Attempted",
// RetryCount stays 0 — the operation never reached the
// S&F retry sweep, so no retries were performed.
RetryCount: 0,
LastError: result.Success ? null : result.ErrorMessage,
HttpStatus: httpStatus,
CreatedAtUtc: occurredAtUtc,
UpdatedAtUtc: occurredAtUtc,
TerminalAtUtc: null));
}
catch (Exception buildEx)
{
_logger.LogWarning(buildEx,
"Failed to build immediate-Attempted telemetry for {System}.{Method} (TrackedOperationId {Id}) — skipping emission",
systemName, methodName, trackedId);
attempted = null!;
}
if (attempted is not null)
{
try
{
await _cachedForwarder.ForwardAsync(attempted, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Immediate-Attempted telemetry forward failed for {System}.{Method} (TrackedOperationId {Id})",
systemName, methodName, trackedId);
}
}
// --- CachedResolve row --------------------------------------------
CachedCallTelemetry resolve;
try
{
resolve = new CachedCallTelemetry(
Audit: new AuditEvent
{
EventId = Guid.NewGuid(),
OccurredAtUtc = DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc),
Channel = AuditChannel.ApiOutbound,
Kind = AuditKind.CachedResolve,
CorrelationId = trackedId.Value,
SourceSiteId = string.IsNullOrEmpty(_siteId) ? null : _siteId,
SourceInstanceId = _instanceName,
SourceScript = _sourceScript,
Target = target,
Status = auditTerminalStatus,
HttpStatus = httpStatus,
ErrorMessage = result.Success ? null : result.ErrorMessage,
ForwardState = AuditForwardState.Pending,
},
Operational: new SiteCallOperational(
TrackedOperationId: trackedId,
Channel: "ApiOutbound",
Target: target,
SourceSite: _siteId,
Status: operationalTerminalStatus,
RetryCount: 0,
LastError: result.Success ? null : result.ErrorMessage,
HttpStatus: httpStatus,
CreatedAtUtc: occurredAtUtc,
UpdatedAtUtc: occurredAtUtc,
// Immediate-completion terminal — mark TerminalAtUtc so
// SiteCallAudit can move the row to its purge eligible
// set.
TerminalAtUtc: occurredAtUtc));
}
catch (Exception buildEx)
{
_logger.LogWarning(buildEx,
"Failed to build immediate-CachedResolve telemetry for {System}.{Method} (TrackedOperationId {Id}) — skipping emission",
systemName, methodName, trackedId);
return;
}
try
{
await _cachedForwarder.ForwardAsync(resolve, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Immediate-CachedResolve telemetry forward failed for {System}.{Method} (TrackedOperationId {Id})",
systemName, methodName, trackedId);
}
}
/// <summary>
/// Best-effort emission of one <c>ApiOutbound</c>/<c>ApiCall</c> audit
/// row. Any exception thrown by the writer is logged and swallowed —