feat(health): AuditRedactionFailure counter + bridge (#23 M5)

Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor
over-redactions as a Site Health metric so a misconfigured /
catastrophic regex shows up on /monitoring/health rather than
disappearing into a NoOp sink.

  - SiteHealthReport: new 'AuditRedactionFailure' int field
    (defaulted to 0 for back-compat with existing producers/tests).
  - ISiteHealthCollector / SiteHealthCollector:
    new IncrementAuditRedactionFailure() — per-interval atomic
    counter with Interlocked, reset on CollectReport, mirroring
    the M2 Bundle G SiteAuditWriteFailures pattern.
  - HealthMetricsAuditRedactionFailureCounter: new bridge in
    ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter
    increments to ISiteHealthCollector — mirrors
    HealthMetricsAuditWriteFailureCounter one-for-one.
  - AddAuditLogHealthMetricsBridge: now ALSO Replaces the
    NoOpAuditRedactionFailureCounter binding with the health-metrics
    bridge, so a single AddAuditLogHealthMetricsBridge() call wires
    both the M2 Bundle G write-failure counter and the M5 Bundle C
    redaction-failure counter into the health report.
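
A minimal sketch of the counter and bridge shape described in the list above, not the committed code. The `Increment()` member on IAuditRedactionFailureCounter, the `*Sketch` type names, and the trimmed-down report record are assumptions made for illustration; the real ScadaLink types carry more members.

```csharp
using System;
using System.Threading;

// Assumed member shape: the commit names this interface but does not show its methods.
public interface IAuditRedactionFailureCounter
{
    void Increment();
}

// Reduced to the single method this commit adds; the real interface has more.
public interface ISiteHealthCollector
{
    void IncrementAuditRedactionFailure();
}

// Hypothetical stand-in for SiteHealthReport; the new field defaults to 0.
public sealed record SiteHealthReportSketch(string SiteId, int AuditRedactionFailure = 0);

public sealed class SiteHealthCollectorSketch : ISiteHealthCollector
{
    private int _auditRedactionFailure;

    // Safe to call concurrently from several threads.
    public void IncrementAuditRedactionFailure() =>
        Interlocked.Increment(ref _auditRedactionFailure);

    public SiteHealthReportSketch CollectReport(string siteId)
    {
        // Exchange reads the interval count and zeroes it in one atomic step,
        // so an increment racing with the report is counted in the next interval
        // instead of being lost.
        var failures = Interlocked.Exchange(ref _auditRedactionFailure, 0);
        return new SiteHealthReportSketch(siteId, failures);
    }
}

// Bridge: forwards audit-side increments to the health collector.
public sealed class HealthMetricsAuditRedactionFailureCounterSketch : IAuditRedactionFailureCounter
{
    private readonly ISiteHealthCollector _collector;

    public HealthMetricsAuditRedactionFailureCounterSketch(ISiteHealthCollector collector) =>
        _collector = collector ?? throw new ArgumentNullException(nameof(collector));

    public void Increment() => _collector.IncrementAuditRedactionFailure();
}
```

With these stand-ins, two `Increment()` calls through the bridge followed by `CollectReport("site-1")` yield a report with `AuditRedactionFailure == 2`, and an immediate second `CollectReport` yields 0.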

Site-side only for M5 — the filter also runs on CentralAuditWriter
and AuditLogIngestActor (where it just keeps the NoOp default), but
a central-side health-metric surface for AuditRedactionFailure is
deferred to M6 alongside the rest of the central health collector
work.

Tests:
  - AuditRedactionFailureMetricTests (HealthMonitoring) covers the
    SiteHealthCollector increment/report/reset shape (3 tests).
  - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers
    the AuditLog → HealthMonitoring bridge (3 tests).
  - Existing CountCapturingHealthCollector stub in
    DeploymentManagerRedeployTests extended with the new no-op
    interface method.

Verified: dotnet build clean, all 24 test projects green
(the only failure, on the first ScadaLink.SiteRuntime.Tests run, was
the known-flaky InstanceActorChildAttributeRaceTests; it passes on
re-run both in isolation and in the full suite, and is unrelated to
these changes).
Joseph Doherty
2026-05-20 17:28:33 -04:00
parent 9b1379ed9b
commit 23c0fd417e
8 changed files with 214 additions and 12 deletions

@@ -0,0 +1,57 @@
namespace ScadaLink.HealthMonitoring.Tests;

/// <summary>
/// Bundle C (M5-T7) regression coverage. The Audit Log payload filter
/// (<c>DefaultAuditPayloadFilter</c>) increments
/// <c>IAuditRedactionFailureCounter</c> every time a header/body/SQL-param
/// redactor stage throws and the filter has to over-redact the field with
/// the <c>&lt;redacted: redactor error&gt;</c> marker. Bundle C bridges that
/// counter into the Site Health Monitoring report payload as
/// <c>AuditRedactionFailure</c> so a misconfigured / catastrophic regex
/// surfaces on /monitoring/health rather than disappearing into a NoOp sink.
/// Mirrors the Bundle G <c>SiteAuditWriteFailures</c> metric shape — same
/// per-interval increment-and-reset semantics, same defaults-to-zero
/// contract.
/// </summary>
public class AuditRedactionFailureMetricTests
{
    private readonly SiteHealthCollector _collector = new();

    [Fact]
    public void Increment_Three_Times_Counter_Reports_3()
    {
        _collector.IncrementAuditRedactionFailure();
        _collector.IncrementAuditRedactionFailure();
        _collector.IncrementAuditRedactionFailure();

        var report = _collector.CollectReport("site-1");

        Assert.Equal(3, report.AuditRedactionFailure);
    }

    [Fact]
    public void Report_Payload_Includes_AuditRedactionFailure_AsZeroByDefault()
    {
        var report = _collector.CollectReport("site-1");

        Assert.Equal(0, report.AuditRedactionFailure);
    }

    /// <summary>
    /// Mirrors the existing per-interval reset semantics for ScriptErrorCount /
    /// AlarmEvaluationErrorCount / DeadLetterCount / SiteAuditWriteFailures —
    /// AuditRedactionFailure is an interval count, not a running total.
    /// </summary>
    [Fact]
    public void CollectReport_Resets_AuditRedactionFailure()
    {
        _collector.IncrementAuditRedactionFailure();
        _collector.IncrementAuditRedactionFailure();

        var first = _collector.CollectReport("site-1");
        Assert.Equal(2, first.AuditRedactionFailure);

        var second = _collector.CollectReport("site-1");
        Assert.Equal(0, second.AuditRedactionFailure);
    }
}