fix(review): remediate re-review findings — DCL-029/InboundAPI-031/SiteRuntime-032/StoreAndForward-028 + Low doc/test

Fixes the 8 findings from the 2026-06-24 re-review (commit c42bb485), with a
regression test per Medium finding:

- DataConnectionLayer-029 (Med): HandleAlarmSubscribeCompleted now mirrors the
  tag-path re-check — if a feed is already stored for the source, release the
  redundant just-created subscription instead of overwriting + leaking the first
  one (the double-subscribe window DCL-023 reopened). +regression test.
- InboundAPI-031 (Med): remove WaitForAttribute's local 5s grace backstop. It was
  tighter than the CommunicationService Ask's timeout+IntegrationTimeout round-trip
  budget, so a slow-but-valid timed-out 'false' was cancelled and surfaced as a 500.
  Link only the client-abort + explicit caller tokens; the lower layer owns the
  backstop. +test.
- SiteRuntime-032 (Med): derive the deployed count from an authoritative set of
  deployed config names (HashSet) instead of a map-presence-gated int, so deleting
  a DISABLED instance decrements correctly (SiteRuntime-029's gate leaked it).
  +deploy->disable->delete regression test.
- StoreAndForward-028 (Med): reset _bufferedCount in StopAsync alongside the
  register-guard so a same-instance Stop->Start re-seeds from a clean base (no ~2N
  gauge double-count). +restart regression test.
- AuditLog-017 (Low): test the OnIngestAsync scope-resolution guard (actor survives,
  replies empty, counts the failure) — no longer unpinned.
- CentralUI-037 / ScriptAnalysis-009 / SiteRuntime-033 (Low): doc-comment + spec
  fixes (Database-throws in the inbound sandbox; baseReferences param wording;
  native-alarm cap return-to-normal + per-condition NativeAlarmDropped eviction).

Targeted suites green: SiteRuntime 5, StoreAndForward 6, InboundAPI 31,
DataConnectionLayer 10, AuditLog 5, ScriptAnalysis 40, CentralUI ScriptAnalysis 52.
Joseph Doherty
2026-06-24 09:39:14 -04:00
parent c42bb48585
commit 9ab1c00265
12 changed files with 320 additions and 34 deletions
@@ -1,6 +1,7 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.ScadaBridge.AuditLog.Central;
using ZB.MOM.WW.Audit;
@@ -184,6 +185,51 @@ public class AuditLogIngestActorTests : TestKit, IClassFixture<MsSqlMigrationFix
        Assert.DoesNotContain(rows, r => r.EventId == poisonId);
    }

    [Fact]
    public async Task Receive_WhenRepositoryResolutionThrows_ActorSurvives_RepliesEmpty_CountsFailure()
    {
        // AuditLog-017 (covers the AuditLog-014 guard): the production ctor resolves the
        // scoped repository per message. If scope creation / repository resolution throws
        // (transient DI or DbContext-factory fault, pooled-context init, a resolution race
        // during host churn), the outer guard must keep the singleton ALIVE, increment the
        // failure counter, and still reply with whatever was accepted (empty here) so the
        // site keeps its rows Pending and retries — rather than letting the throw restart
        // the singleton and drop the captured reply (the site's Ask would then time out).
        var counter = new CountingFailureCounter();

        // A provider with NO IAuditLogRepository registered → GetRequiredService throws
        // inside the per-message scope; the failure counter IS registered so the guard's
        // catch can surface the fault.
        var services = new ServiceCollection();
        services.AddSingleton<ICentralAuditWriteFailureCounter>(counter);
        await using var provider = services.BuildServiceProvider();

        var actor = Sys.ActorOf(Props.Create(() => new AuditLogIngestActor(
            (IServiceProvider)provider, NullLogger<AuditLogIngestActor>.Instance)));

        // First batch: resolution throws → empty reply, one counted failure, no restart.
        actor.Tell(new IngestAuditEventsCommand(
            Enumerable.Range(0, 3).Select(_ => NewEvent(NewSiteId())).ToList()), TestActor);
        var reply = ExpectMsg<IngestAuditEventsReply>(TimeSpan.FromSeconds(10));
        Assert.Empty(reply.AcceptedEventIds);
        Assert.Equal(1, counter.Count);

        // Second batch proves the actor was not restarted/wedged: it still processes
        // messages and the guard fires again.
        actor.Tell(new IngestAuditEventsCommand(
            new List<AuditEvent> { NewEvent(NewSiteId()) }), TestActor);
        var reply2 = ExpectMsg<IngestAuditEventsReply>(TimeSpan.FromSeconds(10));
        Assert.Empty(reply2.AcceptedEventIds);
        Assert.Equal(2, counter.Count);
    }

    /// <summary>Counts how many times the guard's catch surfaced a write failure.</summary>
    private sealed class CountingFailureCounter : ICentralAuditWriteFailureCounter
    {
        public int Count { get; private set; }
        public void Increment() => Count++;
    }

    /// <summary>
    /// Tiny test double that delegates to a real repository but throws on a
    /// specified EventId. Used to verify per-row failure isolation: one bad