fix(concurrency/lifetime): close Theme 5 — 10 concurrency / DI / scope findings

Concurrency hazards, DI lifetime hygiene, and one verify-only confirmation
across 8 modules. Highlights:

Concurrency:
- CentralUI-030: SandboxConsoleCapture writes routed through WriteSynchronized
  locking on the captured StringWriter — intra-script Task fan-out can no
  longer corrupt the per-call buffer.
- Commons-021: ExternalCallResult.Response now backed by Lazy<dynamic?>
  (ExecutionAndPublication) — no more benign double-parse race.
- CD-017: DeploymentManagerRepository.DeleteDeploymentRecordAsync now takes
  an expected RowVersion and seeds entry.OriginalValues so EF emits
  DELETE ... WHERE Id=@id AND RowVersion=@prior; a stale RowVersion now
  throws DbUpdateConcurrencyException instead of silently deleting the row.
- Transport-009: AuditCorrelationContext.BundleImportId backed by
  AsyncLocal<Guid?> so concurrent imports get per-logical-call isolation
  (was a scoped instance shared via AuditService across runs).
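
The Transport-009 isolation can be illustrated with a minimal sketch. It assumes only the BCL; CorrelationContextSketch and RunImportAsync are hypothetical stand-ins, not the real AuditCorrelationContext or import code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a context object shared by several callers.
public sealed class CorrelationContextSketch
{
    // The value lives in the ambient execution context, not in the instance,
    // so each logical call chain sees only what it set itself.
    private readonly AsyncLocal<Guid?> _bundleImportId = new();

    public Guid? BundleImportId
    {
        get => _bundleImportId.Value;
        set => _bundleImportId.Value = value;
    }
}

public static class Program
{
    // Hypothetical import: sets its own id, yields, then checks the id again.
    private static async Task<bool> RunImportAsync(CorrelationContextSketch ctx)
    {
        var id = Guid.NewGuid();
        ctx.BundleImportId = id;          // visible only to this async flow
        await Task.Delay(50);             // give the other import time to run
        return ctx.BundleImportId == id;  // still our own id after the await
    }

    public static async Task Main()
    {
        var shared = new CorrelationContextSketch(); // one instance for both imports

        var results = await Task.WhenAll(
            Task.Run(() => RunImportAsync(shared)),
            Task.Run(() => RunImportAsync(shared)));

        Console.WriteLine(results[0] && results[1]);      // True
        Console.WriteLine(shared.BundleImportId is null); // True
    }
}
```

With a plain field in place of the AsyncLocal, the second import would overwrite the first import's id on the shared instance.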

DI / lifetime:
- AuditLog-003: All 3 AuditLog actor handlers switched to CreateAsyncScope
  + await using — async EF disposal no longer swallowed.
- AuditLog-007: INodeIdentityProvider resolution standardised on
  GetRequiredService<>() (was mixed with GetService<>()).
- AuditLog-011: AddAuditLogHealthMetricsBridge guarded by sentinel
  descriptor check — calling twice no longer double-registers the hosted
  service.

Shutdown / supervision:
- SiteCallAudit-002: AkkaHostedService adds a CoordinatedShutdown
  cluster-leave task (drain-site-call-audit-singleton) that issues a
  bounded GracefulStop(10s) so failover waits for in-flight upserts.

Registration safety:
- NS-020: AkkaHostedService now guards NotificationForwarder S&F
  registration with _notificationDeliveryHandlerRegistered + throws
  InvalidOperationException on double-register to make the regression loud.

VERIFY-only closures:
- NotifOutbox-005: Confirmed already closed by CD-015 fix (ac96b83) —
  NotificationOutboxRepository.InsertIfNotExistsAsync uses the same
  raw-SQL IF NOT EXISTS + 2601/2627 swallow pattern; race eliminated.

5+ new regression tests (CentralUI sandbox WhenAll, ExternalCallResult
64-reader Barrier, AuditLog DI idempotency, RowVersion stale-throw,
SiteCallAudit-002 shutdown drain). Build clean; affected suites all green.
README regenerated: 65 open (was 75).
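
The idea behind the Barrier-based reader test can be sketched in isolation. This is an assumption-laden illustration rather than the test in this commit: it uses a plain object where the real code parses JSON, and dedicated threads (my choice here) so that every reader is known to be running when the barrier releases:

```csharp
using System;
using System.Threading;

public static class Program
{
    public static void Main()
    {
        var parseCount = 0;

        // ExecutionAndPublication: the factory runs at most once and every
        // reader gets the same published instance.
        var lazy = new Lazy<object>(() =>
        {
            Interlocked.Increment(ref parseCount);
            Thread.Sleep(20);        // widen the window in which readers collide
            return new object();     // stand-in for the parsed response
        }, LazyThreadSafetyMode.ExecutionAndPublication);

        const int readers = 8;
        using var barrier = new Barrier(readers);
        var observed = new object[readers];
        var threads = new Thread[readers];

        for (var i = 0; i < readers; i++)
        {
            var slot = i;            // capture a per-thread copy of the index
            threads[i] = new Thread(() =>
            {
                barrier.SignalAndWait();     // release all readers together
                observed[slot] = lazy.Value;
            });
            threads[i].Start();
        }

        foreach (var t in threads)
        {
            t.Join();
        }

        var allSame = Array.TrueForAll(observed, o => ReferenceEquals(o, observed[0]));
        Console.WriteLine($"{parseCount} {allSame}"); // 1 True
    }
}
```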
Joseph Doherty
2026-05-28 07:29:41 -04:00
parent 6ae0fea558
commit 2ed5c6c379
25 changed files with 699 additions and 239 deletions
@@ -1,5 +1,6 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
@@ -274,4 +275,36 @@ public class AddAuditLogTests
Assert.Throws<InvalidOperationException>(
() => provider.GetRequiredService<IAuditWriteFailureCounter>());
}
[Fact]
public void AddAuditLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService()
{
// AuditLog-011: AddHostedService has no TryAdd variant, so a second
// call without the sentinel guard would spin up a second
// SiteAuditBacklogReporter on the same SQLite file. The helper must
// be a no-op on the second call — exactly one hosted-service
// descriptor for SiteAuditBacklogReporter survives.
var config = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
})
.Build();
var services = new ServiceCollection();
services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
services.AddSingleton<INodeIdentityProvider>(new FakeNodeIdentityProvider());
services.AddAuditLog(config);
services.AddHealthMonitoring();
services.AddAuditLogHealthMetricsBridge();
services.AddAuditLogHealthMetricsBridge();
var reporterCount = services.Count(d =>
d.ServiceType == typeof(IHostedService) &&
d.ImplementationType == typeof(SiteAuditBacklogReporter));
Assert.Equal(1, reporterCount);
}
}
@@ -0,0 +1,86 @@
using ScadaLink.CentralUI.ScriptAnalysis;
namespace ScadaLink.CentralUI.Tests.ScriptAnalysis;
/// <summary>
/// Regression tests for the <c>SandboxConsoleCapture</c> writer that the Test Run
/// sandbox installs on <c>Console.Out</c>/<c>Console.Error</c>. CentralUI-030
/// surfaced an intra-script concurrency hazard: a sandboxed script can fan out
/// work with <c>Task.WhenAll</c> / <c>Task.Run</c> and every child task inherits
/// the capture <c>StringWriter</c> via <c>AsyncLocal</c>; <c>StringWriter</c> is
/// not thread-safe, so concurrent writes could corrupt the buffer. These tests
/// drive the writer the same way Roslyn-hosted user code does.
/// </summary>
public class SandboxConsoleCaptureTests
{
/// <summary>
/// CentralUI-030: a capture scope shared across <c>Task.WhenAll</c> child
/// tasks must serialise writes so the resulting transcript contains exactly
/// the expected number of lines without character-level interleaving.
/// </summary>
[Fact]
public async Task BeginCapture_ConcurrentWritesFromTasks_DoNotCorruptBuffer()
{
// The static install routes Console.Out through the singleton sandbox
// capture writer for the test process — this is idempotent and matches
// the way ScriptAnalysisService bootstraps the sandbox in production.
var (capture, _) = SandboxConsoleCapture.Install();
var buffer = new StringWriter();
const int taskCount = 32;
const int linesPerTask = 50;
const int expectedLines = taskCount * linesPerTask;
using (capture.BeginCapture(buffer))
{
// AsyncLocal flows the capture scope into each Task.Run, mirroring
// a sandboxed script doing `await Task.WhenAll(...)` over Tasks
// that each `Console.WriteLine`.
var tasks = Enumerable.Range(0, taskCount).Select(i => Task.Run(() =>
{
for (var j = 0; j < linesPerTask; j++)
{
Console.WriteLine($"task-{i}-line-{j}");
}
}));
await Task.WhenAll(tasks);
}
var captured = buffer.ToString();
// Without the lock, concurrent StringWriter.WriteLine can drop or
// interleave characters and produce malformed lines / a wrong count.
// We assert the exact line count and that every emitted token is
// present on a line of its own — both fail under the unprotected
// implementation.
var lines = captured.Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries);
Assert.Equal(expectedLines, lines.Length);
for (var i = 0; i < taskCount; i++)
{
for (var j = 0; j < linesPerTask; j++)
{
Assert.Contains($"task-{i}-line-{j}", lines);
}
}
}
/// <summary>
/// Sanity check: the most basic capture happy-path still works after the
/// CentralUI-030 lock was introduced.
/// </summary>
[Fact]
public void BeginCapture_SingleThreadedWrites_AreCaptured()
{
var (capture, _) = SandboxConsoleCapture.Install();
var buffer = new StringWriter();
using (capture.BeginCapture(buffer))
{
Console.WriteLine("hello");
Console.Write("world");
}
Assert.Contains("hello", buffer.ToString());
Assert.Contains("world", buffer.ToString());
}
}
@@ -0,0 +1,76 @@
using ScadaLink.Commons.Interfaces.Services;
namespace ScadaLink.Commons.Tests.Interfaces.Services;
/// <summary>
/// Tests for <see cref="ExternalCallResult"/>, in particular the Commons-021
/// thread-safe lazy parse of <c>Response</c>. The pre-fix implementation used
/// two mutable fields (<c>_response</c>/<c>_responseParsed</c>) with no
/// synchronization, so concurrent readers could each construct a fresh
/// <c>DynamicJsonElement</c> and one would overwrite the other. The fix moves
/// the parse onto a <c>Lazy&lt;dynamic?&gt;</c> with
/// <c>LazyThreadSafetyMode.ExecutionAndPublication</c> (the default), which
/// guarantees one parse and one shared result for all readers.
/// </summary>
public class ExternalCallResultTests
{
[Fact]
public void Response_NullOrEmptyJson_ReturnsNull()
{
var withNull = new ExternalCallResult(Success: true, ResponseJson: null, ErrorMessage: null);
var withEmpty = new ExternalCallResult(Success: true, ResponseJson: string.Empty, ErrorMessage: null);
Assert.Null(withNull.Response);
Assert.Null(withEmpty.Response);
}
[Fact]
public void Response_ParsesJsonIntoDynamicElement()
{
var result = new ExternalCallResult(Success: true, ResponseJson: "{\"answer\": 42}", ErrorMessage: null);
// dynamic property access is the production usage pattern.
dynamic? response = result.Response;
Assert.NotNull(response);
int answer = (int)response!.answer;
Assert.Equal(42, answer);
}
/// <summary>
/// Commons-021: concurrent readers must observe the same parsed instance
/// (a <c>Lazy&lt;T&gt;</c> invariant). Under the pre-fix code two threads could
/// both produce a fresh <c>DynamicJsonElement</c> and one would win the race,
/// so <c>ReferenceEquals</c> would then occasionally fail. With the fix every
/// reader observes the single Lazy-published value, so the assertion
/// holds for every pair of observers.
/// </summary>
[Fact]
public void Response_ConcurrentReads_ReturnSameInstance()
{
// A larger payload makes the parse window wider so the race, if
// present, is more likely to fire. The same property — single
// published instance — must hold for any payload, though.
var json = "{\"items\":[{\"name\":\"a\"},{\"name\":\"b\"},{\"name\":\"c\"}],\"count\":3}";
var result = new ExternalCallResult(Success: true, ResponseJson: json, ErrorMessage: null);
const int observerCount = 64;
var barrier = new Barrier(observerCount);
var observed = new object?[observerCount];
Parallel.For(0, observerCount, i =>
{
// Force all observers to call `Response` at the same instant so
// they collide on the lazy parse rather than each finding it
// already-published.
barrier.SignalAndWait();
observed[i] = result.Response;
});
var first = observed[0];
Assert.NotNull(first);
for (var i = 1; i < observerCount; i++)
{
Assert.Same(first, observed[i]);
}
}
}
@@ -6,6 +6,7 @@ using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Entities.Templates;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.ConfigurationDatabase;
using ScadaLink.ConfigurationDatabase.Repositories;
namespace ScadaLink.ConfigurationDatabase.Tests;
@@ -36,6 +37,31 @@ public class ConcurrencyTestDbContext : ScadaLinkDbContext
}
}
/// <summary>
/// A SQLite-friendly DbContext that keeps <see cref="DeploymentRecord.RowVersion"/> as
/// the optimistic-concurrency token but disables auto-generation (SQLite cannot
/// auto-populate a rowversion column). The caller sets RowVersion explicitly, which
/// is sufficient to exercise the production stub-attach delete path under CD-017's
/// concurrency rule.
/// </summary>
public class RowVersionConcurrencyTestDbContext : ScadaLinkDbContext
{
public RowVersionConcurrencyTestDbContext(DbContextOptions<ScadaLinkDbContext> options) : base(options) { }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
base.OnModelCreating(modelBuilder);
modelBuilder.Entity<DeploymentRecord>(builder =>
{
builder.Property(d => d.RowVersion)
.IsRequired(false)
.IsConcurrencyToken()
.ValueGeneratedNever();
});
}
}
public class ConcurrencyTests : IDisposable
{
private readonly string _dbPath;
@@ -149,6 +175,63 @@ public class ConcurrencyTests : IDisposable
Assert.Equal("Second update", loaded.Description);
}
[Fact]
public async Task DeleteDeploymentRecord_StaleRowVersion_ThrowsConcurrencyException()
{
// CD-017: Verifies the stub-attach delete path enforces optimistic concurrency
// when the caller passes a RowVersion that no longer matches the row's current
// RowVersion. Uses a SQLite fixture where DeploymentRecord.RowVersion is an
// explicit, caller-managed concurrency token (no SQL Server auto-generation).
using var setupCtx = new RowVersionConcurrencyTestDbContext(BuildOptions());
await setupCtx.Database.EnsureCreatedAsync();
var site = new Site("Site1", "S-RV1");
var template = new Template("RV-T1");
setupCtx.Sites.Add(site);
setupCtx.Templates.Add(template);
await setupCtx.SaveChangesAsync();
var instance = new Instance("RV-I1") { SiteId = site.Id, TemplateId = template.Id, State = InstanceState.Enabled };
setupCtx.Instances.Add(instance);
await setupCtx.SaveChangesAsync();
var record = new DeploymentRecord("deploy-rv-stale", "admin")
{
InstanceId = instance.Id,
DeployedAt = DateTimeOffset.UtcNow,
RowVersion = new byte[] { 0x01 },
};
setupCtx.DeploymentRecords.Add(record);
await setupCtx.SaveChangesAsync();
var id = record.Id;
// Reload in a fresh context and simulate a concurrent edit that has advanced
// the stored RowVersion. The caller below holds the *prior* RowVersion (0x01)
// and is expected to lose the concurrency check.
using (var advanceCtx = new RowVersionConcurrencyTestDbContext(BuildOptions()))
{
var stored = await advanceCtx.DeploymentRecords.SingleAsync(d => d.Id == id);
stored.RowVersion = new byte[] { 0x02 };
await advanceCtx.SaveChangesAsync();
}
using var deleteCtx = new RowVersionConcurrencyTestDbContext(BuildOptions());
var repository = new DeploymentManagerRepository(deleteCtx);
var staleRowVersion = new byte[] { 0x01 };
await repository.DeleteDeploymentRecordAsync(id, staleRowVersion);
await Assert.ThrowsAsync<DbUpdateConcurrencyException>(
() => repository.SaveChangesAsync());
}
private DbContextOptions<ScadaLinkDbContext> BuildOptions()
{
return new DbContextOptionsBuilder<ScadaLinkDbContext>()
.UseSqlite($"DataSource={_dbPath}")
.ConfigureWarnings(w => w.Ignore(RelationalEventId.PendingModelChangesWarning))
.Options;
}
[Fact]
public void DeploymentRecord_HasRowVersionConfigured()
{
@@ -838,9 +838,10 @@ public class DeploymentManagerRepositoryTests : IDisposable
await _repository.AddDeploymentRecordAsync(record);
await _repository.SaveChangesAsync();
var id = record.Id;
var rowVersion = record.RowVersion ?? Array.Empty<byte>();
_context.ChangeTracker.Clear();
-await _repository.DeleteDeploymentRecordAsync(id);
+await _repository.DeleteDeploymentRecordAsync(id, rowVersion);
await _repository.SaveChangesAsync();
Assert.Null(await _repository.GetDeploymentRecordByIdAsync(id));