Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold: 8 new projects with ~70 new tests, all green alongside the 494-test v1 IntegrationTests baseline. Parity is preserved: no v1 tests are broken and the legacy OtOpcUa.Host is untouched.

Phase 1 finish:
- Configuration project: 16 entities, 10 enums, DbContext, DesignTimeDbContextFactory, and the InitialSchema/StoredProcedures/AuthorizationGrants migrations. The migrations add 8 stored procedures. They include sp_PublishGeneration, with a MERGE on ExternalIdReservation per decision #124. sp_RollbackToGeneration clones rows into a new published generation. sp_ValidateDraft runs cross-cluster-namespace, EquipmentUuid-immutability, and ZTag/SAPID reservation pre-flight checks. sp_ComputeGenerationDiff uses a CHECKSUM-based row signature. The migrations also create the OtOpcUaNode/OtOpcUaAdmin SQL roles, with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on the dbo schema.
- Managed DraftValidator covering the UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), and driver↔namespace compatibility. It returns every failing rule in one pass.
- LiteDB local cache with round-trip, ring pruning, and fast-fail on corruption.
- GenerationApplier with a per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added).
- Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and the DriverHost lifecycle registry.
- Server project using a Microsoft.Extensions.Hosting BackgroundService in place of TopShelf. Its NodeBootstrap falls back to the LiteDB cache when the central DB is unreachable (decision #79).
- Admin project scaffolded as Blazor Server. It has a Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), and Cluster + Generation services fronting the stored procs.
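The GenerationApplier's dependency-ordered dispatch can be illustrated with a small sort. This is a minimal sketch under stated assumptions. `EntityKind`, `ChangeKind`, `EntityChange` and `ApplyOrderSketch` are hypothetical names, not the project's types. The sketch reads "Removed before Added" as applying within each entity kind, after the namespace → tag dependency order. An implementation could equally run all removals first. The commit does not say where Modified falls, so placing it between Removed and Added is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Entity kinds in the dependency order named in the commit message:
// namespace -> driver -> device -> equipment -> poll-group -> tag.
public enum EntityKind { Namespace, Driver, Device, Equipment, PollGroup, Tag }

// Hypothetical change kinds. Declaration order encodes "Removed before Added";
// putting Modified between the two is an assumption.
public enum ChangeKind { Removed, Modified, Added }

// Hypothetical diff row: which entity kind changed, how, and its id.
public sealed record EntityChange(EntityKind Entity, ChangeKind Change, string Id);

public static class ApplyOrderSketch
{
    // Orders diff rows for callback dispatch: dependency order first, then
    // Removed -> Modified -> Added within each entity kind. OrderBy/ThenBy are
    // stable, so rows that tie keep their original relative order.
    public static IReadOnlyList<EntityChange> Sort(IEnumerable<EntityChange> changes) =>
        changes.OrderBy(c => c.Entity).ThenBy(c => c.Change).ToList();
}
```

Consider a diff containing a tag addition, a namespace addition, a tag removal and a driver modification. `Sort` yields the namespace, then the driver, then the tag removal, then the tag addition.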

Phase 2 scaffold:
- Driver.Galaxy.Shared (netstandard2.0) with the full MessagePack IPC contract surface: Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, and Recycle. It also adds length-prefixed framing (decision #28) with a 16 MiB cap and a thread-safe FrameWriter/FrameReader.
- Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md:
  - Strict PipeAcl: allows only the configured server SID, with an explicit deny on LocalSystem and Administrators.
  - PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent, plus a per-process shared-secret Hello.
  - Galaxy-specific MemoryWatchdog: warns at max(1.5×baseline, baseline + 200 MB). It soft-recycles at max(2×baseline, baseline + 200 MB) or on a sustained slope of ≥5 MB/min over a 30-min rolling window. It enforces a hard ceiling of 1.5 GB.
  - RecyclePolicy: at most 1 soft recycle per hour, plus a daily scheduled recycle at 03:00 local time.
  - PostMortemMmf: a 1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf that survives a hard crash and is readable cross-process.
  - MxAccessHandle : SafeHandle. Its ReleaseHandle loops Marshal.ReleaseComObject until the refcount reaches 0, then calls an optional unregister callback.
  - StaPump with a responsiveness probe. It uses a BlockingCollection dispatcher for Phase 1. The real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens.
  - An IsExternalInit shim so init setters work on .NET Framework 4.8.
- Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery by forwarding over the IPC channel, with MX data-type and security-classification mapping. It also adds the Supervisor pieces:
  - Backoff: 5s → 15s → 60s (capped), reset after a stable run.
  - CircuitBreaker: opens on 3 crashes within 5 min. The cooldown escalates 1h → 4h → manual, and the sticky alert does not auto-clear.
  - HeartbeatMonitor: 2s cadence. Per driver-stability.md, 3 consecutive misses mean the host is dead.
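The Supervisor's Backoff and CircuitBreaker rules can be sketched as two small state machines. This illustration is built only from the numbers in the commit message and is not the Driver.Galaxy.Proxy API. `BackoffSketch`, `CircuitBreakerSketch`, the 2-minute stable-run threshold, and the behaviour of `ManualReset` are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Restart delay ladder from the commit message: 5 s -> 15 s -> 60 s, capped at 60 s.
public sealed class BackoffSketch
{
    private static readonly TimeSpan[] Steps =
    {
        TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60),
    };

    private int _attempt;

    // Assumption: the commit does not say how long a "stable run" is; 2 min is a placeholder.
    public TimeSpan StableRunThreshold { get; set; } = TimeSpan.FromMinutes(2);

    // Delay before the next restart; stays on the last step once the ladder is exhausted.
    public TimeSpan Next()
    {
        var delay = Steps[Math.Min(_attempt, Steps.Length - 1)];
        if (_attempt < Steps.Length) _attempt++;
        return delay;
    }

    // Report how long the host ran before exiting; a stable run resets the ladder.
    public void RecordRunLength(TimeSpan ran)
    {
        if (ran >= StableRunThreshold) _attempt = 0;
    }
}

// Opens after 3 crashes within 5 min; cooldown escalates 1 h -> 4 h -> manual.
public sealed class CircuitBreakerSketch
{
    private static readonly TimeSpan CrashWindow = TimeSpan.FromMinutes(5);
    private const int CrashesToOpen = 3;

    // null = no automatic close; an operator must call ManualReset().
    private static readonly TimeSpan?[] Cooldowns = { TimeSpan.FromHours(1), TimeSpan.FromHours(4), null };

    private readonly Queue<DateTime> _crashes = new Queue<DateTime>();
    private int _trips;
    private DateTime _openedAt;
    private TimeSpan? _cooldown;

    public bool IsOpen { get; private set; }

    // Set on every trip and not cleared when a cooldown expires ("sticky").
    public bool StickyAlert { get; private set; }

    public void RecordCrash(DateTime utcNow)
    {
        if (IsOpen) return;
        _crashes.Enqueue(utcNow);
        // Forget crashes that have aged out of the 5-minute window.
        while (utcNow - _crashes.Peek() > CrashWindow) _crashes.Dequeue();
        if (_crashes.Count < CrashesToOpen) return;

        IsOpen = true;
        StickyAlert = true;
        _openedAt = utcNow;
        _cooldown = Cooldowns[Math.Min(_trips, Cooldowns.Length - 1)];
        _trips++;
        _crashes.Clear();
    }

    // Whether the supervisor may start the host now; closes the breaker once a timed cooldown ends.
    public bool AllowStart(DateTime utcNow)
    {
        if (!IsOpen) return true;
        if (_cooldown is TimeSpan cooldown && utcNow - _openedAt >= cooldown)
        {
            IsOpen = false;
            return true;
        }
        return false;
    }

    // Assumption: a manual reset closes the breaker, clears the alert, and restarts escalation.
    public void ManualReset()
    {
        IsOpen = false;
        StickyAlert = false;
        _trips = 0;
        _crashes.Clear();
    }
}
```

In this sketch, a supervisor would check `AllowStart` before each restart attempt and wait `Next()` between attempts. HeartbeatMonitor is left out.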

Infrastructure:
- Docker SQL Server remapped to host port 14330 so it coexists with the native MSSQL14 instance hosting the Galaxy ZB DB on 1433.
- NuGetAuditSuppress applied per project for two System.Security.Cryptography.Xml advisories. They are reachable only through EF Core Design with PrivateAssets=all, and the fix ships in 11.0.0-preview.
- The .slnx gains 14 project registrations.

Deferred, with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md:
- Phase 1 Stream E Admin UI pages: Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, and UNS Areas/Lines tabs. Also ACLs + permission simulator, a generic JSON config editor, SignalR real-time updates, and Release-Reservation + Merge-Equipment workflows. Also an LDAP login page and an AppServer smoke test per decision #142.
- Phase 2 Stream D: Galaxy MXAccess code lift out of the legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, and legacy Host deletion. This work is blocked by parity.
- Phase 2 Stream E: v1 IntegrationTests against the v2 topology, the Client.CLI walkthrough diff, regression tests for the four 2026-04-13 stability findings, and an adversarial review. It requires a live MXAccess runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-17 21:35:25 -04:00
parent fc0ce36308
commit 01fd90c178
128 changed files with 12352 additions and 4 deletions


@@ -0,0 +1,64 @@
using System;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Stability;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;

[Trait("Category", "Unit")]
public sealed class MemoryWatchdogTests
{
    private const long Mb = 1024 * 1024;

    [Fact]
    public void Baseline_sample_returns_None()
    {
        var w = new MemoryWatchdog(baselineBytes: 300 * Mb);
        w.Sample(320 * Mb, DateTime.UtcNow).ShouldBe(WatchdogAction.None);
    }

    [Fact]
    public void Warn_threshold_uses_larger_of_1_5x_or_plus_200MB()
    {
        // Baseline 300 MB → warn threshold = max(1.5 × 300, 300 + 200) = max(450, 500) = 500 MB
        var w = new MemoryWatchdog(baselineBytes: 300 * Mb);
        w.Sample(499 * Mb, DateTime.UtcNow).ShouldBe(WatchdogAction.None);
        w.Sample(500 * Mb, DateTime.UtcNow).ShouldBe(WatchdogAction.Warn);
    }

    [Fact]
    public void Soft_recycle_triggers_at_2x_or_plus_200MB_whichever_larger()
    {
        // Baseline 400 MB → soft = max(800, 600) = 800 MB; warn = max(600, 600) = 600 MB,
        // so 799 MB is still only a Warn.
        var w = new MemoryWatchdog(baselineBytes: 400 * Mb);
        w.Sample(799 * Mb, DateTime.UtcNow).ShouldBe(WatchdogAction.Warn);
        w.Sample(800 * Mb, DateTime.UtcNow).ShouldBe(WatchdogAction.SoftRecycle);
    }

    [Fact]
    public void Hard_kill_triggers_at_absolute_ceiling()
    {
        // 1501 MB is below the 2000 MB soft threshold for this baseline, so only the
        // absolute hard ceiling can produce HardKill here.
        var w = new MemoryWatchdog(baselineBytes: 1000 * Mb);
        w.Sample(1501 * Mb, DateTime.UtcNow).ShouldBe(WatchdogAction.HardKill);
    }

    [Fact]
    public void Sustained_slope_triggers_soft_recycle_before_absolute_threshold()
    {
        // Baseline 1000 MB → warn = max(1500, 1200) = 1500 MB, soft = max(2000, 1200) = 2000 MB.
        // Growing 6 MB/min from 1050 MB reaches at most 1260 MB within 35 min, below both
        // absolute thresholds, so the slope detector (5 MB/min) must fire on its own.
        var w = new MemoryWatchdog(baselineBytes: 1000 * Mb) { SustainedSlopeBytesPerMinute = 5 * Mb };
        var t0 = new DateTime(2026, 4, 17, 12, 0, 0, DateTimeKind.Utc);
        long rss = 1050 * Mb;
        var slopeFired = false;
        for (var i = 0; i <= 35; i++)
        {
            var action = w.Sample(rss, t0.AddMinutes(i));
            if (action == WatchdogAction.SoftRecycle) { slopeFired = true; break; }
            rss += 6 * Mb;
        }

        slopeFired.ShouldBeTrue("slope detector should fire once the 30-min window fills");
    }
}
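The threshold arithmetic these tests pin down can be reproduced with a small stand-alone class. `MemoryWatchdogSketch` is hypothetical and mirrors only the behaviour the tests assert. Two details are assumptions. First, the hard ceiling is taken as 1500 MB, which is consistent with the 1501 MB HardKill case. Second, the slope is measured as first-to-last growth across a full 30-minute window rather than, for example, a regression fit.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the four outcomes the tests assert on.
public enum WatchdogAction { None, Warn, SoftRecycle, HardKill }

public sealed class MemoryWatchdogSketch
{
    private const long Mb = 1024 * 1024;

    private readonly long _warnBytes;
    private readonly long _softBytes;

    // Samples kept for the rolling slope window, oldest first.
    private readonly Queue<(DateTime At, long Rss)> _window = new Queue<(DateTime At, long Rss)>();

    public MemoryWatchdogSketch(long baselineBytes)
    {
        // Warn at max(1.5 x baseline, baseline + 200 MB).
        _warnBytes = Math.Max(baselineBytes * 3 / 2, baselineBytes + 200 * Mb);
        // Soft recycle at max(2 x baseline, baseline + 200 MB).
        _softBytes = Math.Max(baselineBytes * 2, baselineBytes + 200 * Mb);
    }

    // Assumption: "1.5 GB" read as 1500 MB, consistent with the 1501 MB HardKill test.
    public long HardCeilingBytes { get; set; } = 1500 * Mb;

    public long SustainedSlopeBytesPerMinute { get; set; } = 5 * Mb;

    public TimeSpan SlopeWindow { get; set; } = TimeSpan.FromMinutes(30);

    public WatchdogAction Sample(long rssBytes, DateTime utcNow)
    {
        _window.Enqueue((utcNow, rssBytes));
        // Drop samples older than the window; the sample just added always survives.
        while (utcNow - _window.Peek().At > SlopeWindow) _window.Dequeue();

        // Most severe outcome wins.
        if (rssBytes >= HardCeilingBytes) return WatchdogAction.HardKill;
        if (rssBytes >= _softBytes || SlopeExceeded(rssBytes, utcNow)) return WatchdogAction.SoftRecycle;
        if (rssBytes >= _warnBytes) return WatchdogAction.Warn;
        return WatchdogAction.None;
    }

    // Assumption: slope = growth from the oldest retained sample to now, evaluated
    // only once the retained samples span the full window.
    private bool SlopeExceeded(long rssBytes, DateTime utcNow)
    {
        var (oldestAt, oldestRss) = _window.Peek();
        var elapsed = utcNow - oldestAt;
        if (elapsed < SlopeWindow) return false;
        var bytesPerMinute = (rssBytes - oldestRss) / elapsed.TotalMinutes;
        return bytesPerMinute >= SustainedSlopeBytesPerMinute;
    }
}
```

With one sample per minute and 6 MB/min growth, this sketch first reports SoftRecycle at minute 30. That is the first sample whose retained history spans the whole window.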


@@ -0,0 +1,64 @@
using System;
using System.IO;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Stability;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;

[Trait("Category", "Unit")]
public sealed class PostMortemMmfTests : IDisposable
{
    private readonly string _path = Path.Combine(Path.GetTempPath(), $"mmf-test-{Guid.NewGuid():N}.bin");

    public void Dispose()
    {
        if (File.Exists(_path)) File.Delete(_path);
    }

    [Fact]
    public void Write_then_read_round_trips_entries_in_oldest_first_order()
    {
        // Write through one instance and dispose it, then reopen the same file to prove persistence.
        using (var mmf = new PostMortemMmf(_path, capacity: 10))
        {
            mmf.Write(0x30, "read tag-1");
            mmf.Write(0x30, "read tag-2");
            mmf.Write(0x32, "write tag-3");
        }

        using var reopen = new PostMortemMmf(_path, capacity: 10);
        var entries = reopen.ReadAll();
        entries.Length.ShouldBe(3);
        entries[0].Message.ShouldBe("read tag-1");
        entries[1].Message.ShouldBe("read tag-2");
        entries[2].Message.ShouldBe("write tag-3");
        entries[0].OpKind.ShouldBe(0x30L);
    }

    [Fact]
    public void Ring_buffer_wraps_and_oldest_entry_is_overwritten()
    {
        using var mmf = new PostMortemMmf(_path, capacity: 3);
        mmf.Write(1, "A");
        mmf.Write(2, "B");
        mmf.Write(3, "C");
        mmf.Write(4, "D"); // overwrites A
        var entries = mmf.ReadAll();
        entries.Length.ShouldBe(3);
        entries[0].Message.ShouldBe("B");
        entries[1].Message.ShouldBe("C");
        entries[2].Message.ShouldBe("D");
    }

    [Fact]
    public void Message_longer_than_entry_size_is_truncated_safely()
    {
        // capacity counts entries, not bytes; the assertion checks that an oversized message
        // is stored shorter than the per-entry size, PostMortemMmf.EntryBytes.
        using var mmf = new PostMortemMmf(_path, capacity: 2);
        var huge = new string('x', 500);
        mmf.Write(0, huge);
        var entries = mmf.ReadAll();
        entries[0].Message.Length.ShouldBeLessThan(PostMortemMmf.EntryBytes);
    }
}


@@ -0,0 +1,51 @@
using System;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Stability;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;

[Trait("Category", "Unit")]
public sealed class RecyclePolicyTests
{
    [Fact]
    public void First_soft_recycle_is_allowed()
    {
        var p = new RecyclePolicy();
        p.TryRequestSoftRecycle(DateTime.UtcNow, out var reason).ShouldBeTrue();
        reason.ShouldBeNull();
    }

    [Fact]
    public void Second_soft_recycle_within_cap_is_blocked()
    {
        var p = new RecyclePolicy();
        var t0 = DateTime.UtcNow;
        p.TryRequestSoftRecycle(t0, out _).ShouldBeTrue();
        p.TryRequestSoftRecycle(t0.AddMinutes(30), out var reason).ShouldBeFalse();
        reason.ShouldContain("frequency cap");
    }

    [Fact]
    public void Recycle_after_cap_elapses_is_allowed_again()
    {
        var p = new RecyclePolicy();
        var t0 = DateTime.UtcNow;
        p.TryRequestSoftRecycle(t0, out _).ShouldBeTrue();
        p.TryRequestSoftRecycle(t0.AddHours(1).AddMinutes(1), out _).ShouldBeTrue();
    }

    [Fact]
    public void Scheduled_recycle_fires_once_per_day_at_local_3am()
    {
        var p = new RecyclePolicy();
        var last = DateTime.MinValue;
        p.ShouldSoftRecycleScheduled(new DateTime(2026, 4, 17, 2, 59, 0), ref last).ShouldBeFalse();
        p.ShouldSoftRecycleScheduled(new DateTime(2026, 4, 17, 3, 0, 0), ref last).ShouldBeTrue();
        p.ShouldSoftRecycleScheduled(new DateTime(2026, 4, 17, 3, 30, 0), ref last).ShouldBeFalse(
            "already fired today");
        p.ShouldSoftRecycleScheduled(new DateTime(2026, 4, 18, 3, 0, 0), ref last).ShouldBeTrue(
            "next day fires again");
    }
}
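A policy that satisfies these four tests can be sketched in a few lines. The sketch assumes the scheduled 03:00 recycle is tracked separately from the hourly soft-recycle cap. The commit does not say whether one counts against the other. `RecyclePolicySketch` and its property names are hypothetical.

```csharp
using System;

#nullable enable

public sealed class RecyclePolicySketch
{
    // At most one soft recycle per hour.
    public TimeSpan SoftRecycleCap { get; set; } = TimeSpan.FromHours(1);

    // Local time of day for the daily scheduled recycle.
    public TimeSpan ScheduledTimeOfDay { get; set; } = TimeSpan.FromHours(3);

    private DateTime? _lastSoftRecycle;

    // Grants a soft recycle unless one was granted less than SoftRecycleCap ago.
    public bool TryRequestSoftRecycle(DateTime utcNow, out string? reason)
    {
        if (_lastSoftRecycle is DateTime last && utcNow - last < SoftRecycleCap)
        {
            reason = $"soft recycle frequency cap: last recycle at {last:O}";
            return false;
        }

        _lastSoftRecycle = utcNow;
        reason = null;
        return true;
    }

    // Fires once per local day, at or after today's slot, and records the firing in lastFired.
    public bool ShouldSoftRecycleScheduled(DateTime localNow, ref DateTime lastFired)
    {
        var todaysSlot = localNow.Date + ScheduledTimeOfDay;
        if (localNow < todaysSlot || lastFired >= todaysSlot) return false;
        lastFired = localNow;
        return true;
    }
}
```

The caller owns the `lastFired` value through the `ref` parameter, as in the tests. That lets the scheduled state be persisted independently of the policy instance.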


@@ -0,0 +1,47 @@
using System;
using System.Threading;
using System.Threading.Tasks;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Sta;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;

[Trait("Category", "Unit")]
public sealed class StaPumpTests
{
    [Fact]
    public async Task InvokeAsync_runs_work_on_the_STA_thread()
    {
        using var pump = new StaPump();
        await pump.WaitForStartedAsync();
        var apartment = await pump.InvokeAsync(() => Thread.CurrentThread.GetApartmentState());
        apartment.ShouldBe(ApartmentState.STA);
    }

    [Fact]
    public async Task Responsiveness_probe_returns_true_under_healthy_pump()
    {
        using var pump = new StaPump();
        await pump.WaitForStartedAsync();
        (await pump.IsResponsiveAsync(TimeSpan.FromSeconds(2))).ShouldBeTrue();
    }

    [Fact]
    public async Task Responsiveness_probe_returns_false_when_pump_is_wedged()
    {
        using var pump = new StaPump();
        await pump.WaitForStartedAsync();
        // Wedge the pump: this work item blocks the STA thread until the event is set,
        // so the probe queued behind it cannot run within the timeout.
        var wedge = new ManualResetEventSlim();
        _ = pump.InvokeAsync(() => wedge.Wait());
        var responsive = await pump.IsResponsiveAsync(TimeSpan.FromMilliseconds(500));
        responsive.ShouldBeFalse();
        wedge.Set();
    }
}


@@ -0,0 +1,31 @@
<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <TargetFramework>net48</TargetFramework>
        <Nullable>enable</Nullable>
        <LangVersion>latest</LangVersion>
        <IsPackable>false</IsPackable>
        <IsTestProject>true</IsTestProject>
        <RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests</RootNamespace>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="xunit" Version="2.9.2"/>
        <PackageReference Include="xunit.runner.visualstudio" Version="3.0.2">
            <PrivateAssets>all</PrivateAssets>
            <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
        </PackageReference>
        <PackageReference Include="Shouldly" Version="4.3.0"/>
        <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0"/>
    </ItemGroup>

    <ItemGroup>
        <ProjectReference Include="..\..\src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.csproj"/>
    </ItemGroup>

    <ItemGroup>
        <NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
        <NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
    </ItemGroup>

</Project>