Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold: 8 new projects with ~70 new tests, all green alongside the 494-test v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched).

Phase 1 finish:
- Configuration project: 16 entities, 10 enums, DbContext, DesignTimeDbContextFactory, and the InitialSchema/StoredProcedures/AuthorizationGrants migrations. The migrations add 8 stored procedures, including sp_PublishGeneration (MERGE on ExternalIdReservation per decision #124), sp_RollbackToGeneration (clones rows into a new published generation), sp_ValidateDraft (cross-cluster-namespace, EquipmentUuid-immutability, and ZTag/SAPID reservation pre-flight checks), and sp_ComputeGenerationDiff (CHECKSUM-based row signature). They also create the OtOpcUaNode/OtOpcUaAdmin SQL roles, with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on the dbo schema.
- Managed DraftValidator covering the UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), and driver↔namespace compatibility. It returns every failing rule in one pass.
- LiteDB local cache with round-trip, ring pruning, and fast failure on corruption.
- GenerationApplier with a per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag; Removed before Added).
- Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and the DriverHost lifecycle registry.
- Server project built on a Microsoft.Extensions.Hosting BackgroundService (replacing TopShelf), with a NodeBootstrap that falls back to the LiteDB cache when the central DB is unreachable (decision #79).
- Admin project scaffolded as Blazor Server with a Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), and Cluster + Generation services fronting the stored procs.
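The GenerationApplier bullet compresses two steps: diff each entity kind between the current and the new generation, then fire callbacks in dependency order. A minimal sketch of those two steps follows, assuming each entity is keyed by an id and compared by a row signature. `EntityKind`, `ChangeKind`, `Change`, and `GenerationDiffSketch` are hypothetical names, not the project's types, and placing Modified between Removed and Added (with every phase walking kinds in the same namespace → tag order) is an assumption; the commit message only fixes the kind order and Removed-before-Added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; the real Configuration entities are not shown in this commit.
// EntityKind values are declared in the dependency order named in the commit message.
public enum EntityKind { Namespace, Driver, Device, Equipment, PollGroup, Tag }

// Assumption: Modified is applied after Removed and before Added.
public enum ChangeKind { Removed, Modified, Added }

public sealed record Change(EntityKind Kind, ChangeKind Op, string Id);

public static class GenerationDiffSketch
{
    // Diffs one entity kind between two generations. Keys are entity ids;
    // values are row signatures, so a changed value means Modified.
    public static IEnumerable<Change> Diff(
        EntityKind kind,
        IReadOnlyDictionary<string, string> current,
        IReadOnlyDictionary<string, string> next)
    {
        foreach (var id in current.Keys)
            if (!next.ContainsKey(id))
                yield return new Change(kind, ChangeKind.Removed, id);

        foreach (var (id, signature) in next)
        {
            if (!current.TryGetValue(id, out var previous))
                yield return new Change(kind, ChangeKind.Added, id);
            else if (previous != signature)
                yield return new Change(kind, ChangeKind.Modified, id);
        }
    }

    // Orders callbacks by phase (Removed, Modified, Added), then by entity kind in
    // dependency order; the id tiebreak only makes the output deterministic.
    public static IReadOnlyList<Change> Order(IEnumerable<Change> changes) =>
        changes.OrderBy(c => c.Op)
               .ThenBy(c => c.Kind)
               .ThenBy(c => c.Id, StringComparer.Ordinal)
               .ToList();
}
```

A real applier might instead walk removals in reverse dependency order (tags before their namespace); the commit message does not say which, so the ordering key is the part to check against the actual GenerationApplier.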
Phase 2 scaffold:
- Driver.Galaxy.Shared (netstandard2.0): the full MessagePack IPC contract surface (Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle), plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader.
- Driver.Galaxy.Host (net48), implementing the Tier C cross-cutting protections from driver-stability.md:
  - strict PipeAcl: allow only the configured server SID, with an explicit deny on LocalSystem and Administrators;
  - PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent, and a per-process shared-secret Hello;
  - Galaxy-specific MemoryWatchdog: warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling at 1.5 GB, and a growth-slope threshold of ≥5 MB/min over a 30-min rolling window;
  - RecyclePolicy: at most 1 soft recycle per hour, plus a daily scheduled recycle at 03:00 local time;
  - PostMortemMmf: a 1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf that survives a hard crash and is readable cross-process;
  - MxAccessHandle : SafeHandle, whose ReleaseHandle loops Marshal.ReleaseComObject until the refcount reaches 0 and then calls an optional unregister callback;
  - StaPump with a responsiveness probe (a BlockingCollection dispatcher for Phase 1; the real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens);
  - an IsExternalInit shim so init setters compile on .NET 4.8.
- Driver.Galaxy.Proxy (net10), implementing IDriver + ITagDiscovery by forwarding over the IPC channel with MX data-type and security-classification mapping, plus the Supervisor pieces:
  - Backoff: 5s → 15s → 60s, capped at 60s and reset after a stable run;
  - CircuitBreaker: tolerates 3 crashes per 5-min window and opens on the 4th; the cooldown escalates 1h → 4h → manual, and the sticky alert doesn't auto-clear;
  - HeartbeatMonitor: 2s cadence; 3 consecutive misses = host dead (per driver-stability.md).
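The MemoryWatchdog thresholds above are plain arithmetic over a baseline. The sketch below assumes "+200 MB" means baseline plus 200 MB, that MB/GB are binary units, and that the slope is estimated from the oldest and newest samples inside the 30-min window (it does not wait for the window to fill). The commit message does not say which action the slope check drives, so the sketch only reports it. `MemoryWatchdogSketch` and `WatchdogAction` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum WatchdogAction { None, Warn, SoftRecycle, HardRecycle }

// Hypothetical illustration of the threshold arithmetic, not the Host's MemoryWatchdog.
public sealed class MemoryWatchdogSketch
{
    private const double MB = 1024 * 1024;
    private static readonly TimeSpan SlopeWindow = TimeSpan.FromMinutes(30);

    private readonly double _baselineBytes;
    private readonly Queue<(DateTime At, double Bytes)> _samples = new();

    public MemoryWatchdogSketch(double baselineBytes) => _baselineBytes = baselineBytes;

    // Assumption: "+200 MB" is read as baseline + 200 MB.
    public double WarnBytes => Math.Max(1.5 * _baselineBytes, _baselineBytes + 200 * MB);
    public double SoftRecycleBytes => Math.Max(2.0 * _baselineBytes, _baselineBytes + 200 * MB);
    public double HardCeilingBytes => 1.5 * 1024 * MB;

    // Highest threshold crossed wins.
    public WatchdogAction Classify(double bytes) =>
        bytes >= HardCeilingBytes ? WatchdogAction.HardRecycle
        : bytes >= SoftRecycleBytes ? WatchdogAction.SoftRecycle
        : bytes >= WarnBytes ? WatchdogAction.Warn
        : WatchdogAction.None;

    // Records a sample, drops samples older than the window, and reports whether
    // growth from the oldest remaining sample is at least 5 MB/min.
    public bool RecordSampleAndCheckSlope(DateTime utcNow, double bytes)
    {
        _samples.Enqueue((utcNow, bytes));
        while (utcNow - _samples.Peek().At > SlopeWindow) _samples.Dequeue();
        var (oldestAt, oldestBytes) = _samples.Peek();
        var minutes = (utcNow - oldestAt).TotalMinutes;
        return minutes > 0 && (bytes - oldestBytes) / MB / minutes >= 5.0;
    }
}
```

With a 500 MB baseline the warn line is max(750, 700) = 750 MB and the soft-recycle line is max(1000, 700) = 1000 MB. Below a 400 MB baseline the +200 MB term sets the warn line, and below 200 MB it sets the soft-recycle line too, so for small baselines both lines coincide at baseline + 200 MB.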
Infrastructure: the docker SQL Server is remapped to host port 14330 so it coexists with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress is applied per project for two System.Security.Cryptography.Xml advisories that are reachable only through EF Core Design with PrivateAssets=all (the fix ships in 11.0.0-preview); the .slnx gains 14 project registrations.

Deferred, with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md:
- Phase 1 Stream E Admin UI pages: Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, and the AppServer smoke test per decision #142.
- Phase 2 Stream D, blocked by parity: Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, and legacy Host deletion.
- Phase 2 Stream E, which requires a live MXAccess runtime: v1 IntegrationTests against the v2 topology, Client.CLI walkthrough diff, regression tests for the four 2026-04-13 stability findings, and an adversarial review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,28 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Supervisor;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests;

[Trait("Category", "Unit")]
public sealed class BackoffTests
{
    [Fact]
    public void Default_sequence_is_5_15_60_seconds_capped()
    {
        var b = new Backoff();
        b.Next().ShouldBe(TimeSpan.FromSeconds(5));
        b.Next().ShouldBe(TimeSpan.FromSeconds(15));
        b.Next().ShouldBe(TimeSpan.FromSeconds(60));
        b.Next().ShouldBe(TimeSpan.FromSeconds(60), "capped once past the last entry");
    }

    [Fact]
    public void RecordStableRun_resets_to_the_first_delay()
    {
        var b = new Backoff();
        b.Next(); b.Next();
        b.RecordStableRun();
        b.Next().ShouldBe(TimeSpan.FromSeconds(5));
    }
}
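The tests above pin down Backoff's contract: `Next()` walks 5s → 15s → 60s and then stays at 60s, and `RecordStableRun()` restarts the sequence. A minimal sketch that satisfies that contract follows. `BackoffSketch` is a hypothetical stand-in (the real Supervisor.Backoff source is not part of this view), and it assumes a single supervisor loop owns the instance, so it does no locking.

```csharp
using System;

// Hypothetical stand-in for Supervisor.Backoff, written only to match the tests above.
public sealed class BackoffSketch
{
    private static readonly TimeSpan[] Delays =
    {
        TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60),
    };

    private int _attempt;

    // Returns the delay before the next respawn attempt; stays on the last entry once reached.
    public TimeSpan Next()
    {
        var delay = Delays[Math.Min(_attempt, Delays.Length - 1)];
        if (_attempt < Delays.Length) _attempt++;
        return delay;
    }

    // The caller decides what counts as a stable run; after one, the sequence starts over.
    public void RecordStableRun() => _attempt = 0;
}
```

Clamping the index rather than letting it grow keeps the counter bounded however many times the host crashes.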
@@ -0,0 +1,78 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Supervisor;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests;

[Trait("Category", "Unit")]
public sealed class CircuitBreakerTests
{
    [Fact]
    public void First_three_crashes_within_window_allow_respawn()
    {
        var breaker = new CircuitBreaker();
        var t0 = new DateTime(2026, 4, 17, 12, 0, 0, DateTimeKind.Utc);

        breaker.TryRecordCrash(t0, out _).ShouldBeTrue();
        breaker.TryRecordCrash(t0.AddSeconds(30), out _).ShouldBeTrue();
        breaker.TryRecordCrash(t0.AddSeconds(60), out _).ShouldBeTrue();
    }

    [Fact]
    public void Fourth_crash_within_window_opens_breaker_with_sticky_alert()
    {
        var breaker = new CircuitBreaker();
        var t0 = new DateTime(2026, 4, 17, 12, 0, 0, DateTimeKind.Utc);

        for (var i = 0; i < 3; i++) breaker.TryRecordCrash(t0.AddSeconds(i * 30), out _);

        breaker.TryRecordCrash(t0.AddSeconds(120), out var remaining).ShouldBeFalse();
        remaining.ShouldBe(TimeSpan.FromHours(1));
        breaker.StickyAlertActive.ShouldBeTrue();
    }

    [Fact]
    public void Cooldown_escalates_1h_then_4h_then_manual()
    {
        var breaker = new CircuitBreaker();
        var t0 = new DateTime(2026, 4, 17, 12, 0, 0, DateTimeKind.Utc);

        // Open once.
        for (var i = 0; i < 4; i++) breaker.TryRecordCrash(t0.AddSeconds(i * 30), out _);

        // Cooldown starts when the breaker opens (the 4th crash, at t0+90s). Jump past 1h from there.
        var openedAt = t0.AddSeconds(90);
        var afterFirstCooldown = openedAt.AddHours(1).AddMinutes(1);
        breaker.TryRecordCrash(afterFirstCooldown, out _).ShouldBeTrue("cooldown elapsed, breaker closes for a try");

        // Second trip: within the 5-min window the breaker opens again, now with a 4h cooldown.
        // The tripping crash is the 4th since the breaker closed (the one at afterFirstCooldown is the 1st).
        breaker.TryRecordCrash(afterFirstCooldown.AddSeconds(30), out _).ShouldBeTrue();
        breaker.TryRecordCrash(afterFirstCooldown.AddSeconds(60), out _).ShouldBeTrue();
        breaker.TryRecordCrash(afterFirstCooldown.AddSeconds(90), out var cd2).ShouldBeFalse(
            "4th crash within window reopens the breaker");
        cd2.ShouldBe(TimeSpan.FromHours(4));

        // Third trip: 4h elapsed, breaker closes for a try, then reopens with MaxValue (manual only).
        var reopenedAt = afterFirstCooldown.AddSeconds(90);
        var afterSecondCooldown = reopenedAt.AddHours(4).AddMinutes(1);
        breaker.TryRecordCrash(afterSecondCooldown, out _).ShouldBeTrue();
        breaker.TryRecordCrash(afterSecondCooldown.AddSeconds(30), out _).ShouldBeTrue();
        breaker.TryRecordCrash(afterSecondCooldown.AddSeconds(60), out _).ShouldBeTrue();
        breaker.TryRecordCrash(afterSecondCooldown.AddSeconds(90), out var cd3).ShouldBeFalse();
        cd3.ShouldBe(TimeSpan.MaxValue);
    }

    [Fact]
    public void ManualReset_clears_sticky_alert_and_crash_history()
    {
        var breaker = new CircuitBreaker();
        var t0 = DateTime.UtcNow;
        for (var i = 0; i < 4; i++) breaker.TryRecordCrash(t0.AddSeconds(i * 30), out _);

        breaker.ManualReset();
        breaker.StickyAlertActive.ShouldBeFalse();

        breaker.TryRecordCrash(t0.AddMinutes(10), out _).ShouldBeTrue();
    }
}
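The tests above fix the breaker's observable behaviour: three crashes inside a 5-min window are tolerated, the 4th opens the breaker, each reopening escalates the cooldown (1h, 4h, then `TimeSpan.MaxValue` for manual-only), a breaker closed by an elapsed cooldown counts crashes afresh, and only `ManualReset` clears the sticky alert. A minimal sketch that satisfies those tests follows. `CircuitBreakerSketch` is a hypothetical stand-in, and restarting the escalation ladder on `ManualReset` is an assumption the tests do not cover.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Supervisor.CircuitBreaker, written only to match the tests above.
public sealed class CircuitBreakerSketch
{
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(5);
    private const int CrashesAllowedPerWindow = 3;

    // Escalation ladder: 1h, then 4h, then manual reset only (TimeSpan.MaxValue).
    private static readonly TimeSpan[] Cooldowns =
    {
        TimeSpan.FromHours(1), TimeSpan.FromHours(4), TimeSpan.MaxValue,
    };

    private readonly List<DateTime> _crashes = new();
    private DateTime? _openedAt; // null while the breaker is closed
    private int _trips;

    // Stays set after a cooldown closes the breaker; only ManualReset clears it.
    public bool StickyAlertActive { get; private set; }

    public bool TryRecordCrash(DateTime utcNow, out TimeSpan cooldownRemaining)
    {
        if (_openedAt is { } openedAt)
        {
            var cooldown = Cooldowns[_trips - 1];
            var elapsed = utcNow - openedAt;
            if (cooldown == TimeSpan.MaxValue || elapsed < cooldown)
            {
                cooldownRemaining = cooldown == TimeSpan.MaxValue ? cooldown : cooldown - elapsed;
                return false;
            }
            // Cooldown elapsed: close for another try with a fresh crash history.
            _openedAt = null;
            _crashes.Clear();
        }

        _crashes.RemoveAll(t => utcNow - t > Window);
        _crashes.Add(utcNow);
        if (_crashes.Count <= CrashesAllowedPerWindow)
        {
            cooldownRemaining = TimeSpan.Zero;
            return true;
        }

        // Too many crashes in the window: open and escalate the cooldown.
        _trips = Math.Min(_trips + 1, Cooldowns.Length);
        _openedAt = utcNow;
        StickyAlertActive = true;
        cooldownRemaining = Cooldowns[_trips - 1];
        return false;
    }

    public void ManualReset()
    {
        _crashes.Clear();
        _openedAt = null;
        _trips = 0; // assumption: a manual reset also restarts the escalation ladder
        StickyAlertActive = false;
    }
}
```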
@@ -0,0 +1,40 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Supervisor;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests;

[Trait("Category", "Unit")]
public sealed class HeartbeatMonitorTests
{
    [Fact]
    public void Fewer_than_three_misses_do_not_declare_dead()
    {
        var m = new HeartbeatMonitor();
        m.RecordMiss().ShouldBeFalse();
        m.RecordMiss().ShouldBeFalse();
    }

    [Fact]
    public void Three_consecutive_misses_declare_host_dead()
    {
        var m = new HeartbeatMonitor();
        m.RecordMiss().ShouldBeFalse();
        m.RecordMiss().ShouldBeFalse();
        m.RecordMiss().ShouldBeTrue();
    }

    [Fact]
    public void Ack_resets_the_miss_counter()
    {
        var m = new HeartbeatMonitor();
        m.RecordMiss();
        m.RecordMiss();

        m.RecordAck(DateTime.UtcNow);

        m.ConsecutiveMisses.ShouldBe(0);
        m.RecordMiss().ShouldBeFalse();
        m.RecordMiss().ShouldBeFalse();
    }
}
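HeartbeatMonitor's contract, as the tests above exercise it, is a consecutive-miss counter: the 3rd miss in a row reports the host dead, and any ack resets the streak. A minimal sketch follows, keeping the 2s cadence from the commit message as a constant for a caller's timer. `HeartbeatMonitorSketch`, `Cadence`, `MissesBeforeDead`, and `LastAckUtc` are hypothetical names; only `RecordMiss`, `RecordAck`, and `ConsecutiveMisses` mirror what the tests call.

```csharp
using System;

// Hypothetical stand-in for Supervisor.HeartbeatMonitor, written only to match the tests above.
public sealed class HeartbeatMonitorSketch
{
    public static readonly TimeSpan Cadence = TimeSpan.FromSeconds(2); // per the commit message
    public const int MissesBeforeDead = 3;

    public int ConsecutiveMisses { get; private set; }
    public DateTime? LastAckUtc { get; private set; }

    // Called when a heartbeat goes unanswered within one cadence; true means "declare host dead".
    public bool RecordMiss() => ++ConsecutiveMisses >= MissesBeforeDead;

    // Any ack proves the host is alive, so the miss streak starts over.
    public void RecordAck(DateTime utcNow)
    {
        LastAckUtc = utcNow;
        ConsecutiveMisses = 0;
    }
}
```

With a 2s cadence and a 3-miss threshold, a host that stops answering is declared dead after roughly 6s of silence, give or take where in the cadence it stopped.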
@@ -0,0 +1,91 @@
using System.IO.Pipes;
using System.Security.Principal;
using Serilog;
using Serilog.Core;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests;

/// <summary>
/// End-to-end IPC test: <see cref="PipeServer"/> (from Galaxy.Host) accepts a connection from
/// the Proxy's <see cref="GalaxyIpcClient"/>. Verifies the Hello handshake, shared-secret
/// check, and heartbeat round-trip. Uses the current user's SID so the ACL allows the
/// localhost test process. Returns early (so reports as passed) on non-Windows or when run as Administrator.
/// </summary>
[Trait("Category", "Integration")]
public sealed class IpcHandshakeIntegrationTests
{
    [Fact]
    public async Task Hello_handshake_with_correct_secret_succeeds_and_heartbeat_round_trips()
    {
        if (!OperatingSystem.IsWindows()) return; // pipe ACL is Windows-only
        if (IsAdministrator()) return; // ACL explicitly denies Administrators; skip on admin shells

        using var currentIdentity = WindowsIdentity.GetCurrent();
        var allowedSid = currentIdentity.User!;
        var pipeName = $"OtOpcUaGalaxyTest-{Guid.NewGuid():N}";
        const string secret = "test-secret-2026";
        Logger log = new LoggerConfiguration().CreateLogger();

        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

        var server = new PipeServer(pipeName, allowedSid, secret, log);
        var serverTask = Task.Run(() => server.RunOneConnectionAsync(new StubFrameHandler(), cts.Token));

        await using var client = await GalaxyIpcClient.ConnectAsync(
            pipeName, secret, TimeSpan.FromSeconds(5), cts.Token);

        // Heartbeat round-trip via the stub handler.
        var ack = await client.CallAsync<Heartbeat, HeartbeatAck>(
            MessageKind.Heartbeat,
            new Heartbeat { SequenceNumber = 42, UtcUnixMs = 1000 },
            MessageKind.HeartbeatAck,
            cts.Token);
        ack.SequenceNumber.ShouldBe(42L);

        cts.Cancel();
        try { await serverTask; } catch (OperationCanceledException) { }
        server.Dispose();
    }

    [Fact]
    public async Task Hello_with_wrong_secret_is_rejected()
    {
        if (!OperatingSystem.IsWindows()) return;
        if (IsAdministrator()) return;

        using var currentIdentity = WindowsIdentity.GetCurrent();
        var allowedSid = currentIdentity.User!;
        var pipeName = $"OtOpcUaGalaxyTest-{Guid.NewGuid():N}";
        Logger log = new LoggerConfiguration().CreateLogger();

        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        var server = new PipeServer(pipeName, allowedSid, "real-secret", log);
        var serverTask = Task.Run(() => server.RunOneConnectionAsync(new StubFrameHandler(), cts.Token));

        await Should.ThrowAsync<UnauthorizedAccessException>(() =>
            GalaxyIpcClient.ConnectAsync(pipeName, "wrong-secret", TimeSpan.FromSeconds(5), cts.Token));

        cts.Cancel();
        try { await serverTask; } catch { /* server loop ends */ }
        server.Dispose();
    }

    /// <summary>
    /// The production ACL explicitly denies Administrators. On dev boxes the interactive user
    /// is often an Administrator, so the deny overrides the allow rule and the pipe
    /// refuses the connection. Skip in that case; the production install runs as a dedicated
    /// non-admin service account.
    /// </summary>
    private static bool IsAdministrator()
    {
        if (!OperatingSystem.IsWindows()) return false;
        using var identity = WindowsIdentity.GetCurrent();
        var principal = new WindowsPrincipal(identity);
        return principal.IsInRole(WindowsBuiltInRole.Administrator);
    }
}
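The handshake tests above run over the length-prefixed framing the commit message describes (decision #28, 16 MiB cap). A minimal sketch of such framing follows. The 4-byte little-endian length prefix, the absence of a separate message-kind field, and the `FrameCodecSketch` name are assumptions for illustration, not the Shared project's actual FrameWriter/FrameReader wire format; the `MessageKind` the tests pass to `CallAsync` presumably travels inside or alongside the payload. A `SemaphoreSlim` serializes writers so two concurrent callers cannot interleave one frame's prefix with another frame's payload.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical framing sketch; not the Shared project's FrameWriter/FrameReader.
public sealed class FrameCodecSketch
{
    public const int MaxFrameBytes = 16 * 1024 * 1024; // 16 MiB cap
    private readonly Stream _stream;
    private readonly SemaphoreSlim _writeLock = new(1, 1);

    public FrameCodecSketch(Stream stream) => _stream = stream;

    public async Task WriteFrameAsync(ReadOnlyMemory<byte> payload, CancellationToken ct)
    {
        if (payload.Length > MaxFrameBytes)
            throw new InvalidDataException($"Frame of {payload.Length} bytes exceeds the 16 MiB cap.");
        var prefix = new byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(prefix, payload.Length);
        await _writeLock.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            await _stream.WriteAsync(prefix, ct).ConfigureAwait(false);
            await _stream.WriteAsync(payload, ct).ConfigureAwait(false);
            await _stream.FlushAsync(ct).ConfigureAwait(false);
        }
        finally { _writeLock.Release(); }
    }

    // Returns null on a clean end-of-stream before a new frame starts.
    public async Task<byte[]?> ReadFrameAsync(CancellationToken ct)
    {
        var prefix = new byte[4];
        if (!await FillAsync(prefix, ct).ConfigureAwait(false)) return null;
        var length = BinaryPrimitives.ReadInt32LittleEndian(prefix);
        if (length < 0 || length > MaxFrameBytes)
            throw new InvalidDataException($"Declared frame length {length} is outside 0..16 MiB.");
        var payload = new byte[length];
        if (!await FillAsync(payload, ct).ConfigureAwait(false))
            throw new EndOfStreamException("Stream ended mid-frame.");
        return payload;
    }

    // Reads exactly buffer.Length bytes; false only if the stream ends before the first byte.
    private async Task<bool> FillAsync(byte[] buffer, CancellationToken ct)
    {
        var read = 0;
        while (read < buffer.Length)
        {
            var n = await _stream.ReadAsync(buffer.AsMemory(read), ct).ConfigureAwait(false);
            if (n == 0)
            {
                if (read == 0) return false;
                throw new EndOfStreamException("Stream ended mid-frame.");
            }
            read += n;
        }
        return true;
    }
}
```

Checking the declared length before allocating is what makes the cap useful: a corrupt or hostile prefix is rejected without allocating the buffer it asks for.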
@@ -0,0 +1,32 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <IsPackable>false</IsPackable>
    <IsTestProject>true</IsTestProject>
    <RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests</RootNamespace>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="xunit.v3" Version="1.1.0"/>
    <PackageReference Include="Shouldly" Version="4.3.0"/>
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0"/>
    <PackageReference Include="xunit.runner.visualstudio" Version="3.0.2">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
    </PackageReference>
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include="..\..\src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.csproj"/>
    <ProjectReference Include="..\..\src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.csproj"/>
  </ItemGroup>

  <ItemGroup>
    <NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
    <NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
  </ItemGroup>

</Project>