Phase 2 Streams A+B+C feature-complete: real Win32 pump, all 9 IDriver capabilities, end-to-end IPC dispatch. Streams D+E remain (Galaxy MXAccess code lift + parity-debug cycle, plan-budgeted 3-4 weeks). The 494 v1 IntegrationTests still pass; legacy OtOpcUa.Host is untouched.

StaPump replaces the BlockingCollection placeholder with a real Win32 message pump lifted from v1 StaComThread per CLAUDE.md "Reference Implementation": dedicated STA Thread with SetApartmentState(STA); GetMessage/PostThreadMessage/PeekMessage/TranslateMessage/DispatchMessage/PostQuitMessage P/Invoke; WM_APP=0x8000 for work-item dispatch; WM_APP+1 for graceful-drain → PostQuitMessage; PeekMessage(PM_NOREMOVE) on entry to force the system to create the thread message queue before signalling Started. The IsResponsiveAsync probe still round-trips a no-op through PostThreadMessage, so wedge detection works against the real pump. The ConcurrentQueue<WorkItem> is drained on every WM_APP; the fault path on dispose drains the queue and faults all pending work-item TCSes with InvalidOperationException("STA pump has exited"). All three StaPumpTests pass against the real pump (apartment state STA, healthy probe true, wedged probe false).

GalaxyProxyDriver now implements every Phase 2 Stream C capability (IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe), each forwarding through the matching IPC contract. ReadAsync preserves request order even when the Host returns out-of-order values; WriteAsync MessagePack-serializes the value into ValueBytes; SubscribeAsync wraps SubscriptionId in a GalaxySubscriptionHandle record; UnsubscribeAsync uses the new SendOneWayAsync helper on GalaxyIpcClient (fire-and-forget, but still gated through the call-semaphore so it doesn't interleave with CallAsync); AlarmSubscribe is one-way and the Host pushes events back via OnAlarmEvent; ReadProcessedAsync short-circuits to NotSupportedException (Galaxy historian only does raw); IRediscoverable's OnRediscoveryNeeded fires when the Host pushes a deploy-watermark notification; IHostConnectivityProbe.GetHostStatuses() returns a snapshot and OnHostStatusChanged fires on Running↔Stopped/Faulted transitions, with IpcHostConnectivityStatus aliased to disambiguate it from the same-named type in the Core.Abstractions namespace. The internal RaiseDataChange/RaiseAlarmEvent/RaiseRediscoveryNeeded/OnHostConnectivityUpdate methods are the entry points the IPC client will invoke when push frames arrive.

Host side: the new Backend/IGalaxyBackend interface defines the seam between IPC dispatch and the live MXAccess code, so the dispatcher is unit-testable against an in-memory mock without needing live Galaxy. Backend/StubGalaxyBackend returns success for OpenSession/CloseSession/Subscribe/Unsubscribe/AlarmSubscribe/AlarmAck/Recycle and a recognizable "stub: MXAccess code lift pending (Phase 2 Task B.1)"-tagged error for Discover/ReadValues/WriteValues/HistoryRead; this keeps the IPC end-to-end testable today and gives the parity team a clear seam to slot the real implementation into. Ipc/GalaxyFrameHandler is the new real dispatcher (replaces StubFrameHandler in Program.cs): it switches on MessageKind, deserializes the matching contract, awaits the backend method, and writes the response (none for the one-way Unsubscribe/AlarmSubscribe/AlarmAck/CloseSession). Heartbeat is handled inline so liveness still works if the backend is sick; exceptions are caught and surfaced as ErrorResponse with code "handler-exception" so the Proxy raises GalaxyIpcException instead of disconnecting.

The end-to-end IPC integration test (EndToEndIpcTests) drives every operation through the full stack (Initialize → Read → Write → Subscribe → Unsubscribe → SubscribeAlarms → AlarmAck → ReadRaw → ReadProcessed (short-circuit)), proving that the wire protocol, dispatcher, capability forwarding, and one-way semantics agree end-to-end. It is skipped on Windows administrator shells for the same PipeAcl-denies-Administrators reason as the IpcHandshakeIntegrationTests. Full solution 952 pass / 1 pre-existing Phase 0 baseline.

Phase 2 evidence doc updated: the status header now reads "Streams A+B+C complete... Streams D+E remain — gated only on the iterative Galaxy code lift + parity-debug cycle"; a new Update 2026-04-17 (later) callout enumerates the upgrade with an explicit "what's left for the Phase 2 exit gate": replace StubGalaxyBackend with a MxAccessClient-backed implementation calling on the StaPump, then run the v1 IntegrationTests against the v2 topology and iterate on parity defects until green, then delete legacy OtOpcUa.Host.
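
Illustration only: the GalaxyProxyDriver source is not part of the hunks shown here, so the following is a minimal sketch of how request-order preservation in ReadAsync could work, not the committed code. TagValue, ReadOrdering and InRequestOrder are hypothetical names, and the choice to leave unanswered tags as null is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real IPC value contract; only the tag name matters here.
public sealed record TagValue(string TagReference, object? Value);

public static class ReadOrdering
{
    // Returns one slot per requested tag, in request order. A tag the Host did not
    // answer is left null so the caller can map it to a bad status (an assumption).
    public static TagValue?[] InRequestOrder(
        IReadOnlyList<string> requested, IEnumerable<TagValue> returned)
    {
        // Index the Host's reply by tag name, regardless of the order it arrived in.
        var byTag = new Dictionary<string, TagValue>();
        foreach (var value in returned)
            byTag[value.TagReference] = value;

        // Walk the request list so the output order matches what the caller asked for.
        var ordered = new TagValue?[requested.Count];
        for (var i = 0; i < requested.Count; i++)
            ordered[i] = byTag.TryGetValue(requested[i], out var hit) ? hit : null;
        return ordered;
    }
}
```

Indexing the reply by tag name keeps the reordering linear in the number of tags; a real implementation would also have to decide how duplicate tag names in one request are handled.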

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-17 23:02:00 -04:00
parent a1e9ed40fb
commit 32eeeb9e04
9 changed files with 889 additions and 40 deletions


@@ -0,0 +1,34 @@
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;

/// <summary>
/// Galaxy data-plane abstraction. Replaces the placeholder <c>StubFrameHandler</c> with a
/// real boundary the lifted <c>MxAccessClient</c> + <c>GalaxyRepository</c> implement during
/// Phase 2 Task B.1. Splitting the IPC dispatch (<c>GalaxyFrameHandler</c>) from the
/// backend means the dispatcher is unit-testable against an in-memory mock without needing
/// live Galaxy.
/// </summary>
public interface IGalaxyBackend
{
    Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct);
    Task CloseSessionAsync(CloseSessionRequest req, CancellationToken ct);
    Task<DiscoverHierarchyResponse> DiscoverAsync(DiscoverHierarchyRequest req, CancellationToken ct);
    Task<ReadValuesResponse> ReadValuesAsync(ReadValuesRequest req, CancellationToken ct);
    Task<WriteValuesResponse> WriteValuesAsync(WriteValuesRequest req, CancellationToken ct);
    Task<SubscribeResponse> SubscribeAsync(SubscribeRequest req, CancellationToken ct);
    Task UnsubscribeAsync(UnsubscribeRequest req, CancellationToken ct);
    Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct);
    Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct);
    Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct);
    Task<RecycleStatusResponse> RecycleAsync(RecycleHostRequest req, CancellationToken ct);
}


@@ -0,0 +1,87 @@
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;

/// <summary>
/// Phase 2 placeholder backend. Reports success for session open/close, subscribe/unsubscribe,
/// alarm subscribe/ack and recycle; returns a stub-tagged error for Discover, ReadValues,
/// WriteValues and HistoryRead. Replaced by the lifted <c>MxAccessClient</c>-backed
/// implementation during the deferred Galaxy code move (Task B.1 + parity gate). Keeps the
/// IPC end-to-end testable today.
/// </summary>
public sealed class StubGalaxyBackend : IGalaxyBackend
{
    private long _nextSessionId;
    private long _nextSubscriptionId;

    public Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct)
    {
        var id = Interlocked.Increment(ref _nextSessionId);
        return Task.FromResult(new OpenSessionResponse { Success = true, SessionId = id });
    }

    public Task CloseSessionAsync(CloseSessionRequest req, CancellationToken ct) => Task.CompletedTask;

    public Task<DiscoverHierarchyResponse> DiscoverAsync(DiscoverHierarchyRequest req, CancellationToken ct)
        => Task.FromResult(new DiscoverHierarchyResponse
        {
            Success = false,
            Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
            Objects = System.Array.Empty<GalaxyObjectInfo>(),
        });

    public Task<ReadValuesResponse> ReadValuesAsync(ReadValuesRequest req, CancellationToken ct)
        => Task.FromResult(new ReadValuesResponse
        {
            Success = false,
            Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
            Values = System.Array.Empty<GalaxyDataValue>(),
        });

    public Task<WriteValuesResponse> WriteValuesAsync(WriteValuesRequest req, CancellationToken ct)
    {
        // One failed result per requested write, so the caller can still correlate by tag.
        var results = new WriteValueResult[req.Writes.Length];
        for (var i = 0; i < req.Writes.Length; i++)
        {
            results[i] = new WriteValueResult
            {
                TagReference = req.Writes[i].TagReference,
                StatusCode = 0x80020000u, // Bad_InternalError
                Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
            };
        }
        return Task.FromResult(new WriteValuesResponse { Results = results });
    }

    public Task<SubscribeResponse> SubscribeAsync(SubscribeRequest req, CancellationToken ct)
    {
        var sid = Interlocked.Increment(ref _nextSubscriptionId);
        return Task.FromResult(new SubscribeResponse
        {
            Success = true,
            SubscriptionId = sid,
            ActualIntervalMs = req.RequestedIntervalMs,
        });
    }

    public Task UnsubscribeAsync(UnsubscribeRequest req, CancellationToken ct) => Task.CompletedTask;

    public Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct) => Task.CompletedTask;

    public Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct) => Task.CompletedTask;

    public Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct)
        => Task.FromResult(new HistoryReadResponse
        {
            Success = false,
            Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
            Tags = System.Array.Empty<HistoryTagValues>(),
        });

    public Task<RecycleStatusResponse> RecycleAsync(RecycleHostRequest req, CancellationToken ct)
        => Task.FromResult(new RecycleStatusResponse
        {
            Accepted = true,
            GraceSeconds = 15, // matches Phase 2 plan §B.8 default
        });
}


@@ -0,0 +1,107 @@
using System;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;

namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;

/// <summary>
/// Real IPC dispatcher — routes each <see cref="MessageKind"/> to the matching
/// <see cref="IGalaxyBackend"/> method. Replaces <see cref="StubFrameHandler"/>. Heartbeat
/// stays handled inline so liveness detection works regardless of backend health.
/// </summary>
public sealed class GalaxyFrameHandler(IGalaxyBackend backend, ILogger logger) : IFrameHandler
{
    public async Task HandleAsync(MessageKind kind, byte[] body, FrameWriter writer, CancellationToken ct)
    {
        try
        {
            switch (kind)
            {
                case MessageKind.Heartbeat:
                {
                    var hb = Deserialize<Heartbeat>(body);
                    await writer.WriteAsync(MessageKind.HeartbeatAck,
                        new HeartbeatAck { SequenceNumber = hb.SequenceNumber, UtcUnixMs = hb.UtcUnixMs }, ct);
                    return;
                }
                case MessageKind.OpenSessionRequest:
                {
                    var resp = await backend.OpenSessionAsync(Deserialize<OpenSessionRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.OpenSessionResponse, resp, ct);
                    return;
                }
                case MessageKind.CloseSessionRequest:
                    await backend.CloseSessionAsync(Deserialize<CloseSessionRequest>(body), ct);
                    return; // one-way
                case MessageKind.DiscoverHierarchyRequest:
                {
                    var resp = await backend.DiscoverAsync(Deserialize<DiscoverHierarchyRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.DiscoverHierarchyResponse, resp, ct);
                    return;
                }
                case MessageKind.ReadValuesRequest:
                {
                    var resp = await backend.ReadValuesAsync(Deserialize<ReadValuesRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.ReadValuesResponse, resp, ct);
                    return;
                }
                case MessageKind.WriteValuesRequest:
                {
                    var resp = await backend.WriteValuesAsync(Deserialize<WriteValuesRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.WriteValuesResponse, resp, ct);
                    return;
                }
                case MessageKind.SubscribeRequest:
                {
                    var resp = await backend.SubscribeAsync(Deserialize<SubscribeRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.SubscribeResponse, resp, ct);
                    return;
                }
                case MessageKind.UnsubscribeRequest:
                    await backend.UnsubscribeAsync(Deserialize<UnsubscribeRequest>(body), ct);
                    return; // one-way
                case MessageKind.AlarmSubscribeRequest:
                    await backend.SubscribeAlarmsAsync(Deserialize<AlarmSubscribeRequest>(body), ct);
                    return; // one-way; subsequent alarm events are server-pushed
                case MessageKind.AlarmAckRequest:
                    await backend.AcknowledgeAlarmAsync(Deserialize<AlarmAckRequest>(body), ct);
                    return;
                case MessageKind.HistoryReadRequest:
                {
                    var resp = await backend.HistoryReadAsync(Deserialize<HistoryReadRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.HistoryReadResponse, resp, ct);
                    return;
                }
                case MessageKind.RecycleHostRequest:
                {
                    var resp = await backend.RecycleAsync(Deserialize<RecycleHostRequest>(body), ct);
                    await writer.WriteAsync(MessageKind.RecycleStatusResponse, resp, ct);
                    return;
                }
                default:
                    await SendErrorAsync(writer, "unknown-kind", $"Frame kind {kind} not handled by Host", ct);
                    return;
            }
        }
        catch (OperationCanceledException) { throw; }
        catch (Exception ex)
        {
            logger.Error(ex, "GalaxyFrameHandler threw on {Kind}", kind);
            await SendErrorAsync(writer, "handler-exception", ex.Message, ct);
        }
    }

    private static T Deserialize<T>(byte[] body) => MessagePackSerializer.Deserialize<T>(body);

    private static Task SendErrorAsync(FrameWriter writer, string code, string message, CancellationToken ct)
        => writer.WriteAsync(MessageKind.ErrorResponse,
            new ErrorResponse { Code = code, Message = message }, ct);
}


@@ -38,7 +38,10 @@ public static class Program
             Log.Information("OtOpcUaGalaxyHost starting — pipe={Pipe} allowedSid={Sid}", pipeName, allowedSidValue);
-            var handler = new StubFrameHandler();
+            // Real frame dispatcher backed by StubGalaxyBackend until the MXAccess code lift
+            // (Phase 2 Task B.1) replaces the backend with the live MxAccessClient-backed one.
+            var backend = new Backend.StubGalaxyBackend();
+            var handler = new GalaxyFrameHandler(backend, Log.Logger);
             server.RunAsync(handler, cts.Token).GetAwaiter().GetResult();
             Log.Information("OtOpcUaGalaxyHost stopped cleanly");


@@ -1,31 +1,37 @@
 using System;
 using System.Collections.Concurrent;
+using System.Runtime.InteropServices;
 using System.Threading;
 using System.Threading.Tasks;
 namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Sta;
 /// <summary>
-/// Dedicated STA thread that owns all <c>LMXProxyServer</c> COM instances. Work items are
-/// posted from any thread and dispatched on the STA. Per <c>driver-stability.md</c> Galaxy
-/// deep dive §"STA thread + Win32 message pump".
+/// Dedicated STA thread with a Win32 message pump that owns all <c>LMXProxyServer</c> COM
+/// instances. Lifted from v1 <c>StaComThread</c> per CLAUDE.md "Reference Implementation".
+/// Per <c>driver-stability.md</c> Galaxy deep dive §"STA thread + Win32 message pump":
+/// work items dispatched via <c>PostThreadMessage(WM_APP)</c>; <c>WM_APP+1</c> requests a
+/// graceful drain → <c>WM_QUIT</c>; supervisor escalates to <c>Environment.Exit(2)</c> if the
+/// pump doesn't drain within the recycle grace window.
 /// </summary>
-/// <remarks>
-/// Phase 2 scaffold: uses a <see cref="BlockingCollection{T}"/> dispatcher instead of the real
-/// Win32 <c>GetMessage/DispatchMessage</c> pump. Real pump arrives when the v1 <c>StaComThread</c>
-/// is lifted — that's part of the deferred Galaxy code move. The apartment state and work
-/// dispatch semantics are identical so production code can be swapped in without changes.
-/// </remarks>
 public sealed class StaPump : IDisposable
 {
+    private const uint WM_APP = 0x8000;
+    private const uint WM_DRAIN_AND_QUIT = WM_APP + 1;
+    private const uint PM_NOREMOVE = 0x0000;
     private readonly Thread _thread;
-    private readonly BlockingCollection<Action> _workQueue = new(new ConcurrentQueue<Action>());
+    private readonly ConcurrentQueue<WorkItem> _workItems = new();
     private readonly TaskCompletionSource<bool> _started = new(TaskCreationOptions.RunContinuationsAsynchronously);
+    private volatile uint _nativeThreadId;
+    private volatile bool _pumpExited;
     private volatile bool _disposed;
     public int ThreadId => _thread.ManagedThreadId;
     public DateTime LastDispatchedUtc { get; private set; } = DateTime.MinValue;
-    public int QueueDepth => _workQueue.Count;
+    public int QueueDepth => _workItems.Count;
+    public bool IsRunning => _nativeThreadId != 0 && !_disposed && !_pumpExited;
     public StaPump(string name = "Galaxy.Sta")
     {
@@ -40,24 +46,36 @@ public sealed class StaPump : IDisposable
     public Task<T> InvokeAsync<T>(Func<T> work)
     {
         if (_disposed) throw new ObjectDisposedException(nameof(StaPump));
+        if (_pumpExited) throw new InvalidOperationException("STA pump has exited");
         var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
-        _workQueue.Add(() =>
+        _workItems.Enqueue(new WorkItem(
+            () =>
+            {
+                try { tcs.TrySetResult(work()); }
+                catch (Exception ex) { tcs.TrySetException(ex); }
+            },
+            ex => tcs.TrySetException(ex)));
+        if (!PostThreadMessage(_nativeThreadId, WM_APP, IntPtr.Zero, IntPtr.Zero))
         {
-            try { tcs.SetResult(work()); }
-            catch (Exception ex) { tcs.SetException(ex); }
-        });
+            _pumpExited = true;
+            DrainAndFaultQueue();
+        }
         return tcs.Task;
     }
     public Task InvokeAsync(Action work) => InvokeAsync(() => { work(); return 0; });
     /// <summary>
-    /// Health probe — returns true if a no-op work item round-trips within <paramref name="timeout"/>.
-    /// Used by the supervisor; timeout means the pump is wedged and a recycle is warranted.
+    /// Health probe — returns true if a no-op work item round-trips within
+    /// <paramref name="timeout"/>. Used by the supervisor; timeout means the pump is wedged
+    /// and a recycle is warranted (Task B.2 acceptance).
     /// </summary>
     public async Task<bool> IsResponsiveAsync(TimeSpan timeout)
     {
+        if (!IsRunning) return false;
         var task = InvokeAsync(() => { });
         var completed = await Task.WhenAny(task, Task.Delay(timeout)).ConfigureAwait(false);
         return completed == task;
@@ -65,27 +83,124 @@ public sealed class StaPump : IDisposable
     private void PumpLoop()
     {
-        _started.TrySetResult(true);
         try
         {
-            while (!_disposed)
+            _nativeThreadId = GetCurrentThreadId();
+            // Force the system to create the thread message queue before we signal Started.
+            // PeekMessage(PM_NOREMOVE) on an empty queue is the documented way to do this.
+            PeekMessage(out _, IntPtr.Zero, 0, 0, PM_NOREMOVE);
+            _started.TrySetResult(true);
+            // GetMessage returns 0 on WM_QUIT, -1 on error, otherwise a positive value.
+            while (GetMessage(out var msg, IntPtr.Zero, 0, 0) > 0)
             {
-                if (_workQueue.TryTake(out var work, Timeout.Infinite))
+                if (msg.message == WM_APP)
                 {
-                    work();
-                    LastDispatchedUtc = DateTime.UtcNow;
+                    DrainQueue();
+                }
+                else if (msg.message == WM_DRAIN_AND_QUIT)
+                {
+                    DrainQueue();
+                    PostQuitMessage(0);
+                }
+                else
+                {
+                    // Pass through any window/dialog messages the COM proxy may inject.
+                    TranslateMessage(ref msg);
+                    DispatchMessage(ref msg);
                 }
             }
         }
-        catch (InvalidOperationException) { /* CompleteAdding called during dispose */ }
+        catch (Exception ex)
+        {
+            _started.TrySetException(ex);
+        }
+        finally
+        {
+            _pumpExited = true;
+            DrainAndFaultQueue();
+        }
     }
+    private void DrainQueue()
+    {
+        while (_workItems.TryDequeue(out var item))
+        {
+            item.Execute();
+            LastDispatchedUtc = DateTime.UtcNow;
+        }
+    }
+    private void DrainAndFaultQueue()
+    {
+        var ex = new InvalidOperationException("STA pump has exited");
+        while (_workItems.TryDequeue(out var item))
+        {
+            try { item.Fault(ex); }
+            catch { /* faulting a TCS shouldn't throw, but be defensive */ }
+        }
+    }
     public void Dispose()
     {
         if (_disposed) return;
         _disposed = true;
-        _workQueue.CompleteAdding();
-        _thread.Join(TimeSpan.FromSeconds(5));
-        _workQueue.Dispose();
+        try
+        {
+            if (_nativeThreadId != 0 && !_pumpExited)
+                PostThreadMessage(_nativeThreadId, WM_DRAIN_AND_QUIT, IntPtr.Zero, IntPtr.Zero);
+            _thread.Join(TimeSpan.FromSeconds(5));
+        }
+        catch { /* swallow — best effort */ }
+        DrainAndFaultQueue();
     }
+    private sealed record WorkItem(Action Execute, Action<Exception> Fault);
+    #region Win32 P/Invoke
+    [StructLayout(LayoutKind.Sequential)]
+    private struct MSG
+    {
+        public IntPtr hwnd;
+        public uint message;
+        public IntPtr wParam;
+        public IntPtr lParam;
+        public uint time;
+        public POINT pt;
+    }
+    [StructLayout(LayoutKind.Sequential)]
+    private struct POINT { public int x; public int y; }
+    [DllImport("user32.dll")]
+    private static extern int GetMessage(out MSG lpMsg, IntPtr hWnd, uint wMsgFilterMin, uint wMsgFilterMax);
+    [DllImport("user32.dll")]
+    [return: MarshalAs(UnmanagedType.Bool)]
+    private static extern bool TranslateMessage(ref MSG lpMsg);
+    [DllImport("user32.dll")]
+    private static extern IntPtr DispatchMessage(ref MSG lpMsg);
+    [DllImport("user32.dll")]
+    [return: MarshalAs(UnmanagedType.Bool)]
+    private static extern bool PostThreadMessage(uint idThread, uint Msg, IntPtr wParam, IntPtr lParam);
+    [DllImport("user32.dll")]
+    private static extern void PostQuitMessage(int nExitCode);
+    [DllImport("user32.dll")]
+    [return: MarshalAs(UnmanagedType.Bool)]
+    private static extern bool PeekMessage(out MSG lpMsg, IntPtr hWnd, uint wMsgFilterMin, uint wMsgFilterMax,
+        uint wRemoveMsg);
+    [DllImport("kernel32.dll")]
+    private static extern uint GetCurrentThreadId();
+    #endregion
 }
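
Illustration only: the drain-versus-fault behaviour of the work-item queue can be exercised without Win32 or a dedicated thread. The sketch below is a simplified stand-in, not part of the commit; FaultableWorkQueue and its method names are hypothetical, while the WorkItem shape (an execute delegate paired with a fault delegate) mirrors the record in the StaPump hunk above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the queue half of StaPump (no Win32, no thread).
public sealed class FaultableWorkQueue
{
    // Each item carries both ways it can finish: run normally, or be failed on shutdown.
    private sealed record WorkItem(Action Execute, Action<Exception> Fault);

    private readonly ConcurrentQueue<WorkItem> _items = new();

    // Queue a delegate; the returned task completes when the item is run or faulted.
    public Task<T> Enqueue<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _items.Enqueue(new WorkItem(
            () =>
            {
                try { tcs.TrySetResult(work()); }
                catch (Exception ex) { tcs.TrySetException(ex); }
            },
            ex => tcs.TrySetException(ex)));
        return tcs.Task;
    }

    // Normal path: run everything currently queued on the calling thread.
    public int Drain()
    {
        var count = 0;
        while (_items.TryDequeue(out var item)) { item.Execute(); count++; }
        return count;
    }

    // Shutdown path: fail everything still queued so awaiting callers see an exception
    // instead of waiting forever.
    public int DrainAndFault(Exception reason)
    {
        var count = 0;
        while (_items.TryDequeue(out var item)) { item.Fault(reason); count++; }
        return count;
    }
}
```

Pairing every item with its own fault callback is what lets the shutdown path complete each caller's task without knowing the result type; the Try* setters make it harmless if an item is both run and faulted in a race.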