diff --git a/docs/ServiceHosting.md b/docs/ServiceHosting.md index 833ca27..6b8b391 100644 --- a/docs/ServiceHosting.md +++ b/docs/ServiceHosting.md @@ -83,7 +83,7 @@ The host spins up `StaPump` (the STA thread with message pump), creates the MXAc ### Pipe security -`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` + uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed. **By design the pipe ACL denies BUILTIN\Administrators** — live smoke tests must therefore run from a non-elevated shell that matches the allowed principal. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`. +`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` + uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed via `NamedPipeServerStream.RunAsClient` + a SID comparison against the configured allow list. The DACL grants `ReadWrite | Synchronize` only to the allowed SID and denies `LocalSystem`. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`. ### Installation diff --git a/docs/drivers/Galaxy-Test-Fixture.md b/docs/drivers/Galaxy-Test-Fixture.md index 53eb41f..b92d7f7 100644 --- a/docs/drivers/Galaxy-Test-Fixture.md +++ b/docs/drivers/Galaxy-Test-Fixture.md @@ -35,13 +35,14 @@ Multi-project test topology: ## How tests skip - **E2E parity**: `ParityFixture.SkipIfUnavailable()` runs at class init and - checks Windows-only, non-admin user, ZB SQL reachable on - `localhost:1433`, Host EXE built in the expected `bin/` folder. Any miss - → tests skip. + checks Windows-only, ZB SQL reachable on `localhost:1433`, Host EXE built + in the expected `bin/` folder. Any miss → tests skip. - **Live-smoke** (`GalaxyRepositoryLiveSmokeTests`): `Assert.Skip` when ZB unreachable. A `per project_galaxy_host_installed` memory on this repo's - dev box notes the MXAccess runtime is installed + pipe ACL denies Admins, - so live tests must run from a non-elevated shell. + dev box notes the MXAccess runtime is installed. The pipe ACL allows the + configured SID outright; elevation of the caller doesn't matter because + the per-connection SID check in `PipeServer.VerifyCaller` only compares + user SIDs (not group membership or integrity level). - **Unit** tests (Shared, Proxy contract, most Host.Tests) have no skip — they run anywhere. diff --git a/docs/drivers/Galaxy.md b/docs/drivers/Galaxy.md index 91d3039..7ea5e83 100644 --- a/docs/drivers/Galaxy.md +++ b/docs/drivers/Galaxy.md @@ -31,7 +31,7 @@ The same Tier-C isolation story applies to FOCAS (decision record in `docs/v2/pl - Pipe name: `otopcua-galaxy-{DriverInstanceId}` (localhost-only, no TCP surface) - Wire format: MessagePack-CSharp, length-prefixed frames -- ACL: pipe is created with a DACL that grants only the Server's service identity; the Admins group is explicitly denied so a live-smoke test running from an elevated shell fails fast rather than silently bypassing the handshake +- ACL: pipe is created with a DACL that grants `ReadWrite | Synchronize` only to the configured Server service-principal SID + denies `LocalSystem`. The per-connection SID check in `PipeServer.VerifyCaller` is the real authorization boundary — any caller whose impersonated token SID doesn't match the allowed SID is dropped before the first frame is read. - Handshake: Proxy presents a shared secret at `OpenSessionRequest`; Host rejects anything else with `MessageKind.OpenSessionResponse{Success=false}` - Heartbeat: Proxy sends a periodic ping; missed heartbeats trigger the Proxy-side crash-loop supervisor to restart the Host diff --git a/docs/v2/driver-stability.md b/docs/v2/driver-stability.md index 4f131ea..55eb91a 100644 --- a/docs/v2/driver-stability.md +++ b/docs/v2/driver-stability.md @@ -174,7 +174,7 @@ Common contract for the proxy in the main server: Named pipes default to allowing connections from any local user. Without explicit ACLs, any process on the host machine that knows the pipe name could connect, bypass the OPC UA server's authentication and authorization layers, and issue reads, writes, or alarm acknowledgments directly against the driver host. **This is a real privilege-escalation surface** — a service account with no OPC UA permissions could write field values it should never have access to. Every Tier C driver enforces the following: -1. **Pipe ACL**: the host creates the pipe with a `PipeSecurity` ACL that grants `ReadWrite | Synchronize` only to the OtOpcUa server's service principal SID. All other local users — including LocalSystem and Administrators — are explicitly denied. The ACL is set at pipe-creation time so it's atomic with the pipe being listenable. +1. **Pipe ACL**: the host creates the pipe with a `PipeSecurity` ACL that grants `ReadWrite | Synchronize` only to the OtOpcUa server's service principal SID. `LocalSystem` is explicitly denied. The ACL is set at pipe-creation time so it's atomic with the pipe being listenable. Administrators are **not** added to the deny list — UAC's filtered token carries the Admins group SID as deny-only, so a deny ACE on Administrators would fire even for non-elevated callers whose user account happens to be a member (common on dev boxes). The per-connection SID check in §2 remains the authorization boundary. 2. **Caller identity verification**: on each new pipe connection, the host calls `NamedPipeServerStream.GetImpersonationUserName()` (or impersonates and inspects the token) and verifies the connected client's SID matches the configured server service SID. Mismatches are logged and the connection is dropped before any RPC frame is read. 3. **Per-message authorization context**: every RPC frame includes the operation's authenticated OPC UA principal (forwarded by the Core after it has done its own authn/authz). The host treats this as input only — the driver-level authorization (e.g. "is this principal allowed to write Tune attributes?") is performed by the Core, but the host's own audit log records the principal so post-incident attribution is possible. 4. **No anonymous endpoints**: the heartbeat pipe has the same ACL as the data-plane pipe. There are no "open" pipes a generic client can probe. diff --git a/docs/v2/implementation/phase-2-galaxy-out-of-process.md b/docs/v2/implementation/phase-2-galaxy-out-of-process.md index 47fc707..111b914 100644 --- a/docs/v2/implementation/phase-2-galaxy-out-of-process.md +++ b/docs/v2/implementation/phase-2-galaxy-out-of-process.md @@ -172,7 +172,7 @@ Lift the existing `GalaxyRuntimeProbeManager` into the new project. Behaviors pe #### Task B.6 — Named-pipe IPC server with mandatory ACL Per decision #76 + `driver-stability.md` §"IPC Security": -- Pipe ACL on creation: `ReadWrite | Synchronize` granted only to the OtOpcUa server's service principal SID; LocalSystem and Administrators **explicitly denied** +- Pipe ACL on creation: `ReadWrite | Synchronize` granted only to the OtOpcUa server's service principal SID; LocalSystem **explicitly denied**. Administrators was dropped from the deny list so non-elevated admins on dev boxes aren't blocked via UAC-filtered-token deny-only semantics — the per-connection SID check (§2 of driver-stability.md) remains the real authorization boundary. - Caller identity verification on each new connection: `GetImpersonationUserName()` cross-checked against configured server service SID; mismatches dropped before any RPC frame is read - Per-process shared secret: passed by the supervisor at spawn time, required on first frame of every connection - Heartbeat pipe: separate from data-plane pipe, same ACL diff --git a/docs/v2/implementation/phase-7-e2e-smoke.md b/docs/v2/implementation/phase-7-e2e-smoke.md index e16355c..3f6c261 100644 --- a/docs/v2/implementation/phase-7-e2e-smoke.md +++ b/docs/v2/implementation/phase-7-e2e-smoke.md @@ -14,7 +14,7 @@ End-to-end validation that the Phase 7 production wiring chain (#243 / #244 / #2 | SQL Server reachable, `OtOpcUaConfig` DB exists with all migrations applied | `sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "..." -Q "SELECT COUNT(*) FROM dbo.__EFMigrationsHistory"` returns ≥ 11 | | Server's `appsettings.json` `Node:ConfigDbConnectionString` matches your SQL Server | `cat src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` | -> **Galaxy.Host pipe ACL.** Per `docs/ServiceHosting.md`, the pipe ACL deliberately denies `BUILTIN\Administrators`. **Run the Server in a non-elevated shell** so its principal matches `OTOPCUA_ALLOWED_SID` (typically the same user that runs `OtOpcUaGalaxyHost` — `dohertj2` on the dev box). +> **Galaxy.Host pipe ACL.** The pipe allows the configured `OTOPCUA_ALLOWED_SID` (typically the user that runs `OtOpcUaGalaxyHost` — `dohertj2` on the dev box). Run the Server under the same user; elevation doesn't matter — `PipeAcl.cs` no longer denies `BUILTIN\Administrators` since UAC's deny-only Admins SID would have blocked non-elevated dev-box admins too. ## Setup @@ -56,7 +56,7 @@ The seed creates one each of: `ServerCluster`, `ClusterNode`, `ConfigGeneration` ## Run -### 5. Start the Server (non-elevated shell) +### 5. Start the Server ```powershell dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server @@ -120,7 +120,7 @@ Open the Historian Client (or InTouch alarm summary) — the `OverTemp` activati - [ ] EF migrations applied through `20260420232000_ExtendComputeGenerationDiffWithPhase7` - [ ] Smoke seed completes without errors + creates exactly 1 Published generation -- [ ] Server starts in non-elevated shell + logs the Phase 7 composition lines +- [ ] Server starts + logs the Phase 7 composition lines - [ ] Client.CLI browse shows the UNS tree with Source / Doubled / OverTemp under reactor-1 - [ ] Read on `Doubled` returns `2 × Source` value - [ ] Read on `OverTemp` returns the live boolean truth of `Source > 50` diff --git a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeAcl.cs b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeAcl.cs index ba8dc62..a66d2ef 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeAcl.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeAcl.cs @@ -8,9 +8,18 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc; /// /// Builds the required by driver-stability.md §"IPC Security": /// only the configured OtOpcUa server principal SID gets ReadWrite | Synchronize; -/// LocalSystem and Administrators are explicitly denied. Any other authenticated user falls -/// through to the implicit deny. +/// LocalSystem is explicitly denied. Any other authenticated user falls through to the +/// implicit deny. /// +/// +/// Earlier revisions also denied BUILTIN\Administrators, which broke live testing +/// on dev boxes where the allowed user (dohertj2) is also a member of the local +/// Administrators group — UAC's filtered token still carries the Admins SID as deny-only, +/// so the deny ACE fired even from non-elevated shells. The per-connection +/// check already gates on the exact allowed SID, +/// which is the real authorization boundary, so the Admins deny added no defence in depth +/// in that topology. +/// public static class PipeAcl { public static PipeSecurity Create(SecurityIdentifier allowedSid) @@ -25,12 +34,8 @@ public static class PipeAcl AccessControlType.Allow)); var localSystem = new SecurityIdentifier(WellKnownSidType.LocalSystemSid, null); - var admins = new SecurityIdentifier(WellKnownSidType.BuiltinAdministratorsSid, null); - if (allowedSid != localSystem) security.AddAccessRule(new PipeAccessRule(localSystem, PipeAccessRights.FullControl, AccessControlType.Deny)); - if (allowedSid != admins) - security.AddAccessRule(new PipeAccessRule(admins, PipeAccessRights.FullControl, AccessControlType.Deny)); // Owner = allowed SID so the deny rules can't be removed without write-DACL rights. security.SetOwner(allowedSid); diff --git a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeServer.cs b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeServer.cs index 32651e0..5994d9d 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeServer.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Ipc/PipeServer.cs @@ -56,17 +56,12 @@ public sealed class PipeServer : IDisposable { await _current.WaitForConnectionAsync(linked.Token).ConfigureAwait(false); - if (!VerifyCaller(_current, out var reason)) - { - _logger.Warning("IPC caller rejected: {Reason}", reason); - _current.Disconnect(); - return; - } - using var reader = new FrameReader(_current, leaveOpen: true); using var writer = new FrameWriter(_current, leaveOpen: true); - // First frame must be a Hello with the correct shared secret. + // First frame must be a Hello with the correct shared secret. Reading it before + // the caller-SID impersonation check satisfies Windows' ERROR_CANNOT_IMPERSONATE + // rule — ImpersonateNamedPipeClient fails until at least one frame has been read. var first = await reader.ReadFrameAsync(linked.Token).ConfigureAwait(false); if (first is null || first.Value.Kind != MessageKind.Hello) { @@ -74,6 +69,13 @@ public sealed class PipeServer : IDisposable return; } + if (!VerifyCaller(_current, out var reason)) + { + _logger.Warning("IPC caller rejected: {Reason}", reason); + _current.Disconnect(); + return; + } + var hello = MessagePackSerializer.Deserialize(first.Value.Body); if (!string.Equals(hello.SharedSecret, _sharedSecret, StringComparison.Ordinal)) { diff --git a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/GalaxyProxyDriver.cs b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/GalaxyProxyDriver.cs index e97a6b5..1e43a28 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/GalaxyProxyDriver.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/GalaxyProxyDriver.cs @@ -1,3 +1,4 @@ +using MessagePack; using ZB.MOM.WW.OtOpcUa.Core.Abstractions; using ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian; using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc; @@ -48,6 +49,12 @@ public sealed class GalaxyProxyDriver(GalaxyProxyOptions options) _client = await GalaxyIpcClient.ConnectAsync( options.PipeName, options.SharedSecret, options.ConnectTimeout, cancellationToken); + // Route Host-pushed event frames to the matching Raise* methods. Must be set BEFORE + // the first CallAsync so a RuntimeStatusChange arriving between OpenSessionRequest + // and OpenSessionResponse lands on the handler rather than unblocking the call with + // the wrong kind. + _client.SetEventHandler(DispatchHostEventAsync); + var resp = await _client.CallAsync( MessageKind.OpenSessionRequest, new OpenSessionRequest { DriverInstanceId = DriverInstanceId, DriverConfigJson = driverConfigJson }, @@ -459,6 +466,37 @@ public sealed class GalaxyProxyDriver(GalaxyProxyOptions options) // ---- helpers ---- + /// + /// Event-handler registered with . Decodes + /// the MessagePack body into the matching wire contract and delegates to the existing + /// Raise* helpers. Unknown kinds are silently ignored — the IPC contract is + /// append-only, so a newer Host sending a kind this Proxy doesn't recognise shouldn't + /// break the session. + /// + private Task DispatchHostEventAsync(MessageKind kind, byte[] body) + { + switch (kind) + { + case MessageKind.OnDataChangeNotification: + RaiseDataChange(MessagePackSerializer.Deserialize(body)); + break; + case MessageKind.AlarmEvent: + RaiseAlarmEvent(MessagePackSerializer.Deserialize(body)); + break; + case MessageKind.HostConnectivityStatus: + OnHostConnectivityUpdate(MessagePackSerializer.Deserialize(body)); + break; + case MessageKind.RuntimeStatusChange: + var rsc = MessagePackSerializer.Deserialize(body); + OnHostConnectivityUpdate(rsc.Status); + break; + // HistorianConnectivityStatus has no consumer on this Proxy today — drop. + // Response kinds never reach the event handler; the client routes those to + // their pending CallAsync TCS. + } + return Task.CompletedTask; + } + private GalaxyIpcClient RequireClient() => _client ?? throw new InvalidOperationException("Driver not initialized"); diff --git a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyIpcClient.cs b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyIpcClient.cs index b4b61bf..649aa1a 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyIpcClient.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyIpcClient.cs @@ -7,14 +7,35 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc; /// /// Client-side IPC channel to a running Driver.Galaxy.Host. Owns the data-plane pipe -/// connection and serializes request/response round-trips. One instance per session. +/// connection, serializes request/response round-trips, and routes unsolicited push frames +/// (, , +/// , , +/// ) to a handler supplied via +/// . One instance per session. /// +/// +/// A single background reader task owns the read side of the pipe. Calls are serialized by +/// , so at most one pending response is outstanding at a time — the +/// reader uses a single pending-response slot. Any frame that doesn't match the pending +/// expected kind (or ) is treated as a push event and +/// forwarded to the registered handler. Without this router, a push event arriving between +/// request and response would satisfy the caller's read and fail the next +/// with an "Expected X, got Y" error. +/// public sealed class GalaxyIpcClient : IAsyncDisposable { private readonly NamedPipeClientStream _stream; private readonly FrameReader _reader; private readonly FrameWriter _writer; - private readonly SemaphoreSlim _callGate = new(1, 1); + private readonly SemaphoreSlim _writeGate = new(1, 1); + private readonly CancellationTokenSource _readerCts = new(); + + private readonly object _pendingLock = new(); + private TaskCompletionSource<(MessageKind Kind, byte[] Body)>? _pending; + private MessageKind _pendingExpected; + + private Task? _readerTask; + private Func? _eventHandler; private GalaxyIpcClient(NamedPipeClientStream stream) { @@ -41,6 +62,9 @@ public sealed class GalaxyIpcClient : IAsyncDisposable await client._writer.WriteAsync(MessageKind.Hello, new Hello { PeerName = "Galaxy.Proxy", SharedSecret = sharedSecret }, ct); + // Hello/HelloAck is the one round-trip that runs inline before the reader loop + // starts — the Host expects its response-side write before accepting any other + // frames, so there's no push-event window to worry about here. var ack = await client._reader.ReadFrameAsync(ct); if (ack is null || ack.Value.Kind != MessageKind.HelloAck) throw new InvalidOperationException("Did not receive HelloAck from Galaxy.Host"); @@ -49,6 +73,7 @@ public sealed class GalaxyIpcClient : IAsyncDisposable if (!ackMsg.Accepted) throw new UnauthorizedAccessException($"Galaxy.Host rejected Hello: {ackMsg.RejectReason}"); + client._readerTask = Task.Run(() => client.ReadLoopAsync(client._readerCts.Token)); return client; } catch @@ -58,50 +83,155 @@ public sealed class GalaxyIpcClient : IAsyncDisposable } } - /// Round-trips a request and returns the first frame of the response. + /// + /// Register a handler that receives unsolicited push frames. Safe to call once per + /// session — typically during the driver's InitializeAsync right after + /// . The handler is invoked on the reader's thread-pool + /// task; it should not block. Exceptions thrown by the handler are swallowed so a + /// buggy event subscriber cannot kill the reader loop. + /// + public void SetEventHandler(Func handler) + => _eventHandler = handler ?? throw new ArgumentNullException(nameof(handler)); + + /// Round-trips a request and returns the deserialized response. public async Task CallAsync( MessageKind requestKind, TReq request, MessageKind expectedResponseKind, CancellationToken ct) { - await _callGate.WaitAsync(ct); + await _writeGate.WaitAsync(ct); + var tcs = new TaskCompletionSource<(MessageKind, byte[])>( + TaskCreationOptions.RunContinuationsAsynchronously); try { + lock (_pendingLock) + { + if (_pending is not null) + throw new InvalidOperationException( + "GalaxyIpcClient pending-response slot is not empty — call re-entry is a bug"); + _pending = tcs; + _pendingExpected = expectedResponseKind; + } + await _writer.WriteAsync(requestKind, request, ct); - var frame = await _reader.ReadFrameAsync(ct); - if (frame is null) throw new EndOfStreamException("IPC peer closed before response"); + using var reg = ct.Register(static s => + ((TaskCompletionSource<(MessageKind, byte[])>)s!).TrySetCanceled(), tcs); + var frame = await tcs.Task.ConfigureAwait(false); - if (frame.Value.Kind == MessageKind.ErrorResponse) + if (frame.Item1 == MessageKind.ErrorResponse) { - var err = MessagePackSerializer.Deserialize(frame.Value.Body); + var err = MessagePackSerializer.Deserialize(frame.Item2); throw new GalaxyIpcException(err.Code, err.Message); } - if (frame.Value.Kind != expectedResponseKind) - throw new InvalidOperationException( - $"Expected {expectedResponseKind}, got {frame.Value.Kind}"); - - return MessagePackSerializer.Deserialize(frame.Value.Body); + return MessagePackSerializer.Deserialize(frame.Item2); + } + finally + { + lock (_pendingLock) + { + if (ReferenceEquals(_pending, tcs)) _pending = null; + } + _writeGate.Release(); } - finally { _callGate.Release(); } } /// /// Fire-and-forget request — used for unsubscribe, alarm-ack, close-session, and other - /// calls where the protocol is one-way. The send is still serialized through the call + /// calls where the protocol is one-way. The send is still serialized through the write /// gate so it doesn't interleave a frame with a concurrent . /// public async Task SendOneWayAsync(MessageKind requestKind, TReq request, CancellationToken ct) { - await _callGate.WaitAsync(ct); + await _writeGate.WaitAsync(ct); try { await _writer.WriteAsync(requestKind, request, ct); } - finally { _callGate.Release(); } + finally { _writeGate.Release(); } + } + + private async Task ReadLoopAsync(CancellationToken ct) + { + try + { + while (!ct.IsCancellationRequested) + { + (MessageKind Kind, byte[] Body)? frame; + try + { + var read = await _reader.ReadFrameAsync(ct).ConfigureAwait(false); + frame = read is null ? null : (read.Value.Kind, read.Value.Body); + } + catch (OperationCanceledException) { break; } + catch (Exception ex) + { + FailPending(ex); + break; + } + + if (frame is null) + { + FailPending(new EndOfStreamException("IPC peer closed the pipe")); + break; + } + + // Route: response-ish frame to pending TCS if one is waiting, else treat as event. + // ErrorResponse always terminates a pending call — that's the Host signalling a + // request-scoped failure. Unsolicited ErrorResponse with no pending call shouldn't + // happen under a well-formed protocol; if it does, we drop it to the event channel + // so it shows up in logs rather than deadlocking the next CallAsync. + TaskCompletionSource<(MessageKind, byte[])>? pendingTcs = null; + lock (_pendingLock) + { + if (_pending is not null && (frame.Value.Kind == _pendingExpected + || frame.Value.Kind == MessageKind.ErrorResponse)) + { + pendingTcs = _pending; + _pending = null; + } + } + + if (pendingTcs is not null) + { + pendingTcs.TrySetResult(frame.Value); + continue; + } + + var handler = _eventHandler; + if (handler is null) continue; + + try { await handler(frame.Value.Kind, frame.Value.Body).ConfigureAwait(false); } + catch + { + // A buggy subscriber must not kill the reader. The handler is expected to + // do its own logging; swallowing here keeps the channel alive for the next + // frame + the next CallAsync. + } + } + } + finally + { + // Any still-pending call after the loop exits would otherwise hang forever. + FailPending(new EndOfStreamException("IPC reader loop exited")); + } + } + + private void FailPending(Exception ex) + { + TaskCompletionSource<(MessageKind, byte[])>? tcs; + lock (_pendingLock) { tcs = _pending; _pending = null; } + tcs?.TrySetException(ex); } public async ValueTask DisposeAsync() { - _callGate.Dispose(); + _readerCts.Cancel(); + if (_readerTask is not null) + { + try { await _readerTask.ConfigureAwait(false); } catch { /* shutdown */ } + } + + _writeGate.Dispose(); _reader.Dispose(); _writer.Dispose(); + _readerCts.Dispose(); await _stream.DisposeAsync(); } } diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/Hosting/RedundancyPublisherHostedService.cs b/src/ZB.MOM.WW.OtOpcUa.Server/Hosting/RedundancyPublisherHostedService.cs index 73f06c8..8fbc9f7 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Server/Hosting/RedundancyPublisherHostedService.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Server/Hosting/RedundancyPublisherHostedService.cs @@ -91,10 +91,21 @@ public sealed class RedundancyPublisherHostedService( // Bounded retry so a genuine failure to start doesn't pin the hosted service forever. // 60s is generous — production boot is ~2s on this box; cert PKI + certificate-creation // cases have been observed to take up to 15s cold. + // + // StandardServer.CurrentInstance throws BadServerHalted before OnServerStarted has run, + // rather than returning null, so we catch that specifically and retry. Other + // ServiceResultException codes (e.g. BadInternalError) are still propagated — a true + // boot failure shouldn't look like "not ready yet". var deadline = DateTime.UtcNow.AddSeconds(60); while (!ct.IsCancellationRequested && DateTime.UtcNow < deadline) { - var serverInternal = host.Server?.CurrentInstance; + Opc.Ua.Server.IServerInternal? serverInternal = null; + try { serverInternal = host.Server?.CurrentInstance; } + catch (Opc.Ua.ServiceResultException ex) when (ex.StatusCode == Opc.Ua.StatusCodes.BadServerHalted) + { + // Server is mid-startup — keep polling. + } + if (serverInternal?.ServerObject is not null) { var writerLogger = loggerFactory.CreateLogger(); diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs index 37b0912..4c40c3b 100644 --- a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs +++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs @@ -12,8 +12,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E; /// a connected for the tests. Per Phase 2 plan §"Stream E /// Parity Validation": the Proxy owns a session against a real out-of-process Host running /// the production-shape MxAccessGalaxyBackend backed by live ZB + MXAccess COM. -/// Skipped when the Host EXE isn't built, when ZB SQL is unreachable, or when the dev box -/// runs as Administrator (the IPC ACL explicitly denies Administrators per decision #76). +/// Skipped when the Host EXE isn't built or when ZB SQL is unreachable. /// public sealed class ParityFixture : IAsyncLifetime { @@ -26,7 +25,6 @@ public sealed class ParityFixture : IAsyncLifetime public async ValueTask InitializeAsync() { if (!OperatingSystem.IsWindows()) { SkipReason = "Windows-only"; return; } - if (IsAdministrator()) { SkipReason = "PipeAcl denies Administrators on dev shells"; return; } if (!await ZbReachableAsync()) { SkipReason = "Galaxy ZB SQL not reachable on localhost:1433"; return; } var hostExe = FindHostExe(); @@ -96,13 +94,6 @@ public sealed class ParityFixture : IAsyncLifetime Assert.Skip(SkipReason); } - private static bool IsAdministrator() - { - if (!OperatingSystem.IsWindows()) return false; - using var identity = WindowsIdentity.GetCurrent(); - return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator); - } - private static async Task ZbReachableAsync() { try diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/EndToEndIpcTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/EndToEndIpcTests.cs index 4ec2dce..760ed89 100644 --- a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/EndToEndIpcTests.cs +++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/EndToEndIpcTests.cs @@ -28,12 +28,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Trait("Category", "Integration")] public sealed class EndToEndIpcTests { - private static bool IsAdministrator() - { - using var identity = WindowsIdentity.GetCurrent(); - return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator); - } - private sealed class TestStack : IDisposable { public PipeServer Server = null!; @@ -102,7 +96,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task OpenSession_succeeds_with_an_assigned_session_id() { - if (IsAdministrator()) return; using var s = await StartAsync(); var resp = await RoundTripAsync( @@ -117,7 +110,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task Discover_against_stub_returns_an_error_response() { - if (IsAdministrator()) return; using var s = await StartAsync(); var resp = await RoundTripAsync( @@ -132,7 +124,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task WriteValues_returns_per_tag_BadInternalError_status() { - if (IsAdministrator()) return; using var s = await StartAsync(); var resp = await RoundTripAsync( @@ -151,7 +142,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task Subscribe_returns_a_subscription_id() { - if (IsAdministrator()) return; using var s = await StartAsync(); var sub = await RoundTripAsync( @@ -166,7 +156,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task Recycle_returns_the_grace_window_from_the_backend() { - if (IsAdministrator()) return; using var s = await StartAsync(); var resp = await RoundTripAsync( diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/IpcHandshakeIntegrationTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/IpcHandshakeIntegrationTests.cs index 3f1d263..614ffa1 100644 --- a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/IpcHandshakeIntegrationTests.cs +++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/IpcHandshakeIntegrationTests.cs @@ -22,17 +22,10 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests /// net48 x86 alongside the Host (the Proxy's GalaxyIpcClient is net10 only and /// cannot be loaded into this process). Functionally equivalent to going through /// GalaxyIpcClient — proves the wire protocol + ACL + shared-secret enforcement. - /// Skipped on Administrator shells per the same PipeAcl-denies-Administrators guard. /// [Trait("Category", "Integration")] public sealed class IpcHandshakeIntegrationTests { - private static bool IsAdministrator() - { - using var identity = WindowsIdentity.GetCurrent(); - return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator); - } - private static async Task<(NamedPipeClientStream Stream, FrameReader Reader, FrameWriter Writer)> ConnectAndHelloAsync(string pipeName, string secret, CancellationToken ct) { @@ -56,8 +49,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task Handshake_with_correct_secret_succeeds_and_heartbeat_round_trips() { - if (IsAdministrator()) return; - using var identity = WindowsIdentity.GetCurrent(); var sid = identity.User!; var pipe = $"OtOpcUaGalaxyTest-{Guid.NewGuid():N}"; @@ -91,8 +82,6 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests [Fact] public async Task Handshake_with_wrong_secret_is_rejected() { - if (IsAdministrator()) return; - using var identity = WindowsIdentity.GetCurrent(); var sid = identity.User!; var pipe = $"OtOpcUaGalaxyTest-{Guid.NewGuid():N}"; diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/GalaxyIpcClientRoutingTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/GalaxyIpcClientRoutingTests.cs new file mode 100644 index 0000000..c6235b3 --- /dev/null +++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/GalaxyIpcClientRoutingTests.cs @@ -0,0 +1,209 @@ +using System.IO.Pipes; +using MessagePack; +using Shouldly; +using Xunit; +using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc; +using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared; +using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts; + +namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests; + +/// +/// Exercises the single-pending-slot router in : request/response +/// matching, handling, and routing of unsolicited push +/// frames (e.g. ) arriving between a request and +/// its response. Without the router, a push event interleaved with a call would be consumed +/// as the response and the next would +/// fail with an "Expected X, got Y" mismatch — the bug that blocked task #112's live Galaxy +/// E2E on the dev box. +/// +[Trait("Category", "Unit")] +public sealed class GalaxyIpcClientRoutingTests +{ + private const string Secret = "routing-suite-secret"; + + [Fact] + public async Task Response_matching_expected_kind_completes_the_call() + { + var (pipe, serverStream, clientTask) = await StartPairAsync(); + + using (serverStream) + await using (var client = await clientTask) + { + using var reader = new FrameReader(serverStream, leaveOpen: true); + using var writer = new FrameWriter(serverStream, leaveOpen: true); + + var callTask = client.CallAsync( + MessageKind.OpenSessionRequest, + new OpenSessionRequest { DriverInstanceId = "t", DriverConfigJson = "{}" }, + MessageKind.OpenSessionResponse, + CancellationToken.None); + + var request = await reader.ReadFrameAsync(CancellationToken.None); + request!.Value.Kind.ShouldBe(MessageKind.OpenSessionRequest); + + await writer.WriteAsync(MessageKind.OpenSessionResponse, + new OpenSessionResponse { Success = true, SessionId = 42 }, + CancellationToken.None); + + var response = await callTask.WaitAsync(TimeSpan.FromSeconds(2)); + response.Success.ShouldBeTrue(); + response.SessionId.ShouldBe(42); + } + } + + [Fact] + public async Task ErrorResponse_throws_GalaxyIpcException_regardless_of_expected_kind() + { + var (pipe, serverStream, clientTask) = await StartPairAsync(); + + using (serverStream) + await using (var client = await clientTask) + { + using var reader = new FrameReader(serverStream, leaveOpen: true); + using var writer = new FrameWriter(serverStream, leaveOpen: true); + + var callTask = client.CallAsync( + MessageKind.OpenSessionRequest, + new OpenSessionRequest { DriverInstanceId = "t", DriverConfigJson = "{}" }, + MessageKind.OpenSessionResponse, + CancellationToken.None); + + await reader.ReadFrameAsync(CancellationToken.None); + await writer.WriteAsync(MessageKind.ErrorResponse, + new ErrorResponse { Code = "bad-request", Message = "malformed" }, + CancellationToken.None); + + var ex = await Should.ThrowAsync(() => callTask.WaitAsync(TimeSpan.FromSeconds(2))); + ex.Code.ShouldBe("bad-request"); + ex.Message.ShouldContain("malformed"); + } + } + + [Fact] + public async Task Unsolicited_event_between_request_and_response_routes_to_handler_not_the_call() + { + var (pipe, serverStream, clientTask) = await StartPairAsync(); + + using (serverStream) + await using (var client = await clientTask) + { + var eventFrames = new List<(MessageKind Kind, byte[] Body)>(); + var eventReceived = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); + client.SetEventHandler((k, body) => + { + eventFrames.Add((k, body)); + if (k == MessageKind.RuntimeStatusChange) eventReceived.TrySetResult(true); + return Task.CompletedTask; + }); + + using var reader = new FrameReader(serverStream, leaveOpen: true); + using var writer = new FrameWriter(serverStream, leaveOpen: true); + + var callTask = client.CallAsync( + MessageKind.OpenSessionRequest, + new OpenSessionRequest { DriverInstanceId = "t", DriverConfigJson = "{}" }, + MessageKind.OpenSessionResponse, + CancellationToken.None); + + await reader.ReadFrameAsync(CancellationToken.None); + + // Push event lands first — the bug this test guards against is CallAsync consuming + // this frame as the response and failing with "Expected X, got Y". + await writer.WriteAsync(MessageKind.RuntimeStatusChange, + new RuntimeStatusChangeNotification + { + Status = new HostConnectivityStatus + { + HostName = "host-a", RuntimeStatus = "Running", LastObservedUtcUnixMs = 1, + }, + }, CancellationToken.None); + + await writer.WriteAsync(MessageKind.OpenSessionResponse, + new OpenSessionResponse { Success = true, SessionId = 7 }, + CancellationToken.None); + + var response = await callTask.WaitAsync(TimeSpan.FromSeconds(2)); + response.SessionId.ShouldBe(7); + + await eventReceived.Task.WaitAsync(TimeSpan.FromSeconds(2)); + var runtime = eventFrames.ShouldHaveSingleItem(); + runtime.Kind.ShouldBe(MessageKind.RuntimeStatusChange); + var decoded = MessagePackSerializer.Deserialize(runtime.Body); + decoded.Status.HostName.ShouldBe("host-a"); + } + } + + [Fact] + public async Task Idle_push_event_with_no_pending_call_still_reaches_handler() + { + var (pipe, serverStream, clientTask) = await StartPairAsync(); + + using (serverStream) + await using (var client = await clientTask) + { + var received = new TaskCompletionSource<(MessageKind, byte[])>(TaskCreationOptions.RunContinuationsAsynchronously); + client.SetEventHandler((k, body) => { received.TrySetResult((k, body)); return Task.CompletedTask; }); + + using var writer = new FrameWriter(serverStream, leaveOpen: true); + await writer.WriteAsync(MessageKind.HostConnectivityStatus, + new HostConnectivityStatus { HostName = "h", RuntimeStatus = "Running", LastObservedUtcUnixMs = 1 }, + CancellationToken.None); + + var (kind, _) = await received.Task.WaitAsync(TimeSpan.FromSeconds(2)); + kind.ShouldBe(MessageKind.HostConnectivityStatus); + } + } + + [Fact] + public async Task Peer_closing_pipe_during_pending_call_surfaces_as_EndOfStream() + { + var (pipe, serverStream, clientTask) = await StartPairAsync(); + + await using var client = await clientTask; + + using var reader = new FrameReader(serverStream, leaveOpen: true); + + var callTask = client.CallAsync( + MessageKind.OpenSessionRequest, + new OpenSessionRequest { DriverInstanceId = "t", DriverConfigJson = "{}" }, + MessageKind.OpenSessionResponse, + CancellationToken.None); + + await reader.ReadFrameAsync(CancellationToken.None); + serverStream.Dispose(); + + await Should.ThrowAsync(() => callTask.WaitAsync(TimeSpan.FromSeconds(2))); + } + + // ---- test harness ---------------------------------------------------- + + private static async Task<(string PipeName, NamedPipeServerStream Server, Task Client)> StartPairAsync() + { + var pipeName = $"GalaxyIpcRouting-{Guid.NewGuid():N}"; + var serverStream = new NamedPipeServerStream( + pipeName, PipeDirection.InOut, maxNumberOfServerInstances: 1, + PipeTransmissionMode.Byte, PipeOptions.Asynchronous); + + // Drive a Hello/HelloAck handshake on a background task so the client's ConnectAsync + // can complete. After the handshake the test owns the stream for manual framing. + var acceptTask = Task.Run(async () => + { + await serverStream.WaitForConnectionAsync(); + using var reader = new FrameReader(serverStream, leaveOpen: true); + using var writer = new FrameWriter(serverStream, leaveOpen: true); + + var hello = await reader.ReadFrameAsync(CancellationToken.None); + if (hello is null || hello.Value.Kind != MessageKind.Hello) + throw new InvalidOperationException("expected Hello first"); + + await writer.WriteAsync(MessageKind.HelloAck, + new HelloAck { Accepted = true, HostName = "test-host" }, + CancellationToken.None); + }); + + var clientTask = GalaxyIpcClient.ConnectAsync(pipeName, Secret, TimeSpan.FromSeconds(5), CancellationToken.None); + await acceptTask; + return (pipeName, serverStream, clientTask); + } +} diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/HostSubprocessParityTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/HostSubprocessParityTests.cs index c9e8fe1..f45add4 100644 --- a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/HostSubprocessParityTests.cs +++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/HostSubprocessParityTests.cs @@ -50,13 +50,6 @@ public sealed class HostSubprocessParityTests : IDisposable return File.Exists(candidate) ? candidate : null; } - private static bool IsAdministrator() - { - if (!OperatingSystem.IsWindows()) return false; - using var identity = WindowsIdentity.GetCurrent(); - return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator); - } - private static async Task ZbReachableAsync() { try @@ -71,7 +64,7 @@ public sealed class HostSubprocessParityTests : IDisposable [Fact] public async Task Spawned_Host_in_db_mode_lets_Proxy_Discover_real_Galaxy_gobjects() { - if (!OperatingSystem.IsWindows() || IsAdministrator()) return; + if (!OperatingSystem.IsWindows()) return; if (!await ZbReachableAsync()) return; var hostExe = FindHostExe(); diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/LiveStack/LiveStackFixture.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/LiveStack/LiveStackFixture.cs index 2915811..0565c8e 100644 --- a/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/LiveStack/LiveStackFixture.cs +++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/LiveStack/LiveStackFixture.cs @@ -1,8 +1,3 @@ -using System.Runtime.InteropServices; -using System.Runtime.Versioning; -using System.Security.Principal; -using System.Threading; -using System.Threading.Tasks; using Xunit; using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport; @@ -43,25 +38,6 @@ public sealed class LiveStackFixture : IAsyncLifetime public async ValueTask InitializeAsync() { - // 0. Elevated-shell short-circuit. The OtOpcUaGalaxyHost pipe ACL allows the configured - // SID but explicitly DENIES Administrators (decision #76 — production hardening). - // A test process running with a high-integrity token (any elevated shell) carries the - // Admins group in its security context, so the deny rule trumps the user's allow and - // the pipe connect returns UnauthorizedAccessException — technically correct but - // the operationally confusing failure mode that ate most of the PR 37 install - // debugging session. Surfacing it explicitly here saves the next operator the same - // five-step diagnosis. ParityFixture has the same skip with the same rationale. - if (IsElevatedAdministratorOnWindows()) - { - SkipReason = - "Test host is running with elevated (Administrators) privileges, but the " + - "OtOpcUaGalaxyHost named-pipe ACL explicitly denies Administrators per the IPC " + - "security design (decision #76 / PipeAcl.cs). Re-run from a NORMAL (non-admin) " + - "PowerShell window — even when your user is already in the pipe's allow list, " + - "the elevated token's Admins group membership trumps the allow rule."; - return; - } - // 1. AVEVA + OtOpcUa service state — actionable diagnostic if anything is missing. using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)); PrerequisiteReport = await AvevaPrerequisites.CheckAllAsync( @@ -134,27 +110,6 @@ public sealed class LiveStackFixture : IAsyncLifetime if (SkipReason is not null) Assert.Skip(SkipReason); } - private static bool IsElevatedAdministratorOnWindows() - { - if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return false; - return CheckWindowsAdminToken(); - } - - [SupportedOSPlatform("windows")] - private static bool CheckWindowsAdminToken() - { - try - { - using var identity = WindowsIdentity.GetCurrent(); - return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator); - } - catch - { - // Probe shouldn't crash the test; if we can't determine elevation, optimistically - // continue and let the actual pipe connect surface its own error. - return false; - } - } } [CollectionDefinition(Name)]