Galaxy IPC unblock — live dev-box E2E path

Three root-cause fixes to get an elevated dev-box shell past session open
through to real MXAccess reads:

1. PipeAcl — drop BUILTIN\Administrators deny ACE. UAC's filtered token
   carries the Admins SID as deny-only, so the deny fired even from
   non-elevated admin-account shells. The per-connection SID check in
   PipeServer.VerifyCaller remains the real authorization boundary.

2. PipeServer — swap the Hello-read / VerifyCaller order. ImpersonateNamedPipeClient
   returns ERROR_CANNOT_IMPERSONATE until at least one frame has been read
   from the pipe; reading Hello first satisfies that rule. Previously the
   ACL deny-first path masked this race — removing the deny ACE exposed it.

3. GalaxyIpcClient — add a background reader + single pending-response
   slot. A RuntimeStatusChange event between OpenSessionRequest and
   OpenSessionResponse used to satisfy the caller's single ReadFrameAsync
   and fail CallAsync with "Expected OpenSessionResponse, got
   RuntimeStatusChange". The reader now routes response kinds (and
   ErrorResponse) to the pending TCS and everything else to a handler the
   driver registers in InitializeAsync. The Proxy was already set up to
   raise managed events from RaiseDataChange / RaiseAlarmEvent /
   OnHostConnectivityUpdate — those helpers had no caller until now.

4. RedundancyPublisherHostedService — swallow BadServerHalted while
   polling host.Server.CurrentInstance. StandardServer throws that code
   during startup rather than returning null, so the first poll attempt
   crashed the BackgroundService (and the host) before OnServerStarted
   ran. This race was latent behind the Galaxy init failure above.

Updates docs that described the Admins deny ACE + mandatory non-elevated
shells, and drops the admin-skip guards from every Galaxy integration +
E2E fixture that had them (IpcHandshakeIntegrationTests, EndToEndIpcTests,
ParityFixture, LiveStackFixture, HostSubprocessParityTests).

Adds GalaxyIpcClientRoutingTests covering the router's
request/response match, ErrorResponse, event-between-call, idle event,
and peer-close paths.

Verified live on the dev box against the p7-smoke cluster (gen 6):
driver registered=1 failedInit=0, Phase 7 bridge subscribed, OPC UA
server up on 4840, MXAccess read round-trip returns real data with
Status=0x00000000.

Task #112 — partial: Galaxy live stack is functional end-to-end. The
supplied test-galaxy.ps1 script still fails because the UNS walker
encodes TagConfig JSON as the tag's NodeId instead of the seeded TagId
(pre-existing; separate issue from this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-04-24 16:30:16 -04:00
parent fb6dd3478d
commit d11dd0520b
17 changed files with 443 additions and 130 deletions

View File

@@ -83,7 +83,7 @@ The host spins up `StaPump` (the STA thread with message pump), creates the MXAc
### Pipe security
`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` + uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed. **By design the pipe ACL denies BUILTIN\Administrators** — live smoke tests must therefore run from a non-elevated shell that matches the allowed principal. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`.
`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` + uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed via `NamedPipeServerStream.RunAsClient` + a SID comparison against the configured allow list. The DACL grants `ReadWrite | Synchronize` only to the allowed SID and denies `LocalSystem`. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`.
### Installation

View File

@@ -35,13 +35,14 @@ Multi-project test topology:
## How tests skip
- **E2E parity**: `ParityFixture.SkipIfUnavailable()` runs at class init and
checks Windows-only, non-admin user, ZB SQL reachable on
`localhost:1433`, Host EXE built in the expected `bin/` folder. Any miss
→ tests skip.
checks Windows-only, ZB SQL reachable on `localhost:1433`, Host EXE built
in the expected `bin/` folder. Any miss → tests skip.
- **Live-smoke** (`GalaxyRepositoryLiveSmokeTests`): `Assert.Skip` when ZB
unreachable. A `per project_galaxy_host_installed` memory on this repo's
dev box notes the MXAccess runtime is installed + pipe ACL denies Admins,
so live tests must run from a non-elevated shell.
dev box notes the MXAccess runtime is installed. The pipe ACL allows the
configured SID outright; elevation of the caller doesn't matter because
the per-connection SID check in `PipeServer.VerifyCaller` only compares
user SIDs (not group membership or integrity level).
- **Unit** tests (Shared, Proxy contract, most Host.Tests) have no skip —
they run anywhere.

View File

@@ -31,7 +31,7 @@ The same Tier-C isolation story applies to FOCAS (decision record in `docs/v2/pl
- Pipe name: `otopcua-galaxy-{DriverInstanceId}` (localhost-only, no TCP surface)
- Wire format: MessagePack-CSharp, length-prefixed frames
- ACL: pipe is created with a DACL that grants only the Server's service identity; the Admins group is explicitly denied so a live-smoke test running from an elevated shell fails fast rather than silently bypassing the handshake
- ACL: pipe is created with a DACL that grants `ReadWrite | Synchronize` only to the configured Server service-principal SID + denies `LocalSystem`. The per-connection SID check in `PipeServer.VerifyCaller` is the real authorization boundary — any caller whose impersonated token SID doesn't match the allowed SID is dropped before the first frame is read.
- Handshake: Proxy presents a shared secret at `OpenSessionRequest`; Host rejects anything else with `MessageKind.OpenSessionResponse{Success=false}`
- Heartbeat: Proxy sends a periodic ping; missed heartbeats trigger the Proxy-side crash-loop supervisor to restart the Host

View File

@@ -174,7 +174,7 @@ Common contract for the proxy in the main server:
Named pipes default to allowing connections from any local user. Without explicit ACLs, any process on the host machine that knows the pipe name could connect, bypass the OPC UA server's authentication and authorization layers, and issue reads, writes, or alarm acknowledgments directly against the driver host. **This is a real privilege-escalation surface** — a service account with no OPC UA permissions could write field values it should never have access to. Every Tier C driver enforces the following:
1. **Pipe ACL**: the host creates the pipe with a `PipeSecurity` ACL that grants `ReadWrite | Synchronize` only to the OtOpcUa server's service principal SID. All other local users — including LocalSystem and Administrators — are explicitly denied. The ACL is set at pipe-creation time so it's atomic with the pipe being listenable.
1. **Pipe ACL**: the host creates the pipe with a `PipeSecurity` ACL that grants `ReadWrite | Synchronize` only to the OtOpcUa server's service principal SID. `LocalSystem` is explicitly denied. The ACL is set at pipe-creation time so it's atomic with the pipe being listenable. Administrators are **not** added to the deny list — UAC's filtered token carries the Admins group SID as deny-only, so a deny ACE on Administrators would fire even for non-elevated callers whose user account happens to be a member (common on dev boxes). The per-connection SID check in §2 remains the authorization boundary.
2. **Caller identity verification**: on each new pipe connection, the host calls `NamedPipeServerStream.GetImpersonationUserName()` (or impersonates and inspects the token) and verifies the connected client's SID matches the configured server service SID. Mismatches are logged and the connection is dropped before any RPC frame is read.
3. **Per-message authorization context**: every RPC frame includes the operation's authenticated OPC UA principal (forwarded by the Core after it has done its own authn/authz). The host treats this as input only — the driver-level authorization (e.g. "is this principal allowed to write Tune attributes?") is performed by the Core, but the host's own audit log records the principal so post-incident attribution is possible.
4. **No anonymous endpoints**: the heartbeat pipe has the same ACL as the data-plane pipe. There are no "open" pipes a generic client can probe.

View File

@@ -172,7 +172,7 @@ Lift the existing `GalaxyRuntimeProbeManager` into the new project. Behaviors pe
#### Task B.6 — Named-pipe IPC server with mandatory ACL
Per decision #76 + `driver-stability.md` §"IPC Security":
- Pipe ACL on creation: `ReadWrite | Synchronize` granted only to the OtOpcUa server's service principal SID; LocalSystem and Administrators **explicitly denied**
- Pipe ACL on creation: `ReadWrite | Synchronize` granted only to the OtOpcUa server's service principal SID; LocalSystem **explicitly denied**. Administrators was dropped from the deny list so non-elevated admins on dev boxes aren't blocked via UAC-filtered-token deny-only semantics — the per-connection SID check (§2 of driver-stability.md) remains the real authorization boundary.
- Caller identity verification on each new connection: `GetImpersonationUserName()` cross-checked against configured server service SID; mismatches dropped before any RPC frame is read
- Per-process shared secret: passed by the supervisor at spawn time, required on first frame of every connection
- Heartbeat pipe: separate from data-plane pipe, same ACL

View File

@@ -14,7 +14,7 @@ End-to-end validation that the Phase 7 production wiring chain (#243 / #244 / #2
| SQL Server reachable, `OtOpcUaConfig` DB exists with all migrations applied | `sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "..." -Q "SELECT COUNT(*) FROM dbo.__EFMigrationsHistory"` returns ≥ 11 |
| Server's `appsettings.json` `Node:ConfigDbConnectionString` matches your SQL Server | `cat src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` |
> **Galaxy.Host pipe ACL.** Per `docs/ServiceHosting.md`, the pipe ACL deliberately denies `BUILTIN\Administrators`. **Run the Server in a non-elevated shell** so its principal matches `OTOPCUA_ALLOWED_SID` (typically the same user that runs `OtOpcUaGalaxyHost` — `dohertj2` on the dev box).
> **Galaxy.Host pipe ACL.** The pipe allows the configured `OTOPCUA_ALLOWED_SID` (typically the user that runs `OtOpcUaGalaxyHost` — `dohertj2` on the dev box). Run the Server under the same user; elevation doesn't matter — `PipeAcl.cs` no longer denies `BUILTIN\Administrators` since UAC's deny-only Admins SID would have blocked non-elevated dev-box admins too.
## Setup
@@ -56,7 +56,7 @@ The seed creates one each of: `ServerCluster`, `ClusterNode`, `ConfigGeneration`
## Run
### 5. Start the Server (non-elevated shell)
### 5. Start the Server
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server
@@ -120,7 +120,7 @@ Open the Historian Client (or InTouch alarm summary) — the `OverTemp` activati
- [ ] EF migrations applied through `20260420232000_ExtendComputeGenerationDiffWithPhase7`
- [ ] Smoke seed completes without errors + creates exactly 1 Published generation
- [ ] Server starts in non-elevated shell + logs the Phase 7 composition lines
- [ ] Server starts + logs the Phase 7 composition lines
- [ ] Client.CLI browse shows the UNS tree with Source / Doubled / OverTemp under reactor-1
- [ ] Read on `Doubled` returns `2 × Source` value
- [ ] Read on `OverTemp` returns the live boolean truth of `Source > 50`