diff --git a/docs/plans/2026-06-12-historian-tcp-transport-design.md b/docs/plans/2026-06-12-historian-tcp-transport-design.md
new file mode 100644
index 00000000..6d0c4e21
--- /dev/null
+++ b/docs/plans/2026-06-12-historian-tcp-transport-design.md
@@ -0,0 +1,176 @@
+# Wonderware Historian Sidecar — TCP Transport Design
+
+**Date:** 2026-06-12
+**Status:** Approved (design); implementation plan to follow.
+
+## Goal
+
+Replace the Wonderware Historian sidecar's **local named-pipe IPC** with a
+**TCP transport** so a remote OtOpcUa host (e.g. the dev server running in
+Linux Docker on a MacBook) can reach the net48 sidecar on the Windows
+Historian VM. Today the IPC is a local, Windows-SID-gated named pipe, so the
+only possible consumer is a Windows OtOpcUa process on the same machine; once
+the host moves off the VM the sidecar is orphaned.
+
+## Why not gRPC (the "like mxaccessgw" question)
+
+mxaccessgw uses gRPC/HTTP2, but it can do so only because it is split into a
+**net10 Server** (Kestrel/`Grpc.AspNetCore`) + a **net48 Worker**. The
+Historian sidecar must stay **net48** (the AVEVA `aahClientManaged` + native
+`aahClient.dll` SDK is .NET Framework 4.8), and **net48 cannot host
+Kestrel/`Grpc.AspNetCore`**. The only gRPC-on-net48 option is the **EOL
+`Grpc.Core` C-core** library, or adding a second net10 front process.
+
+Decision: **plain TCP reusing the existing MessagePack frame protocol.** The
+protocol is 5 unary request/reply ops (`ReadRaw`, `ReadProcessed`,
+`ReadAtTime`, `ReadEvents`, `WriteAlarmEvents`) + a `Hello` handshake — no
+streaming — and is already abstracted behind a `Stream` on both ends, so a TCP
+swap is small, native to net48, depends on no EOL libraries, and reuses every
+contract.
+
+## Locked decisions
+
+| Decision | Choice |
+|---|---|
+| Transport | **TCP only** — named pipe fully removed |
+| Concurrency | **Single active connection, serial accept** (mirrors today's pipe `maxInstances:1`) |
+| Caller auth | **Shared-secret `Hello`**, required in every mode (replaces the SID ACL) |
+| Transport security | **TLS optional** — plaintext in dev, TLS in prod (config-driven) |
+| mTLS / client-cert | Out of scope now; future hardening follow-up |
+
+## Architecture
+
+```
+Before:                                After:
+  OtOpcUa host (same VM, Windows)        OtOpcUa host (anywhere, .NET 10)
+    | named pipe (local, SID-gated)        | TCP (+ optional TLS), MessagePack frames
+    v                                      v
+  Sidecar (net48) PipeServer             Sidecar (net48) TcpFrameServer
+```
+
+Everything above the socket is unchanged: `Hello`/`HelloAck` handshake,
+length-prefixed `MessageKind` framing, MessagePack DTOs, `FrameReader`/
+`FrameWriter` (they operate on `Stream`; `NetworkStream`/`SslStream` are
+`Stream`s), `HistorianFrameHandler` dispatch, and the AVEVA SDK backends
+(`HistorianDataSource` reads, `SdkAlarmHistorianWriteBackend` writes).
+
+## Detailed design
+
+### Server (net48 sidecar — `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/`)
+
+- **New `Ipc/TcpFrameServer.cs`** replaces `Ipc/PipeServer.cs`:
+  - `TcpListener` bound to `:`; single-active **serial** accept
+    loop (accept the next connection only after the current disconnects).
+  - Per connection: if TLS enabled, wrap `NetworkStream` in `SslStream` and
+    `AuthenticateAsServer(cert)`; else use the raw `NetworkStream`.
+  - Run the **existing** `Hello` handshake (verify shared secret; reject with a
+    `HelloAck{Accepted=false}` on mismatch / major-version mismatch) → then the
+    existing `reader.ReadFrameAsync` → `handler.HandleAsync` → `writer.WriteAsync`
+    loop.
+  - Keep the `RunAsync` backoff (`250ms…8s`) + `MaxConsecutiveFailures=20`→throw
+    behavior so `Program.Main`'s exit-2 + NSSM restart semantics are identical.
+- **Remove:** `PipeServer`, `PipeAcl`, `VerifyCaller`/`CallerVerifier`
+  (Windows-pipe-only), the `OTOPCUA_ALLOWED_SID` env + `SecurityIdentifier`.
+- **`Program.cs`:** swap `new PipeServer(pipeName, allowedSid, sharedSecret, …)`
+  (line ~62) for `new TcpFrameServer(bind, port, sharedSecret, tlsCert?, …)`;
+  drop the SID read; keep `OTOPCUA_HISTORIAN_ENABLED` (pipe-only-idle behavior
+  becomes "tcp-only-idle" / listen-but-SDK-disabled, semantics preserved).
+- **New env vars:** `OTOPCUA_HISTORIAN_TCP_PORT` (default e.g. 32569),
+  `OTOPCUA_HISTORIAN_BIND` (default `0.0.0.0`), `OTOPCUA_HISTORIAN_TLS_ENABLED`
+  (default `false`), `OTOPCUA_HISTORIAN_TLS_CERT` (pfx path or cert-store
+  thumbprint), `OTOPCUA_HISTORIAN_TLS_CERT_PASSWORD`. Keep
+  `OTOPCUA_HISTORIAN_SECRET`, `OTOPCUA_HISTORIAN_ENABLED`, and the SDK vars
+  (`OTOPCUA_HISTORIAN_SERVER`/`PORT`/…).
+
+### Client (.NET 10 — `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/`)
+
+- **`Internal/PipeChannel.cs` → `Internal/FrameChannel.cs`** (cosmetic rename;
+  already transport-agnostic). Add **`DefaultTcpConnectFactory`**:
+  `TcpClient.ConnectAsync(host, port, ct)` → if TLS,
+  `SslStream.AuthenticateAsClientAsync` (validate server cert: thumbprint-pin
+  *or* CA-chain per config; skip in plaintext mode) → return the stream. The
+  `FrameReader`/`FrameWriter`/`Hello`/MessagePack layer is reused unchanged.
+- **`WonderwareHistorianClient.cs`:** default ctor switches to the TCP factory;
+  the injectable `connect`-func ctor stays (used by tests).
+
+### Options + host wiring
+
+- **`…Client.Contracts/WonderwareHistorianClientOptions.cs`:** replace
+  `PipeName` with `Host` + `Port`; add `UseTls` and `ServerCertThumbprint`
+  (optional pin) / validation mode. Keep `SharedSecret`, `PeerName`,
+  `ConnectTimeout`, `CallTimeout`, `ProbeTimeoutSeconds`.
+- **`Historian:Wonderware` appsettings:** `Host`/`Port`/`UseTls`/
+  `ServerCertThumbprint` replace `PipeName`. Bound where the client is built:
+  `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs`,
+  `src/Server/ZB.MOM.WW.OtOpcUa.Host/Drivers/DriverFactoryBootstrap.cs`,
+  `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/AlarmHistorianOptions.cs`.
+- **AdminUI Test-Connect probe:**
+  `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/Pickers/HistorianWonderwareAddressBuilder.cs`
+  + the probe path updated to host/port/TLS.
+
+## Security model
+
+- **Caller auth:** shared-secret `Hello`, required in all modes — this is the
+  replacement for the named pipe's Windows-SID ACL.
+- **Transport:** TLS optional, config-driven. Dev = plaintext (`UseTls=false`).
+  Prod = server cert; client pins the thumbprint *or* validates the CA chain
+  (both supported). Server cert can live in the existing
+  `C:\ProgramData\OtOpcUa\pki`.
+- **Network exposure:** a new inbound port requires a **Windows Firewall rule**
+  on the VM. Bind to a specific management NIC instead of `0.0.0.0` to scope
+  exposure.
+- **mTLS** (client-cert auth) is a future follow-up; the shared secret covers
+  caller authentication for now.
+
+## Deployment
+
+- `scripts/install/Install-Services.ps1` / `Refresh-Services.ps1`: swap the
+  historian service env block (drop `OTOPCUA_ALLOWED_SID`, add
+  `OTOPCUA_HISTORIAN_TCP_PORT` + `OTOPCUA_HISTORIAN_TLS_*`), add an inbound
+  firewall rule for the port, and provision the server cert (prod). The Step 4b
+  deploy-completeness assertion stays as-is.
+
+## Testing (no live sign-in by the agent)
+
+- **Reuse** the existing byte-parity / round-trip contract tests (contracts
+  unchanged).
+- **New unit/integration (xUnit + Shouldly):** TCP connect factory; self-signed
+  TLS loopback handshake; **end-to-end loopback** (`TcpFrameServer` + client
+  over `127.0.0.1`, both plaintext and TLS); `Hello`-reject on bad shared
+  secret over TCP; single-active-connection serial-accept behavior.
+- **Live (user-driven):** MacBook OtOpcUa → VM sidecar over TCP — dev plaintext
+  first: `ReadRaw` returns live samples + a `WriteAlarmEvents` round-trips; then
+  flip `UseTls=true` and re-verify. Open the VM firewall port. Done = build
+  clean + `dotnet test` green + live read/write pass.
+
+## Rollout / migration
+
+- The pipe is fully replaced, so **both ends move together** (no mixed
+  pipe/TCP). The protocol above the socket is byte-identical, so this is a
+  transport swap, not a contract change. Sequence: deploy the TCP sidecar
+  (+firewall +env), then the TCP-client host, with the same shared secret.
+
+## Open items / follow-ups (not blockers)
+
+- mTLS / client-cert auth (hardening).
+- Optional: bind-NIC scoping vs `0.0.0.0`.
+- Same-machine deploys now use loopback TCP (`127.0.0.1:`) instead of a
+  pipe — expected given full replacement.
+
+## Touched code (authoritative file list)
+
+- Sidecar: `Ipc/TcpFrameServer.cs` (new, replaces `Ipc/PipeServer.cs`), remove
+  `Ipc/PipeAcl.cs`, `Program.cs`.
+- Client: `Internal/FrameChannel.cs` (rename of `PipeChannel.cs` + TCP factory),
+  `WonderwareHistorianClient.cs`.
+- Contracts: `WonderwareHistorianClientOptions.cs`.
+- Host: `Runtime/ServiceCollectionExtensions.cs`,
+  `Host/Drivers/DriverFactoryBootstrap.cs`,
+  `Runtime/Historian/AlarmHistorianOptions.cs`,
+  `AdminUI/.../Pickers/HistorianWonderwareAddressBuilder.cs`.
+- Deploy: `scripts/install/Install-Services.ps1`,
+  `scripts/install/Refresh-Services.ps1`.
+- Docs: `docs/drivers/Historian.Wonderware.md`, `docs/ServiceHosting.md`,
+  `docs/AlarmHistorian.md`.
+- Unchanged (reused): both `Ipc/Contracts.cs`, `Ipc/Framing.cs`,
+  `Ipc/FrameReader.cs`, `Ipc/HistorianFrameHandler.cs`, the AVEVA SDK backends.
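
Reviewer sketch (not part of the patch): the single-active serial accept loop with optional TLS from the "Detailed design" section might look roughly like the following on net48. This is a minimal illustration under stated assumptions, not the actual `TcpFrameServer`. The class name `SerialAcceptSketch` and the `serve` delegate are hypothetical; `serve` stands in for the existing `Hello` handshake and frame loop, whose real signatures the design does not show. The `RunAsync` backoff, `MaxConsecutiveFailures`, and per-connection error handling are deliberately left out.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; not the actual TcpFrameServer.
public sealed class SerialAcceptSketch
{
    private readonly IPAddress _bind;
    private readonly int _port;
    private readonly X509Certificate2 _tlsCert;                     // null => plaintext (dev)
    private readonly Func<Stream, CancellationToken, Task> _serve;  // assumed: Hello + frame loop

    public SerialAcceptSketch(IPAddress bind, int port, X509Certificate2 tlsCert,
        Func<Stream, CancellationToken, Task> serve)
    {
        _bind = bind;
        _port = port;
        _tlsCert = tlsCert;
        _serve = serve;
    }

    public async Task RunAsync(CancellationToken ct)
    {
        var listener = new TcpListener(_bind, _port);
        listener.Start(1);
        // AcceptTcpClientAsync() takes no token on net48; Stop() unblocks a pending accept.
        using (ct.Register(listener.Stop))
        {
            try
            {
                while (!ct.IsCancellationRequested)
                {
                    // Serial accept: the next client is accepted only after this one is served.
                    using (TcpClient client = await listener.AcceptTcpClientAsync().ConfigureAwait(false))
                    using (Stream stream = await WrapAsync(client.GetStream()).ConfigureAwait(false))
                    {
                        await _serve(stream, ct).ConfigureAwait(false);
                    }
                }
            }
            catch (Exception) when (ct.IsCancellationRequested)
            {
                // Expected on shutdown: the pending accept fails once the listener is stopped.
            }
            finally
            {
                listener.Stop();
            }
        }
    }

    // Returns the raw stream in plaintext mode, or an authenticated SslStream when a cert is set.
    private async Task<Stream> WrapAsync(NetworkStream raw)
    {
        if (_tlsCert == null) return raw;
        var ssl = new SslStream(raw, leaveInnerStreamOpen: false);
        try
        {
            await ssl.AuthenticateAsServerAsync(_tlsCert, false, SslProtocols.Tls12, false)
                .ConfigureAwait(false);
            return ssl;
        }
        catch
        {
            ssl.Dispose();
            throw;
        }
    }
}
```

Awaiting `serve` before the next `AcceptTcpClientAsync` call is what gives the single-active behavior; the `Start(1)` backlog is only a hint to the OS and does not by itself reject a second client. Because both branches of `WrapAsync` return a `Stream`, whatever consumes the stream does not need to know whether TLS is on.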