Audit (three parallel agent passes) found 43 markdown files carrying stale references to the deleted Galaxy.Host/Proxy/Shared projects after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1: high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install text; leads with the multi-driver .NET 10 server identity and points at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the Tier-C out-of-process spec with a Tier-A in-process description matching the current GalaxyDriver code, with the four-section GalaxyDriverOptions JSON shape pulled verbatim from Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the current Browse/Runtime/Health/Config sub-folders.

Track 2: historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md, docs/v2/Galaxy.ParityMatrix.md, docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a "✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md also fixes two dead links (`docs/Galaxy.Driver.md` and `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3: v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure: AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess, Subscriptions (top-level); drivers/Galaxy-Repository, drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs, reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the v2 two-process deploy shape (Server + Admin + optional OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now describes only the post-PR-7.2 architecture. v1 docs are preserved as a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Completed 2026-04-30 — historical record of the v2-mxgw backend-options decision.
This document evaluated alternative backend topologies before the v2-mxgw migration. Option 1 (in-process driver + gRPC gateway) was selected and implemented; see lmx_mxgw.md for the design and lmx_mxgw_impl.md for the implementation plan. Both shipped at commit ae7106d (2026-04-30). Preserved here as the audit trail.
Galaxy / LMX Backend — Restructuring Options
Context
Today the Galaxy driver is structured very differently from every other driver in this repo:
- Galaxy.Proxy (.NET 10, in-process): tiny shim that frames IPC to the host.
- Galaxy.Host (.NET Framework 4.8 x86, NSSM-wrapped Windows service): owns MXAccess COM, the STA pump, the ZB Galaxy Repository SQL queries, the Wonderware Historian SDK plugin, the per-platform ScanState probe manager, the alarm tracker (.InAlarm/.Priority/.DescAttrName/.Acked state machine + ack writer), recycle policy, and post-mortem MMF.
Other drivers (Modbus, S7, AB CIP, OpcUaClient, TwinCAT, FOCAS Tier-C) are in-process Tier-A drivers in the .NET 10 server. They do data + browse only; historian and alarming are driver-agnostic concerns at the server layer.
A sibling project, mxaccessgw
(C:\Users\dohertj2\Desktop\mxaccessgw), already provides:
- A .NET 10 x64 gRPC gateway in front of per-session .NET 4.8 x86 worker processes that own MXAccess COM, the STA, and event sinks (MxGateway.Server + MxGateway.Worker).
- A full MXAccess command + event surface (Register, AddItem, Advise, Write, WriteSecured, OnDataChange, OnWriteComplete, etc.).
- A cached, deploy-gated, paged Galaxy Repository browse RPC (galaxy_repository.v1) reading the same ZB tables we read today, with the query bodies kept byte-identical to OtOpcUa.
- A .NET client library (clients/dotnet/MxGateway.Client).
- API-key auth, Blazor dashboard, structured logs, metrics, watchdog/recycle.
The proposal is to strip Galaxy down to data + browse — push historian and alarming out to server-level subsystems where they live for every other driver — and pick how the slimmed-down driver talks to MXAccess.
What "push historian and alarming out" means
Both options below assume the same scope reduction; they only differ in how the driver reaches MXAccess.
| Concern | Today (Galaxy.Host) | After |
|---|---|---|
| Galaxy hierarchy browse | GalaxyRepository (SQL) inside Host | Driver (Option 1: via gw browse RPC; Option 2: own SQL or worker) |
| Live read / write / subscribe | MxAccessClient + STA pump in Host | gw (Option 1) or embedded worker (Option 2) |
| Wonderware Historian SDK | HistorianDataSource in Host (x86) | Separate Historian data source plugged into the server's HA service. Likely stays its own .NET 4.8 x86 sidecar because the SDK is x86-only; independent of the Galaxy driver lifecycle. |
| Alarm state machine (.InAlarm/.Acked quartet, transitions, ack writer) | GalaxyAlarmTracker in Host | Server-level A&E subsystem subscribes to alarm-bearing attributes the driver advertises and runs the AlarmCondition state machine generically. Driver only flags IsAlarm=true in node metadata. |
| ScanState per-platform probes | GalaxyRuntimeProbeManager in Host | Driver-side: ScanState is just another tag subscription; the driver re-advises one per discovered $WinPlatform/$AppEngine and reports HostConnectivityStatus from the value stream. No special host-side machinery. |
After the strip-down, the Galaxy driver looks like Modbus or OpcUaClient: it discovers nodes, reads/writes/subscribes, and reports per-host transport health. Everything else is the server's problem.
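The ScanState row in the table above can be illustrated with a small model. This is a minimal Python sketch rather than driver code: the `HostProbe` class, the status values, the `<host>.ScanState` reference format, and the rule that bad quality maps to an unknown status are all assumptions made for illustration. Only the idea of deriving HostConnectivityStatus from an ordinary tag subscription comes from the table.

```python
from dataclasses import dataclass
from enum import Enum


class HostConnectivityStatus(Enum):
    # Hypothetical status values; the real enum members are not given in this document.
    UNKNOWN = "unknown"
    RUNNING = "running"
    STOPPED = "stopped"


def scan_state_tag(host: str) -> str:
    """Build the tag reference to advise for one host (assumed naming)."""
    return f"{host}.ScanState"


@dataclass
class HostProbe:
    """Tracks one $WinPlatform/$AppEngine purely from its ScanState value stream."""
    host: str
    status: HostConnectivityStatus = HostConnectivityStatus.UNKNOWN

    def on_data_change(self, value: bool, good_quality: bool) -> HostConnectivityStatus:
        # Assumption: a bad-quality sample tells us nothing, so report UNKNOWN.
        if not good_quality:
            self.status = HostConnectivityStatus.UNKNOWN
        elif value:
            self.status = HostConnectivityStatus.RUNNING
        else:
            self.status = HostConnectivityStatus.STOPPED
        return self.status
```

In this model the probe needs nothing beyond the same data-change callback every other subscribed tag uses, which is why no dedicated host-side probe manager remains.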
Option 1 — Tier-A driver against the MxAccess Gateway
Driver.Galaxy becomes a regular in-process .NET 10 driver in the OtOpcUa
server (no .Host, no .Proxy split, no x86). It talks to a separately
deployed MxGateway.Server over gRPC using MxGateway.Client. Browse comes
from galaxy_repository.v1.DiscoverHierarchy. Live data comes from
MxAccessGateway.OpenSession/AddItem/Advise/StreamEvents.
OtOpcUa.Server (.NET 10 x64)
└── Driver.Galaxy (in-proc, .NET 10)
└── gRPC ──► MxGateway.Server (.NET 10 x64)
└── pipe ──► MxGateway.Worker (.NET 4.8 x86)
└── MXAccess COM (STA)
Pros
- Architectural parity with other drivers. No bespoke Host service, no x86 build target, no NSSM wrapper, no STA pump in this repo, no PostMortemMmf / RecyclePolicy we maintain ourselves.
- OtOpcUa server stops needing AVEVA installed on its own host. The gateway runs where MXAccess lives; the OPC UA server can live on a different box, in a container, or on a hardened jump host.
- One canonical MXAccess surface across the org. Any future tool — a diagnostic CLI, a Historian replacement, an integration harness — talks to the same gw with the same parity guarantees we get.
- Multi-instance friendly. Two OtOpcUa servers (warm/hot redundancy) share one gw and one MXAccess footprint instead of each running their own Galaxy.Host with duplicate Wonderware client identities.
- Browse + cache for free. galaxy_repository.v1 already implements the hierarchy cache, deploy-time gating, paging, and WatchDeployEvents, so we delete GalaxyRepository.cs, GalaxyHierarchyRow.cs, the change-detection poll loop, and the matching SQL plumbing.
- Operability for free. API-key auth, Blazor dashboard at /dashboard, metrics via Meter, structured logs with redaction. We currently have none of that in Galaxy.Host.
- Future backend swap. When AVEVA exposes managed NMX or another modern path, gw routes to it without OtOpcUa changes (gw's stated roadmap).
- Tighter blast radius. A hung COM event, a leaking COM object, a crashing worker — all owned by gw's session/worker isolation, not the OPC UA server process.
- Simpler version story for OtOpcUa. Driver is plain .NET 10; the bitness/runtime split lives entirely in mxaccessgw's repo.
Cons
- Extra deployment dependency. mxaccessgw is now a service that has to be installed, monitored, and kept on a compatible protocol version. For a single-box install this is one more moving piece.
- Two hops on every call (driver→gw, gw→worker) instead of one (proxy→host). Today's hop is MessagePack over a named pipe; the new outer hop is gRPC over TCP. Per-call overhead is a few hundred microseconds, not a regression for OPC UA workloads but measurable for very chatty bursts.
- Auth/secret surface added. OtOpcUa now holds an API key for gw and rotates it; gw's SQLite-backed key store has to be managed.
- Failure model spans two processes we don't own — gw + worker. Reconnect logic in our driver has to ride both: gw transport drop, gw session lease expiry, gw-detected worker crash, plus the worker's own MXAccess reconnect. All of it is exposed in the gRPC contract, but it's still surface area.
- Cross-repo protocol coupling. Bumping mxaccessgw major version (gRPC contract changes, session shape changes) ripples into OtOpcUa releases. Mitigated by versioned contracts; not free.
- Galaxy redundancy still has to think about gw. A redundancy fail-over of OtOpcUa is independent of the gw's session lifecycle. Need to decide whether the standby holds an open session or only opens it on takeover.
- Sensitive writes (WriteSecured, AuthenticateUser) cross the network if gw is remote. TLS + mTLS solves it but adds setup.
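The failure-model con above lists four distinct faults the driver's reconnect logic has to ride. One way to keep that surface manageable is a lookup from fault to an ordered recovery plan, paired with capped exponential backoff. The Python sketch below is illustrative only: the `Fault` and `Step` names, the steps assigned to each fault, and the backoff parameters are assumptions, not behavior documented for gw.

```python
from enum import Enum, auto


class Fault(Enum):
    TRANSPORT_DROP = auto()         # gRPC channel to gw lost
    SESSION_LEASE_EXPIRED = auto()  # gw expired our session lease
    WORKER_CRASH = auto()           # gw reported that its worker died
    MXACCESS_RECONNECTING = auto()  # worker is re-establishing MXAccess itself


class Step(Enum):
    RECONNECT_CHANNEL = auto()
    OPEN_SESSION = auto()
    READVISE_ITEMS = auto()
    WAIT_FOR_RECOVERY = auto()


# Hypothetical recovery plans; which steps gw actually requires is an assumption.
RECOVERY_PLAN = {
    Fault.TRANSPORT_DROP: [Step.RECONNECT_CHANNEL, Step.OPEN_SESSION, Step.READVISE_ITEMS],
    Fault.SESSION_LEASE_EXPIRED: [Step.OPEN_SESSION, Step.READVISE_ITEMS],
    Fault.WORKER_CRASH: [Step.OPEN_SESSION, Step.READVISE_ITEMS],
    Fault.MXACCESS_RECONNECTING: [Step.WAIT_FOR_RECOVERY],
}


def backoff_delays(base_s: float, cap_s: float, attempts: int) -> list[float]:
    """Delays in seconds that double on each attempt and never exceed cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(attempts)]
```

For example, `backoff_delays(0.5, 8.0, 6)` yields `[0.5, 1.0, 2.0, 4.0, 8.0, 8.0]`. Keeping the plan in one table makes the cross-process surface area explicit and easy to review against the gRPC contract.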
Option 2 — Embed mxaccessgw worker, no gateway
Driver.Galaxy is still in-process .NET 10, but instead of speaking gRPC to a
gateway service, it directly launches and supervises one (or more)
MxGateway.Worker processes and talks to them over the same named-pipe
worker protocol gw uses internally
(docs/WorkerFrameProtocol.md, docs/WorkerProcessLauncher.md). Browse stays
local — driver runs the SQL queries against ZB itself.
OtOpcUa.Server (.NET 10 x64)
└── Driver.Galaxy (in-proc, .NET 10)
├── ZB SQL (local, in-proc)
└── pipe ──► MxGateway.Worker (.NET 4.8 x86, child process)
└── MXAccess COM (STA)
Pros
- One hop, not two. Driver → worker pipe is the same shape as today's Proxy → Host pipe. Latency is on par with the current implementation.
- No new service to deploy. Worker is launched as a child process the same way Galaxy.Host is launched today (just with mxaccessgw's worker binary). Single-machine install story stays simple.
- Keeps the trust boundary local. No API keys, no TLS, no exposed gRPC port on the OtOpcUa box.
- Reuses mxaccessgw's parity-tested worker code — STA pump, COM lifetime, event conversion, fault model — without inheriting gw's ASP.NET Core / Blazor / SQLite footprint.
- Tighter ownership. OtOpcUa owns the worker lifecycle; recycle, kill, restart, post-mortem all decided by the driver, not by an external service we don't control.
- Easier to reason about during integration tests. No second service to spin up in CI; just a child process per test fixture.
Cons
- OtOpcUa server box must still have AVEVA + MXAccess installed, since the worker runs locally. The major deployment win of Option 1 (separating where MXAccess runs from where OtOpcUa runs) is lost.
- OtOpcUa still ships an x86 .NET 4.8 binary alongside it. Even if we vendor mxaccessgw's worker rather than write our own, installer complexity and bitness considerations remain.
- We re-implement everything gw already gives. Process supervision, watchdog, recycle policy, heartbeat, post-mortem: these are exactly what Galaxy.Host does today, and they'd live in our repo again, just calling a different worker binary.
- No browse cache, no deploy gating, no WatchDeployEvents. We keep running our own ZB queries and our own time_of_last_deploy poll, or we port gw's cache code into the driver. Either way it's duplicated logic.
- No auth, no dashboard, no metrics. Operability stays where it is today (i.e., minimal). Adding it ourselves is a separate project.
- Multiple OtOpcUa instances multiply MXAccess sessions. Redundancy pair → two MXAccess clients on the Galaxy from the same software, vs. Option 1 where one gw arbitrates.
- Worker protocol coupling without the contract surface. We depend on mxaccessgw's worker IPC frame format — a surface that mxaccessgw treats as internal to its own gw↔worker boundary. If they refactor it, we have to follow. The public gRPC contract (Option 1) is more stable by design.
- Loses the "common MXAccess access point" benefit. Other consumers (CLI, integration harnesses, future tools) can't share state with our embedded worker.
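The time_of_last_deploy poll mentioned in the cons above is the piece of change detection Option 2 would keep in this repo. A minimal Python sketch of that pattern follows. The `DeployWatcher` class and the injected `read_last_deploy` callable are hypothetical stand-ins for the real ZB query, and treating a failed read as "no change" is an assumption.

```python
from datetime import datetime
from typing import Callable, Optional


class DeployWatcher:
    """Detects Galaxy redeploys by comparing successive deploy timestamps."""

    def __init__(self, read_last_deploy: Callable[[], Optional[datetime]]):
        # read_last_deploy stands in for the SQL query; it returns None on failure.
        self._read = read_last_deploy
        self._seen: Optional[datetime] = None

    def poll(self) -> bool:
        """Return True when the timestamp moved since the last successful poll."""
        current = self._read()
        if current is None:
            # Assumption: a failed read keeps the old baseline and reports no change.
            return False
        changed = self._seen is not None and current != self._seen
        self._seen = current
        return changed
```

The first successful poll only establishes a baseline, so a driver built this way would still need to do its initial browse unconditionally. Under Option 1 this loop is what WatchDeployEvents replaces.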
Status quo (for comparison)
Keep Galaxy.Host as it is today, but rip out historian + alarming +
probe manager in place. End state: the Host shrinks to MxAccessClient + GalaxyRepository,
which is roughly what Option 2 ends up looking like — but with our hand-rolled
COM bridge instead of mxaccessgw's worker. Not a serious option once
mxaccessgw exists; we'd be maintaining a parallel implementation of the same
thing.
Recommendation (effort-agnostic)
Go with Option 1 — Tier-A driver against the MxAccess Gateway.
The decisive arguments:
- It's the only option that aligns Galaxy with how every other driver in this repo is structured. The user's stated goal, "keep lmx to data + browsing, similar to other drivers", only fully resolves if there is no .Host and no x86 build artifact in this repo at all. Option 2 still has an x86 child process and supervisor code; it's Galaxy.Host with a different worker binary inside.
- It separates where MXAccess runs from where OtOpcUa runs. That is a strategically larger win than a few hundred microseconds of per-call latency. The OPC UA server stops being chained to AVEVA install footprint, bitness, and Wonderware client identity, which removes a class of deployment, redundancy, and CI problems we hit today (e.g., the DESKTOP-6JL3KKO Hyper-V/Docker conflict, the dohertj2-only pipe ACL, the live-Galaxy smoke test prerequisites).
- It collapses scope. A non-trivial fraction of Galaxy.Host (browse cache, deploy-event watch, worker supervision, COM bridge, post-mortem, recycle, ACL hardening) is reproduced better in mxaccessgw. Option 1 deletes our copy. Option 2 keeps it.
- It positions historian and alarming for the right home. Once the Galaxy driver is "just another driver", historian becomes a server-level data source (one that can also feed Modbus/S7 history if we ever want it), and alarming becomes a server-level A&E subsystem. Option 2 nominally allows the same move, but the temptation to keep them in Galaxy.Host "while we're already there" is real.
- It future-proofs against AVEVA's roadmap. Managed NMX, ASB, or any replacement that shows up over the next few years gets adopted in mxaccessgw without a release in this repo.
The case for Option 2 is real but narrow: it's the right call only if we commit to single-box deployments forever, refuse to take a gRPC dependency, and value local-trust simplicity over the consolidation/operability benefits gw provides. None of those constraints hold here.
What flips the recommendation
- If the gw protocol proves unstable, or performance testing under our subscription patterns turns out worse than expected → revisit Option 2.
- If org-policy forbids running an MXAccess gateway as its own service → Option 2.
- If Galaxy goes from one of several drivers to the primary driver and raw call-rate matters more than architectural fit → revisit.
Otherwise: Option 1.
Out-of-scope follow-ups (don't decide here, but flag them)
- Where does the Wonderware Historian SDK live? Likely its own .NET 4.8 x86 sidecar exposing a small IHistorianDataSource over a pipe or gRPC, plugged into the OPC UA server's HA service alongside any future historian sources. Independent of which option above is chosen.
- Alarm subsystem ownership. Decide whether the server hosts a generic AlarmCondition state machine driven by driver-advertised alarm metadata, or whether each driver continues to emit pre-shaped alarm transitions. Galaxy's 4-attr quartet is a strong forcing function for the generic approach.
- Redundancy + gw sessions. Standby OtOpcUa holds an open gw session (warm) vs. opens on takeover (cold). Affects gw worker count and Galaxy client-identity collisions.
- Auth between OtOpcUa and gw. API key in DPAPI-protected secret file vs. Windows-auth gRPC. Both supported by gw; pick before rollout.
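To make the alarm subsystem ownership question above more concrete, here is a minimal Python sketch of what a generic, server-side condition state machine could look like when it is fed only the in-alarm and acked values a driver advertises. The class name, the transition names, and the precedence between transitions are hypothetical. It models two of the four Galaxy attributes and is not the OPC UA AlarmCondition model itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmCondition:
    """Driver-agnostic alarm state fed by two subscribed attribute values."""
    source: str
    active: bool = False
    acked: bool = True

    def update(self, in_alarm: bool, acked: bool) -> Optional[str]:
        """Apply fresh attribute values; return a transition name, or None if unchanged."""
        was_active, was_acked = self.active, self.acked
        self.active, self.acked = in_alarm, acked
        if (was_active, was_acked) == (in_alarm, acked):
            return None
        # Assumption: an active-state change takes precedence over an ack change.
        if in_alarm and not was_active:
            return "raised"
        if was_active and not in_alarm:
            return "cleared"
        if acked and not was_acked:
            return "acknowledged"
        return "ack_reset"
```

Fed the sequence (in alarm, unacked), (in alarm, acked), (normal, acked), this model reports "raised", "acknowledged", and "cleared" in that order. Nothing in it is Galaxy-specific, which is the property the generic approach depends on.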