Audit (three parallel agent passes) found 43 markdown files carrying stale references to the deleted Galaxy.Host/Proxy/Shared projects after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install text; leads with the multi-driver .NET 10 server identity and points at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the Tier-C out-of-process spec with a Tier-A in-process description matching the current GalaxyDriver code, with the four-section GalaxyDriverOptions JSON shape pulled verbatim from Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md, docs/v2/Galaxy.ParityMatrix.md, docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a "✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md also fixes two dead links (`docs/Galaxy.Driver.md` and `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure: AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess, Subscriptions (top-level); drivers/Galaxy-Repository, drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs, reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the v2 two-process deploy shape (Server + Admin + optional OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now describes only the post-PR-7.2 architecture. v1 docs are preserved as a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

283 lines
15 KiB
Markdown
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw backend-options decision.**
>
> This document evaluated alternative backend topologies before the
> v2-mxgw migration. **Option 1 (in-process driver + gRPC gateway) was
> selected and implemented**; see `lmx_mxgw.md` for the design and
> `lmx_mxgw_impl.md` for the implementation plan. Both shipped at
> commit `ae7106d` (2026-04-30). Preserved here as the audit trail.

# Galaxy / LMX Backend — Restructuring Options

## Context

Today the Galaxy driver is structured very differently from every other driver
in this repo:

- **Galaxy.Proxy** (.NET 10, in-process): tiny shim that frames IPC to the host.
- **Galaxy.Host** (.NET Framework 4.8 **x86**, NSSM-wrapped Windows service):
  owns MXAccess COM, the STA pump, the ZB Galaxy Repository SQL queries, the
  Wonderware Historian SDK plugin, the per-platform `ScanState` probe manager,
  the alarm tracker (`.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` state
  machine + ack writer), recycle policy, and post-mortem MMF.

Other drivers (Modbus, S7, AB CIP, OpcUaClient, TwinCAT, FOCAS Tier-C) are
**in-process Tier-A drivers** in the .NET 10 server. They do data + browse
only; historian and alarming are driver-agnostic concerns at the server layer.

A sibling project, **mxaccessgw**
(`C:\Users\dohertj2\Desktop\mxaccessgw`), already provides:

- A .NET 10 x64 gRPC gateway in front of per-session .NET 4.8 x86 worker
  processes that own MXAccess COM, the STA, and event sinks
  (`MxGateway.Server` + `MxGateway.Worker`).
- A full MXAccess command + event surface (`Register`, `AddItem`, `Advise`,
  `Write`, `WriteSecured`, `OnDataChange`, `OnWriteComplete`, etc.).
- A cached, deploy-gated, paged **Galaxy Repository browse** RPC
  (`galaxy_repository.v1`) reading the same ZB tables we read today, with the
  query bodies kept byte-identical to OtOpcUa.
- A .NET client library (`clients/dotnet/MxGateway.Client`).
- API-key auth, Blazor dashboard, structured logs, metrics, watchdog/recycle.

The proposal is to **strip Galaxy down to data + browse** — push historian and
alarming out to server-level subsystems where they live for every other driver
— and pick how the slimmed-down driver talks to MXAccess.

---

## What "push historian and alarming out" means

Both options below assume the same scope reduction; they only differ in how
the driver reaches MXAccess.

| Concern | Today (Galaxy.Host) | After |
|---|---|---|
| Galaxy hierarchy browse | `GalaxyRepository` (SQL) inside Host | Driver (Option 1: via gw browse RPC; Option 2: own SQL or worker) |
| Live read / write / subscribe | `MxAccessClient` + STA pump in Host | gw (Option 1) or embedded worker (Option 2) |
| Wonderware Historian SDK | `HistorianDataSource` in Host (x86) | Separate Historian data source plugged into the server's HA service. Likely stays its own .NET 4.8 x86 sidecar because the SDK is x86-only; **independent of the Galaxy driver lifecycle**. |
| Alarm state machine (`.InAlarm`/`.Acked` quartet, transitions, ack writer) | `GalaxyAlarmTracker` in Host | Server-level A&E subsystem subscribes to alarm-bearing attributes the driver advertises and runs the AlarmCondition state machine generically. Driver only flags `IsAlarm=true` in node metadata. |
| `ScanState` per-platform probes | `GalaxyRuntimeProbeManager` in Host | Driver-side: ScanState is just another tag subscription; the driver re-advises one per discovered `$WinPlatform`/`$AppEngine` and reports `HostConnectivityStatus` from the value stream. No special host-side machinery. |

After the strip-down, the Galaxy driver looks like Modbus or OpcUaClient: it
discovers nodes, reads/writes/subscribes, and reports per-host transport
health. Everything else is the server's problem.

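As a rough illustration of "ScanState is just another tag subscription", the sketch below folds a stream of `ScanState` value updates into a per-host status map. It is written in Python for brevity rather than the repo's C#, and every name in it (`HostStatus`, `ScanStateTracker`, `on_value`, the `good_quality` flag) is hypothetical. It also assumes `ScanState` arrives as a boolean where true means on-scan; the actual `HostConnectivityStatus` shape is not shown in this document.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostStatus(Enum):
    UNKNOWN = "Unknown"
    RUNNING = "Running"
    STOPPED = "Stopped"


@dataclass
class ScanStateTracker:
    """Folds ScanState value updates into a per-host status map (illustrative only)."""

    statuses: dict = field(default_factory=dict)

    def register_host(self, host: str) -> None:
        # A newly discovered platform/engine starts as Unknown until a value arrives.
        self.statuses.setdefault(host, HostStatus.UNKNOWN)

    def on_value(self, host: str, scan_state, good_quality: bool = True) -> HostStatus:
        # Assumption: bad quality or a missing value means we cannot judge the host.
        if not good_quality or scan_state is None:
            status = HostStatus.UNKNOWN
        else:
            status = HostStatus.RUNNING if scan_state else HostStatus.STOPPED
        self.statuses[host] = status
        return status
```

The point of the sketch is that no dedicated probe manager is needed: the tracker only consumes the same data-change callbacks any other subscribed tag would produce.
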

---

## Option 1 — Tier-A driver against the MxAccess Gateway

`Driver.Galaxy` becomes a regular **in-process .NET 10 driver** in the OtOpcUa
server (no `.Host`, no `.Proxy` split, no x86). It talks to a separately
deployed `MxGateway.Server` over gRPC using `MxGateway.Client`. Browse comes
from `galaxy_repository.v1.DiscoverHierarchy`. Live data comes from
`MxAccessGateway.OpenSession`/`AddItem`/`Advise`/`StreamEvents`.

```
OtOpcUa.Server (.NET 10 x64)
  └── Driver.Galaxy (in-proc, .NET 10)
        └── gRPC ──► MxGateway.Server (.NET 10 x64)
              └── pipe ──► MxGateway.Worker (.NET 4.8 x86)
                    └── MXAccess COM (STA)
```

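The call order named above (`OpenSession`, then `AddItem` and `Advise` per tag, then `StreamEvents`) can be sketched against an in-memory stand-in. This is a Python illustration only: `FakeGateway`, its method signatures, the tag names, and the event tuple shape are all assumptions made for the example and are not the real `MxGateway.Client` API.

```python
from typing import Iterator


class FakeGateway:
    """In-memory stand-in for the gateway; NOT the real MxGateway.Client API."""

    def __init__(self):
        self._next_handle = 1
        self._items = {}       # handle -> tag reference
        self._advised = set()  # handles with an active subscription
        self.calls = []        # records the order of RPC-like calls

    def open_session(self, client_name: str) -> str:
        self.calls.append("OpenSession")
        return f"session-{client_name}"

    def add_item(self, session: str, tag: str) -> int:
        self.calls.append("AddItem")
        handle = self._next_handle
        self._next_handle += 1
        self._items[handle] = tag
        return handle

    def advise(self, session: str, handle: int) -> None:
        self.calls.append("Advise")
        self._advised.add(handle)

    def stream_events(self, session: str) -> Iterator[tuple]:
        self.calls.append("StreamEvents")
        # Emit one synthetic data-change event per advised item.
        for handle in sorted(self._advised):
            yield (self._items[handle], 0)


def subscribe(gw: FakeGateway, tags: list) -> list:
    """Open a session, add and advise each tag, then drain the event stream."""
    session = gw.open_session("OtOpcUa")
    for tag in tags:
        gw.advise(session, gw.add_item(session, tag))
    return list(gw.stream_events(session))
```

In a real driver the event stream would be long-lived and consumed asynchronously; the sketch drains it once only to make the sequence visible.
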
### Pros

- **Architectural parity with other drivers.** No bespoke `Host` service, no
  x86 build target, no NSSM wrapper, no STA pump in this repo, no
  `PostMortemMmf`/`RecyclePolicy` we maintain ourselves.
- **OtOpcUa server stops needing AVEVA installed on its own host.** The
  gateway runs where MXAccess lives; the OPC UA server can live on a different
  box, in a container, or on a hardened jump host.
- **One canonical MXAccess surface across the org.** Any future tool — a
  diagnostic CLI, a Historian replacement, an integration harness — talks to
  the same gw with the same parity guarantees we get.
- **Multi-instance friendly.** Two OtOpcUa servers (warm/hot redundancy) share
  one gw and one MXAccess footprint instead of each running their own
  `Galaxy.Host` with duplicate Wonderware client identities.
- **Browse + cache for free.** `galaxy_repository.v1` already implements the
  hierarchy cache, deploy-time gating, paging, and `WatchDeployEvents` — we
  delete `GalaxyRepository.cs`, `GalaxyHierarchyRow.cs`, the change-detection
  poll loop, and the matching SQL plumbing.
- **Operability for free.** API-key auth, Blazor dashboard at `/dashboard`,
  metrics via `Meter`, structured logs with redaction. We currently have
  none of that in `Galaxy.Host`.
- **Future backend swap.** When AVEVA exposes managed NMX or another modern
  path, gw routes to it without OtOpcUa changes (gw's stated roadmap).
- **Tighter blast radius.** A hung COM event, a leaking COM object, a
  crashing worker — all owned by gw's session/worker isolation, not the
  OPC UA server process.
- **Simpler version story for OtOpcUa.** Driver is plain .NET 10; the
  bitness/runtime split lives entirely in mxaccessgw's repo.

### Cons

- **Extra deployment dependency.** mxaccessgw is now a service that has to be
  installed, monitored, and kept on a compatible protocol version. For a
  single-box install this is one more moving piece.
- **Two hops on every call** (driver→gw, gw→worker) instead of one
  (proxy→host). Today's hop is MessagePack over a named pipe; the new outer
  hop is gRPC over TCP. Per-call overhead is a few hundred microseconds, not
  a regression for OPC UA workloads but measurable for very chatty bursts.
- **Auth/secret surface added.** OtOpcUa now holds an API key for gw and
  rotates it; gw's SQLite-backed key store has to be managed.
- **Failure model spans two processes we don't own** — gw + worker. Reconnect
  logic in our driver has to ride both: gw transport drop, gw session lease
  expiry, gw-detected worker crash, plus the worker's own MXAccess reconnect.
  All of it is exposed in the gRPC contract, but it's still surface area.
- **Cross-repo protocol coupling.** Bumping `mxaccessgw` major version (gRPC
  contract changes, session shape changes) ripples into OtOpcUa releases.
  Mitigated by versioned contracts; not free.
- **Galaxy redundancy still has to think about gw.** A redundancy fail-over of
  OtOpcUa is independent of the gw's session lifecycle. Need to decide whether
  the standby holds an open session or only opens it on takeover.
- **Sensitive writes (`WriteSecured`, `AuthenticateUser`) cross the network**
  if gw is remote. TLS + mTLS solves it but adds setup.

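To put the "few hundred microseconds" per-call figure in perspective, here is a back-of-the-envelope helper. The 300 µs default is an assumed stand-in for "a few hundred microseconds", not a measured number, and the function name is hypothetical.

```python
def added_latency_ms(calls: int, overhead_us: float = 300.0) -> float:
    """Extra wall-clock time in milliseconds if `calls` round-trips run strictly in sequence."""
    return calls * overhead_us / 1000.0
```

Under that assumption a single read gains about 0.3 ms, while a burst of 10,000 strictly sequential writes would gain about 3 seconds, which is the "very chatty bursts" case called out above. Batched or pipelined calls would amortize the overhead and see much less.
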

---

## Option 2 — Embed mxaccessgw worker, no gateway

`Driver.Galaxy` is still in-process .NET 10, but instead of speaking gRPC to a
gateway service, it directly **launches and supervises one (or more)
`MxGateway.Worker` processes** and talks to them over the same named-pipe
worker protocol gw uses internally
(`docs/WorkerFrameProtocol.md`, `docs/WorkerProcessLauncher.md`). Browse stays
local — driver runs the SQL queries against ZB itself.

```
OtOpcUa.Server (.NET 10 x64)
  └── Driver.Galaxy (in-proc, .NET 10)
        ├── ZB SQL (local, in-proc)
        └── pipe ──► MxGateway.Worker (.NET 4.8 x86, child process)
              └── MXAccess COM (STA)
```

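"Launches and supervises" implies a relaunch loop living in the driver. A minimal Python sketch of that idea follows, assuming a simple policy of relaunching on a non-zero exit with linear backoff. The names `supervise` and `run_child` are hypothetical, and nothing here reflects the worker's actual launch arguments or the pipe handshake described in the mxaccessgw docs.

```python
import subprocess
import time
from typing import Callable


def run_child(cmd: list) -> int:
    """Run a child process to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def supervise(launch: Callable[[], int], max_restarts: int = 3, backoff_s: float = 0.1) -> int:
    """Call `launch` and relaunch it while it keeps exiting non-zero.

    Returns the number of launches performed. Stops as soon as the child
    exits cleanly (code 0) or after `max_restarts` relaunches.
    """
    launches = 0
    while True:
        launches += 1
        exit_code = launch()
        if exit_code == 0 or launches > max_restarts:
            return launches
        time.sleep(backoff_s * launches)  # linear backoff between relaunches
```

A caller would pass something like `lambda: run_child([worker_path])`, where `worker_path` is whatever binary the installer places on disk. Even this toy version hints at the cost listed under Cons: heartbeat, recycle policy, and post-mortem capture would all have to be layered on top.
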
### Pros

- **One hop, not two.** Driver → worker pipe is the same shape as today's
  Proxy → Host pipe. Latency is on par with the current implementation.
- **No new service to deploy.** Worker is launched as a child process the
  same way `Galaxy.Host` is launched today (just with mxaccessgw's worker
  binary). Single-machine install story stays simple.
- **Keeps the trust boundary local.** No API keys, no TLS, no exposed gRPC
  port on the OtOpcUa box.
- **Reuses mxaccessgw's parity-tested worker code** — STA pump, COM lifetime,
  event conversion, fault model — without inheriting gw's ASP.NET Core /
  Blazor / SQLite footprint.
- **Tighter ownership.** OtOpcUa owns the worker lifecycle; recycle, kill,
  restart, post-mortem all decided by the driver, not by an external service
  we don't control.
- **Easier to reason about during integration tests.** No second service to
  spin up in CI; just a child process per test fixture.

### Cons

- **OtOpcUa server box must still have AVEVA + MXAccess installed**, since
  the worker runs locally. The major deployment win of Option 1
  (separating where MXAccess runs from where OtOpcUa runs) is lost.
- **OtOpcUa still ships an x86 .NET 4.8 binary alongside it.** Even if we
  vendor mxaccessgw's worker rather than write our own, installer complexity
  and bitness considerations remain.
- **We re-implement everything gw already gives.** Process supervision,
  watchdog, recycle policy, heartbeat, post-mortem — these are exactly what
  `Galaxy.Host` does today, and they'd live in our repo again, just calling a
  different worker binary.
- **No browse cache, no deploy gating, no `WatchDeployEvents`** — we keep
  running our own ZB queries and our own `time_of_last_deploy` poll, or we
  port gw's cache code into the driver. Either way it's duplicated logic.
- **No auth, no dashboard, no metrics.** Operability stays where it is today
  (i.e., minimal). Adding it ourselves is a separate project.
- **Multiple OtOpcUa instances multiply MXAccess sessions.** Redundancy pair
  → two MXAccess clients on the Galaxy from the same software, vs. Option 1
  where one gw arbitrates.
- **Worker protocol coupling without the contract surface.** We depend on
  mxaccessgw's worker IPC frame format — a surface that mxaccessgw treats as
  *internal* to its own gw↔worker boundary. If they refactor it, we have to
  follow. The public gRPC contract (Option 1) is more stable by design.
- **Loses the "common MXAccess access point" benefit.** Other consumers
  (CLI, integration harnesses, future tools) can't share state with our
  embedded worker.

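The `time_of_last_deploy` poll that Option 2 would keep amounts to change detection on a single value. A minimal Python sketch under assumed names: `DeployWatcher`, `read_last_deploy`, and `on_deploy` are hypothetical, and the SQL that would actually read the timestamp from ZB is deliberately left out.

```python
from typing import Callable, Optional


class DeployWatcher:
    """Detects a changed deploy timestamp between polls (illustrative only)."""

    def __init__(self, read_last_deploy: Callable[[], str], on_deploy: Callable[[str], None]):
        self._read = read_last_deploy   # e.g. a function that queries the repository database
        self._on_deploy = on_deploy     # e.g. a function that triggers a re-browse
        self._seen: Optional[str] = None

    def poll_once(self) -> bool:
        """Return True when a new deploy was observed and the callback fired."""
        current = self._read()
        if self._seen is None:
            self._seen = current        # first poll only establishes the baseline
            return False
        if current != self._seen:
            self._seen = current
            self._on_deploy(current)
            return True
        return False
```

A timer in the driver would call `poll_once` on an interval. This is the piece that the gateway's `WatchDeployEvents` replaces in Option 1.
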

---

## Status quo (for comparison)

Keep `Galaxy.Host` as it is today and rip out historian, alarming, and the
probe manager in place. End state: the Host shrinks to `MxAccessClient` + `GalaxyRepository`,
which is roughly what Option 2 ends up looking like — but with our hand-rolled
COM bridge instead of mxaccessgw's worker. Not a serious option once
mxaccessgw exists; we'd be maintaining a parallel implementation of the same
thing.

---

## Recommendation (effort-agnostic)

**Go with Option 1 — Tier-A driver against the MxAccess Gateway.**

The decisive arguments:

1. **It's the only option that aligns Galaxy with how every other driver in
   this repo is structured.** The user's stated goal — "keep lmx to data +
   browsing, similar to other drivers" — only fully resolves if there is no
   `.Host` and no x86 build artifact in this repo at all. Option 2 still has
   an x86 child process and supervisor code; it's `Galaxy.Host` with a
   different worker binary inside.

2. **It separates *where MXAccess runs* from *where OtOpcUa runs*.** That is
   a strategically larger win than a few hundred microseconds of per-call
   latency. The OPC UA server stops being chained to AVEVA install footprint,
   bitness, and Wonderware client identity — which removes a class of
   deployment, redundancy, and CI problems we hit today (e.g., the
   `DESKTOP-6JL3KKO` Hyper-V/Docker conflict, the `dohertj2`-only pipe ACL,
   the live-Galaxy smoke test prerequisites).

3. **It collapses scope.** A non-trivial fraction of `Galaxy.Host` (browse
   cache, deploy-event watch, worker supervision, COM bridge, post-mortem,
   recycle, ACL hardening) is reproduced *better* in mxaccessgw. Option 1
   deletes our copy. Option 2 keeps it.

4. **It positions historian and alarming for the right home.** Once the
   Galaxy driver is "just another driver", historian becomes a server-level
   data source (one that can also feed Modbus/S7 history if we ever want it),
   and alarming becomes a server-level A&E subsystem. Option 2 nominally
   allows the same move, but the temptation to keep them in `Galaxy.Host`
   "while we're already there" is real.

5. **It future-proofs against AVEVA's roadmap.** Managed NMX, ASB, or any
   replacement that shows up over the next few years gets adopted in
   mxaccessgw without a release in this repo.

The case for Option 2 is real but narrow: it's the right call **only** if we
commit to single-box deployments forever, refuse to take a gRPC dependency,
and value local-trust simplicity over the consolidation/operability benefits
gw provides. None of those constraints hold here.

### What flips the recommendation

- If the gw protocol proves unstable, or its performance under our
  subscription patterns turns out worse than expected → revisit Option 2.
- If org-policy forbids running an MXAccess gateway as its own service →
  Option 2.
- If Galaxy goes from one of several drivers to *the* primary driver and
  raw call-rate matters more than architectural fit → revisit.

Otherwise: Option 1.

---

## Out-of-scope follow-ups (don't decide here, but flag them)

- **Where does the Wonderware Historian SDK live?** Likely its own
  .NET 4.8 x86 sidecar exposing a small `IHistorianDataSource` over a pipe or
  gRPC, plugged into the OPC UA server's HA service alongside any future
  historian sources. Independent of which option above is chosen.
- **Alarm subsystem ownership.** Decide whether the server hosts a generic
  AlarmCondition state machine driven by driver-advertised alarm metadata, or
  whether each driver continues to emit pre-shaped alarm transitions. Galaxy's
  4-attr quartet is a strong forcing function for the generic approach.
- **Redundancy + gw sessions.** Standby OtOpcUa holds an open gw session
  (warm) vs. opens on takeover (cold). Affects gw worker count and Galaxy
  client-identity collisions.
- **Auth between OtOpcUa and gw.** API key in DPAPI-protected secret file vs.
  Windows-auth gRPC. Both supported by gw; pick before rollout.

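For the alarm-ownership question, one way to picture the generic approach is a state function over the attribute quartet. The Python sketch below is an assumption-laden illustration: `AlarmSnapshot`, `condition_state`, `transition`, and the four state labels are hypothetical and are not taken from the OPC UA Alarms & Conditions model or from `GalaxyAlarmTracker`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AlarmSnapshot:
    """One reading of the four alarm attributes a driver might advertise."""

    in_alarm: bool
    acked: bool
    priority: int = 0
    description: str = ""


def condition_state(s: AlarmSnapshot) -> str:
    """Map the InAlarm/Acked pair onto one of four illustrative state labels."""
    if s.in_alarm:
        return "ActiveAcked" if s.acked else "ActiveUnacked"
    return "Normal" if s.acked else "InactiveUnacked"


def transition(prev: AlarmSnapshot, curr: AlarmSnapshot) -> Optional[Tuple[str, str]]:
    """Return (old_state, new_state) when the condition changes, else None."""
    old, new = condition_state(prev), condition_state(curr)
    return (old, new) if old != new else None
```

Because the function depends only on attribute values, a server-level subsystem could run it for any driver that flags alarm-bearing nodes, which is the argument for the generic option.
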