chore: drop root scratch + retired v2-mxgw plan docs
- Delete _p54.json / _p55.json (PR-body snapshots for the shipped S7 + Mitsubishi research docs).
- Delete session.dat (38-byte CLI runtime cache, not produced by any current source code) and add it to .gitignore so it doesn't come back.
- Delete lmx_backend.md / lmx_mxgw.md / lmx_mxgw_impl.md. All three carried "✅ Completed 2026-04-30" historical-record banners; the v2-mxgw migration shipped + merged to master, so the design plans served their purpose. Drop the cross-refs from CLAUDE.md and docs/v1/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3
.gitignore
vendored
@@ -37,3 +37,6 @@ src/ZB.MOM.WW.OtOpcUa.Server/config_cache.db
|
||||
# E2E sidecar config — NodeIds are specific to each dev's local seed (see scripts/e2e/README.md)
|
||||
scripts/e2e/e2e-config.json
|
||||
config_cache*.db
|
||||
|
||||
# Client CLI/UI runtime scratch (last-connected endpoint cache)
|
||||
session.dat
|
||||
|
||||
@@ -16,8 +16,7 @@ in this repo is .NET 10. PR 7.2 retired the legacy in-process
|
||||
`Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the
|
||||
`OtOpcUaGalaxyHost` Windows service.
|
||||
|
||||
-See `lmx_mxgw.md` for the migration design and
-`docs/v2/Galaxy.Performance.md` for the runtime perf surface
+See `docs/v2/Galaxy.Performance.md` for the runtime perf surface
|
||||
(tracing, metrics, soak harness).
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
{"title":"Phase 3 PR 54 -- Siemens S7 Modbus TCP quirks research doc","body":"## Summary\n\nAdds `docs/v2/s7.md` (485 lines) covering Siemens SIMATIC S7 family Modbus TCP behavior. Mirrors the `docs/v2/dl205.md` template for future per-quirk implementation PRs.\n\n## Key findings for the implementation track\n\n- **No fixed memory map** — every S7 Modbus server is user-wired via `MB_SERVER`/`MODBUSCP`/`MODBUSPN` library blocks. Driver must accept per-site config, not assume a vendor layout.\n- **MB_SERVER requires non-optimized DBs** (STATUS `0x8383` if optimized). Most common field bug.\n- **Word order default = ABCD** (opposite of DL260). Driver's S7 profile default must be `ByteOrder.BigEndian`, not `WordSwap`.\n- **One port per MB_SERVER instance** — multi-client requires parallel FBs on 503/504/… Most clients assume port 502 multiplexes (wrong on S7).\n- **CP 343-1 Lean is server-only**, requires the `2XV9450-1MB00` license.\n- **FC20/21/22/23/43 all return Illegal Function** on every S7 variant — driver must not attempt FC23 bulk-read optimization for S7.\n- **STOP-mode behavior non-deterministic** across firmware bands — treat both read/write STOP-mode responses as unavailable.\n\nTwo items flagged as unconfirmed rumour (V2.0+ float byte-order claim, STOP-mode caching location).\n\nNo code, no tests — implementation lands in PRs 56+.\n\n## Test plan\n- [x] Doc renders as markdown\n- [x] 31 citations present\n- [x] Section structure matches dl205.md template","head":"phase-3-pr54-s7-research-doc","base":"v2"}
|
||||
@@ -1 +0,0 @@
|
||||
{"title":"Phase 3 PR 55 -- Mitsubishi MELSEC Modbus TCP quirks research doc","body":"## Summary\n\nAdds `docs/v2/mitsubishi.md` (451 lines) covering MELSEC Q/L/iQ-R/iQ-F/FX3U Modbus TCP behavior. Mirrors `docs/v2/dl205.md` template for per-quirk implementation PRs.\n\n## Key findings for the implementation track\n\n- **Module naming trap** — `QJ71MB91` is SERIAL RTU, not TCP. TCP module is `QJ71MT91`. Surface clearly in driver docs.\n- **No canonical mapping** — per-site 'Modbus Device Assignment Parameter' block (up to 16 entries). Treat mapping as runtime config.\n- **X/Y hex vs octal depends on family** — Q/L/iQ-R use HEX (X20 = decimal 32); FX/iQ-F use OCTAL (X20 = decimal 16). Helper must take a family selector.\n- **Word order CDAB default** across all MELSEC families (opposite of Siemens S7). Driver Mitsubishi profile default: `ByteOrder.WordSwap`.\n- **D-registers binary by default** (opposite of DL205's BCD default). Caller opts in to `Bcd16`/`Bcd32` when ladder uses BCD.\n- **FX5U needs firmware ≥ 1.060** for Modbus TCP server — older is client-only.\n- **FX3U-ENET vs FX3U-ENET-P502 vs FX3U-ENET-ADP** — only the middle one binds port 502; the last has no Modbus at all. Common operator mis-purchase.\n- **QJ71MT91 does NOT support FC22 / FC23** — iQ-R / iQ-F do. Bulk-read optimization must gate on capability.\n- **STOP-mode writes configurable** on Q/L/iQ-R/iQ-F (default accept), always rejected on FX3U-ENET.\n\nThree unconfirmed rumours flagged separately.\n\nNo code, no tests — implementation lands in PRs 58+.\n\n## Test plan\n- [x] Doc renders as markdown\n- [x] 17 citations present\n- [x] Per-model test naming matrix included (`Mitsubishi_QJ71MT91_*`, `Mitsubishi_FX5U_*`, `Mitsubishi_FX3U_ENET_*`, shared `Mitsubishi_Common_*`)","head":"phase-3-pr55-mitsubishi-research-doc","base":"v2"}
|
||||
@@ -15,7 +15,6 @@ For current architecture see:
|
||||
- `docs/drivers/Galaxy.md` — current Galaxy driver doc
|
||||
- `docs/v2/Galaxy.ParityRig.md` — current testing setup
|
||||
- `docs/v2/Galaxy.Performance.md` — observability + perf
|
||||
-- `lmx_mxgw.md` (in repo root) — design rationale for the migration
|
||||
|
||||
| File | What it covered |
|
||||
|---|---|
|
||||
|
||||
282
lmx_backend.md
@@ -1,282 +0,0 @@
|
||||
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw backend-options decision.**
|
||||
>
|
||||
> This document evaluated alternative backend topologies before the
|
||||
> v2-mxgw migration. **Option 1 (in-process driver + gRPC gateway) was
|
||||
> selected and implemented**; see `lmx_mxgw.md` for the design and
|
||||
> `lmx_mxgw_impl.md` for the implementation plan. Both shipped at
|
||||
> commit `ae7106d` (2026-04-30). Preserved here as the audit trail.
|
||||
|
||||
# Galaxy / LMX Backend — Restructuring Options
|
||||
|
||||
## Context
|
||||
|
||||
Today the Galaxy driver is structured very differently from every other driver
|
||||
in this repo:
|
||||
|
||||
- **Galaxy.Proxy** (.NET 10, in-process): tiny shim that frames IPC to the host.
|
||||
- **Galaxy.Host** (.NET Framework 4.8 **x86**, NSSM-wrapped Windows service):
|
||||
owns MXAccess COM, the STA pump, the ZB Galaxy Repository SQL queries, the
|
||||
Wonderware Historian SDK plugin, the per-platform `ScanState` probe manager,
|
||||
the alarm tracker (`.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` state
|
||||
machine + ack writer), recycle policy, and post-mortem MMF.
|
||||
|
||||
Other drivers (Modbus, S7, AB CIP, OpcUaClient, TwinCAT, FOCAS Tier-C) are
|
||||
**in-process Tier-A drivers** in the .NET 10 server. They do data + browse
|
||||
only; historian and alarming are driver-agnostic concerns at the server layer.
|
||||
|
||||
A sibling project, **mxaccessgw**
|
||||
(`C:\Users\dohertj2\Desktop\mxaccessgw`), already provides:
|
||||
|
||||
- A .NET 10 x64 gRPC gateway in front of per-session .NET 4.8 x86 worker
|
||||
processes that own MXAccess COM, the STA, and event sinks
|
||||
(`MxGateway.Server` + `MxGateway.Worker`).
|
||||
- A full MXAccess command + event surface (`Register`, `AddItem`, `Advise`,
|
||||
`Write`, `WriteSecured`, `OnDataChange`, `OnWriteComplete`, etc.).
|
||||
- A cached, deploy-gated, paged **Galaxy Repository browse** RPC
|
||||
(`galaxy_repository.v1`) reading the same ZB tables we read today, with the
|
||||
query bodies kept byte-identical to OtOpcUa.
|
||||
- A .NET client library (`clients/dotnet/MxGateway.Client`).
|
||||
- API-key auth, Blazor dashboard, structured logs, metrics, watchdog/recycle.
|
||||
|
||||
The proposal is to **strip Galaxy down to data + browse** — push historian and
|
||||
alarming out to server-level subsystems where they live for every other driver
|
||||
— and pick how the slimmed-down driver talks to MXAccess.
|
||||
|
||||
---
|
||||
|
||||
## What "push historian and alarming out" means
|
||||
|
||||
Both options below assume the same scope reduction; they only differ in how
|
||||
the driver reaches MXAccess.
|
||||
|
||||
| Concern | Today (Galaxy.Host) | After |
|---|---|---|
| Galaxy hierarchy browse | `GalaxyRepository` (SQL) inside Host | Driver (Option 1: via gw browse RPC; Option 2: own SQL or worker) |
| Live read / write / subscribe | `MxAccessClient` + STA pump in Host | gw (Option 1) or embedded worker (Option 2) |
| Wonderware Historian SDK | `HistorianDataSource` in Host (x86) | Separate Historian data source plugged into the server's HA service. Likely stays its own .NET 4.8 x86 sidecar because the SDK is x86-only; **independent of the Galaxy driver lifecycle**. |
| Alarm state machine (`.InAlarm`/`.Acked` quartet, transitions, ack writer) | `GalaxyAlarmTracker` in Host | Server-level A&E subsystem subscribes to alarm-bearing attributes the driver advertises and runs the AlarmCondition state machine generically. Driver only flags `IsAlarm=true` in node metadata. |
| `ScanState` per-platform probes | `GalaxyRuntimeProbeManager` in Host | Driver-side: ScanState is just another tag subscription; the driver re-advises one per discovered `$WinPlatform`/`$AppEngine` and reports `HostConnectivityStatus` from the value stream. No special host-side machinery. |
|
||||
|
||||
After the strip-down, the Galaxy driver looks like Modbus or OpcUaClient: it
|
||||
discovers nodes, reads/writes/subscribes, and reports per-host transport
|
||||
health. Everything else is the server's problem.
|
||||
|
||||
---
|
||||
|
||||
## Option 1 — Tier-A driver against the MxAccess Gateway
|
||||
|
||||
`Driver.Galaxy` becomes a regular **in-process .NET 10 driver** in the OtOpcUa
|
||||
server (no `.Host`, no `.Proxy` split, no x86). It talks to a separately
|
||||
deployed `MxGateway.Server` over gRPC using `MxGateway.Client`. Browse comes
|
||||
from `galaxy_repository.v1.DiscoverHierarchy`. Live data comes from
|
||||
`MxAccessGateway.OpenSession`/`AddItem`/`Advise`/`StreamEvents`.
|
||||
|
||||
```
OtOpcUa.Server (.NET 10 x64)
  └── Driver.Galaxy (in-proc, .NET 10)
        └── gRPC ──► MxGateway.Server (.NET 10 x64)
                       └── pipe ──► MxGateway.Worker (.NET 4.8 x86)
                                      └── MXAccess COM (STA)
```
|
||||
|
||||
### Pros
|
||||
|
||||
- **Architectural parity with other drivers.** No bespoke `Host` service, no
|
||||
x86 build target, no NSSM wrapper, no STA pump in this repo, no
|
||||
`PostMortemMmf`/`RecyclePolicy` we maintain ourselves.
|
||||
- **OtOpcUa server stops needing AVEVA installed on its own host.** The
|
||||
gateway runs where MXAccess lives; the OPC UA server can live on a different
|
||||
box, in a container, or on a hardened jump host.
|
||||
- **One canonical MXAccess surface across the org.** Any future tool — a
|
||||
diagnostic CLI, a Historian replacement, an integration harness — talks to
|
||||
the same gw with the same parity guarantees we get.
|
||||
- **Multi-instance friendly.** Two OtOpcUa servers (warm/hot redundancy) share
|
||||
one gw and one MXAccess footprint instead of each running their own
|
||||
`Galaxy.Host` with duplicate Wonderware client identities.
|
||||
- **Browse + cache for free.** `galaxy_repository.v1` already implements the
|
||||
hierarchy cache, deploy-time gating, paging, and `WatchDeployEvents` — we
|
||||
delete `GalaxyRepository.cs`, `GalaxyHierarchyRow.cs`, the change-detection
|
||||
poll loop, and the matching SQL plumbing.
|
||||
- **Operability for free.** API-key auth, Blazor dashboard at `/dashboard`,
|
||||
metrics via `Meter`, structured logs with redaction. We currently have
|
||||
none of that in `Galaxy.Host`.
|
||||
- **Future backend swap.** When AVEVA exposes managed NMX or another modern
|
||||
path, gw routes to it without OtOpcUa changes (gw's stated roadmap).
|
||||
- **Tighter blast radius.** A hung COM event, a leaking COM object, a
|
||||
crashing worker — all owned by gw's session/worker isolation, not the
|
||||
OPC UA server process.
|
||||
- **Simpler version story for OtOpcUa.** Driver is plain .NET 10; the
|
||||
bitness/runtime split lives entirely in mxaccessgw's repo.
|
||||
|
||||
### Cons
|
||||
|
||||
- **Extra deployment dependency.** mxaccessgw is now a service that has to be
|
||||
installed, monitored, and kept on a compatible protocol version. For a
|
||||
single-box install this is one more moving piece.
|
||||
- **Two hops on every call** (driver→gw, gw→worker) instead of one
|
||||
(proxy→host). Today's hop is MessagePack over a named pipe; the new outer
|
||||
hop is gRPC over TCP. Per-call overhead is a few hundred microseconds, not
|
||||
a regression for OPC UA workloads but measurable for very chatty bursts.
|
||||
- **Auth/secret surface added.** OtOpcUa now holds an API key for gw and
|
||||
rotates it; gw's SQLite-backed key store has to be managed.
|
||||
- **Failure model spans two processes we don't own** — gw + worker. Reconnect
|
||||
logic in our driver has to ride both: gw transport drop, gw session lease
|
||||
expiry, gw-detected worker crash, plus the worker's own MXAccess reconnect.
|
||||
All of it is exposed in the gRPC contract, but it's still surface area.
|
||||
- **Cross-repo protocol coupling.** Bumping `mxaccessgw` major version (gRPC
|
||||
contract changes, session shape changes) ripples into OtOpcUa releases.
|
||||
Mitigated by versioned contracts; not free.
|
||||
- **Galaxy redundancy still has to think about gw.** A redundancy fail-over of
|
||||
OtOpcUa is independent of the gw's session lifecycle. Need to decide whether
|
||||
the standby holds an open session or only opens it on takeover.
|
||||
- **Sensitive writes (`WriteSecured`, `AuthenticateUser`) cross the network**
|
||||
if gw is remote. TLS + mTLS solves it but adds setup.
|
||||
|
||||
---
|
||||
|
||||
## Option 2 — Embed mxaccessgw worker, no gateway
|
||||
|
||||
`Driver.Galaxy` is still in-process .NET 10, but instead of speaking gRPC to a
|
||||
gateway service, it directly **launches and supervises one (or more)
|
||||
`MxGateway.Worker` processes** and talks to them over the same named-pipe
|
||||
worker protocol gw uses internally
|
||||
(`docs/WorkerFrameProtocol.md`, `docs/WorkerProcessLauncher.md`). Browse stays
|
||||
local — driver runs the SQL queries against ZB itself.
|
||||
|
||||
```
OtOpcUa.Server (.NET 10 x64)
  └── Driver.Galaxy (in-proc, .NET 10)
        ├── ZB SQL (local, in-proc)
        └── pipe ──► MxGateway.Worker (.NET 4.8 x86, child process)
                       └── MXAccess COM (STA)
```
|
||||
|
||||
### Pros
|
||||
|
||||
- **One hop, not two.** Driver → worker pipe is the same shape as today's
|
||||
Proxy → Host pipe. Latency is on par with the current implementation.
|
||||
- **No new service to deploy.** Worker is launched as a child process the
|
||||
same way `Galaxy.Host` is launched today (just with mxaccessgw's worker
|
||||
binary). Single-machine install story stays simple.
|
||||
- **Keeps the trust boundary local.** No API keys, no TLS, no exposed gRPC
|
||||
port on the OtOpcUa box.
|
||||
- **Reuses mxaccessgw's parity-tested worker code** — STA pump, COM lifetime,
|
||||
event conversion, fault model — without inheriting gw's ASP.NET Core /
|
||||
Blazor / SQLite footprint.
|
||||
- **Tighter ownership.** OtOpcUa owns the worker lifecycle; recycle, kill,
|
||||
restart, post-mortem all decided by the driver, not by an external service
|
||||
we don't control.
|
||||
- **Easier to reason about during integration tests.** No second service to
|
||||
spin up in CI; just a child process per test fixture.
|
||||
|
||||
### Cons
|
||||
|
||||
- **OtOpcUa server box must still have AVEVA + MXAccess installed**, since
|
||||
the worker runs locally. The major deployment win of Option 1
|
||||
(separating where MXAccess runs from where OtOpcUa runs) is lost.
|
||||
- **OtOpcUa still ships an x86 .NET 4.8 binary alongside it.** Even if we
|
||||
vendor mxaccessgw's worker rather than write our own, installer complexity
|
||||
and bitness considerations remain.
|
||||
- **We re-implement everything gw already gives.** Process supervision,
|
||||
watchdog, recycle policy, heartbeat, post-mortem — these are exactly what
|
||||
`Galaxy.Host` does today, and they'd live in our repo again, just calling a
|
||||
different worker binary.
|
||||
- **No browse cache, no deploy gating, no `WatchDeployEvents`** — we keep
|
||||
running our own ZB queries and our own `time_of_last_deploy` poll, or we
|
||||
port gw's cache code into the driver. Either way it's duplicated logic.
|
||||
- **No auth, no dashboard, no metrics.** Operability stays where it is today
|
||||
(i.e., minimal). Adding it ourselves is a separate project.
|
||||
- **Multiple OtOpcUa instances multiply MXAccess sessions.** Redundancy pair
|
||||
→ two MXAccess clients on the Galaxy from the same software, vs. Option 1
|
||||
where one gw arbitrates.
|
||||
- **Worker protocol coupling without the contract surface.** We depend on
|
||||
mxaccessgw's worker IPC frame format — a surface that mxaccessgw treats as
|
||||
*internal* to its own gw↔worker boundary. If they refactor it, we have to
|
||||
follow. The public gRPC contract (Option 1) is more stable by design.
|
||||
- **Loses the "common MXAccess access point" benefit.** Other consumers
|
||||
(CLI, integration harnesses, future tools) can't share state with our
|
||||
embedded worker.
|
||||
|
||||
---
|
||||
|
||||
## Status quo (for comparison)
|
||||
|
||||
Keep `Galaxy.Host` as today, and in-place rip out historian + alarming +
|
||||
probe manager. End state: the Host shrinks to `MxAccessClient` + `GalaxyRepository`,
|
||||
which is roughly what Option 2 ends up looking like — but with our hand-rolled
|
||||
COM bridge instead of mxaccessgw's worker. Not a serious option once
|
||||
mxaccessgw exists; we'd be maintaining a parallel implementation of the same
|
||||
thing.
|
||||
|
||||
---
|
||||
|
||||
## Recommendation (effort-agnostic)
|
||||
|
||||
**Go with Option 1 — Tier-A driver against the MxAccess Gateway.**
|
||||
|
||||
The decisive arguments:
|
||||
|
||||
1. **It's the only option that aligns Galaxy with how every other driver in
|
||||
this repo is structured.** The user's stated goal — "keep lmx to data +
|
||||
browsing, similar to other drivers" — only fully resolves if there is no
|
||||
`.Host` and no x86 build artifact in this repo at all. Option 2 still has
|
||||
an x86 child process and supervisor code; it's `Galaxy.Host` with a
|
||||
different worker binary inside.
|
||||
|
||||
2. **It separates *where MXAccess runs* from *where OtOpcUa runs*.** That is
|
||||
a strategically larger win than a few hundred microseconds of per-call
|
||||
latency. The OPC UA server stops being chained to AVEVA install footprint,
|
||||
bitness, and Wonderware client identity — which removes a class of
|
||||
deployment, redundancy, and CI problems we hit today (e.g., the
|
||||
`DESKTOP-6JL3KKO` Hyper-V/Docker conflict, the `dohertj2`-only pipe ACL,
|
||||
the live-Galaxy smoke test prerequisites).
|
||||
|
||||
3. **It collapses scope.** A non-trivial fraction of `Galaxy.Host` (browse
|
||||
cache, deploy-event watch, worker supervision, COM bridge, post-mortem,
|
||||
recycle, ACL hardening) is reproduced *better* in mxaccessgw. Option 1
|
||||
deletes our copy. Option 2 keeps it.
|
||||
|
||||
4. **It positions historian and alarming for the right home.** Once the
|
||||
Galaxy driver is "just another driver", historian becomes a server-level
|
||||
data source (one that can also feed Modbus/S7 history if we ever want it),
|
||||
and alarming becomes a server-level A&E subsystem. Option 2 nominally
|
||||
allows the same move, but the temptation to keep them in `Galaxy.Host`
|
||||
"while we're already there" is real.
|
||||
|
||||
5. **It future-proofs against AVEVA's roadmap.** Managed NMX, ASB, or any
|
||||
replacement that shows up over the next few years gets adopted in
|
||||
mxaccessgw without a release in this repo.
|
||||
|
||||
The case for Option 2 is real but narrow: it's the right call **only** if we
|
||||
commit to single-box deployments forever, refuse to take a gRPC dependency,
|
||||
and value local-trust simplicity over the consolidation/operability benefits
|
||||
gw provides. None of those constraints hold here.
|
||||
|
||||
### What flips the recommendation
|
||||
|
||||
- If the gw protocol proves unstable, or its perf under our subscription
|
||||
patterns turns out worse than expected → revisit Option 2.
|
||||
- If org-policy forbids running an MXAccess gateway as its own service →
|
||||
Option 2.
|
||||
- If Galaxy goes from one of several drivers to *the* primary driver and
|
||||
raw call-rate matters more than architectural fit → revisit.
|
||||
|
||||
Otherwise: Option 1.
|
||||
|
||||
---
|
||||
|
||||
## Out-of-scope follow-ups (don't decide here, but flag them)
|
||||
|
||||
- **Where does the Wonderware Historian SDK live?** Likely its own
|
||||
.NET 4.8 x86 sidecar exposing a small `IHistorianDataSource` over a pipe or
|
||||
gRPC, plugged into the OPC UA server's HA service alongside any future
|
||||
historian sources. Independent of which option above is chosen.
|
||||
- **Alarm subsystem ownership.** Decide whether the server hosts a generic
|
||||
AlarmCondition state machine driven by driver-advertised alarm metadata, or
|
||||
whether each driver continues to emit pre-shaped alarm transitions. Galaxy's
|
||||
4-attr quartet is a strong forcing function for the generic approach.
|
||||
- **Redundancy + gw sessions.** Standby OtOpcUa holds an open gw session
|
||||
(warm) vs. opens on takeover (cold). Affects gw worker count and Galaxy
|
||||
client-identity collisions.
|
||||
- **Auth between OtOpcUa and gw.** API key in DPAPI-protected secret file vs.
|
||||
Windows-auth gRPC. Both supported by gw; pick before rollout.
|
||||
486
lmx_mxgw.md
@@ -1,486 +0,0 @@
|
||||
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw migration design.**
|
||||
>
|
||||
> This document is the design doc that drove the migration from the
|
||||
> legacy out-of-process Galaxy.Host topology to the in-process
|
||||
> GalaxyDriver + mxaccessgw architecture. Option 1 (the in-process
|
||||
> driver path) was selected and implemented across 39 PRs spanning
|
||||
> phases 0–7, merged to master at commit `ae7106d`. For current
|
||||
> architecture see `CLAUDE.md`, `docs/drivers/Galaxy.md`, and
|
||||
> `docs/v2/Galaxy.Performance.md`.
|
||||
|
||||
# Galaxy → MxAccessGateway Migration Plan
|
||||
|
||||
Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host`
|
||||
+ `Galaxy.Proxy` IPC pair with an **in-process Tier-A** `Driver.Galaxy` running
|
||||
in the .NET 10 OtOpcUa server, talking to a separately-deployed
|
||||
`MxGateway.Server` (mxaccessgw repo) over gRPC for live MXAccess work and
|
||||
Galaxy Repository browse.
|
||||
|
||||
## Outcome
|
||||
|
||||
After this work:
|
||||
|
||||
- `OtOpcUa.Server` is fully .NET 10 x64 — no x86 build artifacts in this repo.
|
||||
- `Driver.Galaxy.Host` (Windows service, NSSM-wrapped, .NET 4.8 x86) is
|
||||
retired. `Driver.Galaxy.Proxy` and `Driver.Galaxy.Shared` are deleted.
|
||||
AVEVA platform is no longer required on the OtOpcUa box.
|
||||
- A new in-process `Driver.Galaxy` lives next to `Driver.Modbus`,
|
||||
`Driver.OpcUaClient`, etc. It implements the same `IDriver` capability set
|
||||
the proxy implements today, but its body calls `MxGateway.Client`
|
||||
(`MxGatewayClient`, `MxGatewaySession`, `GalaxyRepositoryClient`).
|
||||
- Wonderware Historian SDK access moves out of the Galaxy driver into a
|
||||
driver-agnostic historian data source (`Driver.Historian.Wonderware`,
|
||||
separate sidecar, .NET 4.8 x86). The OPC UA HA service plugs into it the
|
||||
same way it would plug into any future historian.
|
||||
- Alarm condition tracking moves out of the driver into the OPC UA server's
|
||||
generic A&E subsystem. The driver only flags `IsAlarm=true` on attribute
|
||||
metadata and forwards live `.InAlarm`/`.Acked`/etc value changes; the
|
||||
server runs the AlarmCondition state machine.
|
||||
- Per-platform `ScanState` probes degrade to plain attribute subscriptions —
|
||||
no special probe manager.
|
||||
|
||||
---
|
||||
|
||||
## Pre-flight: improvements to land in mxaccessgw first
|
||||
|
||||
These are **integration-quality changes** in the mxaccessgw repo that make
|
||||
the OtOpcUa side dramatically simpler / faster / more robust. They aren't
|
||||
strictly required to start, but ship enough of them before phase 3 that we're
|
||||
not designing around gaps.
|
||||
|
||||
### gw-1. Galaxy attribute metadata parity
|
||||
|
||||
**What's there:** `galaxy_repository.v1.DiscoverHierarchy` returns
|
||||
`GalaxyObject` with name, parent, category, and dynamic attributes.
|
||||
|
||||
**What's missing for OtOpcUa:** every field today's `MxAccessGalaxyBackend`
|
||||
copies into `GalaxyAttributeInfo` — confirm gw's `Attribute` proto carries:
|
||||
- `mx_data_type` (int)
|
||||
- `is_array` (bool)
|
||||
- `array_dimension` (uint, optional)
|
||||
- `security_classification` (int)
|
||||
- `is_historized` (bool, from `HistorizedExtension` primitive)
|
||||
- `is_alarm` (bool, from `AlarmExtension` primitive)
|
||||
|
||||
If any are missing, add them to the proto and the server-side query mapper.
|
||||
Without `IsAlarm` and `IsHistorized` the OPC UA server can't decide which
|
||||
nodes get HasHistoricalConfiguration / which become AlarmConditions.
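
The dependency on those two flags can be sketched as follows. This is a minimal Python illustration, not the driver's code: `AttributeInfo` and `node_features` are hypothetical names, and the fields simply mirror the proto fields listed above.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class AttributeInfo:
    """Hypothetical mirror of the attribute fields listed above."""
    name: str
    mx_data_type: int
    is_array: bool
    security_classification: int
    is_historized: bool
    is_alarm: bool
    array_dimension: Optional[int] = None


def node_features(attr: AttributeInfo) -> Set[str]:
    """Decide which OPC UA facets a discovered attribute should receive."""
    features = {"Variable"}  # every attribute becomes a plain variable node
    if attr.is_historized:
        features.add("HasHistoricalConfiguration")
    if attr.is_alarm:
        features.add("AlarmCondition")
    return features
```

If either flag is absent from the browse reply, the corresponding branch has nothing to key on, which is why the proto has to carry both.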
|
||||
|
||||
### gw-2. Stable, documented event-stream resume semantics
|
||||
|
||||
**What's needed:** the OtOpcUa driver must survive a transient gw transport
|
||||
drop without losing subscription state or duplicating change events. gw's
|
||||
`StreamEventsAsync(afterWorkerSequence)` already exposes resumption.
|
||||
Document the per-session retention window (how long does the worker buffer
|
||||
events the gateway hasn't acked?) and the "events were dropped, you must
|
||||
re-subscribe" signal. If retention is bounded by count rather than time,
|
||||
expose the bound in `OpenSessionReply` so the client can size its own buffer.
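
A client-side view of the resume rules might look like the sketch below. It assumes worker sequence numbers are contiguous per session and start at 1, which this document does not confirm; `ResumeCursor`, `Event`, and `EventsDropped` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


class EventsDropped(Exception):
    """Hypothetical signal: the stream skipped events, so re-subscribe."""


@dataclass(frozen=True)
class Event:
    sequence: int
    tag: str
    value: object


class ResumeCursor:
    """Tracks the last worker sequence seen so a reconnect can resume."""

    def __init__(self) -> None:
        self.last_sequence = 0  # 0 means nothing has been seen yet

    def accept(self, events: Iterable[Event]) -> List[Event]:
        fresh = []
        for ev in events:
            if ev.sequence <= self.last_sequence:
                continue  # replayed after a reconnect; already delivered
            if ev.sequence != self.last_sequence + 1:
                # Assumed contiguity: a gap means retention was exceeded.
                raise EventsDropped(
                    f"expected {self.last_sequence + 1}, got {ev.sequence}")
            self.last_sequence = ev.sequence
            fresh.append(ev)
        return fresh
```

After a transport drop, the driver would pass `cursor.last_sequence` as the resume point and feed the new stream through `accept`, treating `EventsDropped` as the cue to rebuild its subscriptions.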
|
||||
|
||||
### gw-3. Reconnectable sessions
|
||||
|
||||
Listed under "post-v1 revisit" in `gateway.md`. Without it, every gw or
|
||||
OtOpcUa restart re-`Register`s, re-`AddItem`s, re-`Advise`s the entire
|
||||
address space — for a 50k-tag Galaxy that's a non-trivial cold-start. With
|
||||
reconnectable sessions, the driver presents its `SessionId` after a restart
|
||||
and the worker keeps its handles.
|
||||
|
||||
If full reconnection is too large, ship a **bulk replay** instead: a single
|
||||
RPC that takes the full subscription set and has the worker perform the
|
||||
register/add/advise inside one round trip. We can drive it from a
|
||||
client-side cache rather than gw state. See gw-5 below.
|
||||
|
||||
### gw-4. Driver-shaped subscribe primitive
|
||||
|
||||
`MxGatewaySession` already has `SubscribeBulkAsync` (one RPC: `Register`
|
||||
implicit + `AddItem` + `Advise` for a list of tag addresses, returning
|
||||
per-tag `SubscribeResult`). That's exactly what `ISubscribable.SubscribeAsync`
|
||||
wants. Confirm it returns enough per-tag detail to surface a partial-failure
|
||||
list to OPC UA monitored items (good handle, status code, error text).
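
The partial-failure handling could be sketched like this. The `SubscribeResult` shape below is an assumption built from the three details named above (handle, status code, error text), and treating status code 0 as success is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SubscribeResult:
    """Hypothetical per-tag result: handle, status code, error text."""
    tag: str
    handle: Optional[int]
    status_code: int
    error_text: str = ""


def partition_results(
    results: Iterable[SubscribeResult],
) -> Tuple[Dict[str, int], List[SubscribeResult]]:
    """Split a bulk reply into usable handles and per-tag failures."""
    good: Dict[str, int] = {}
    failed: List[SubscribeResult] = []
    for r in results:
        # Assumed convention: status 0 with a handle means the tag is live.
        if r.status_code == 0 and r.handle is not None:
            good[r.tag] = r.handle
        else:
            failed.append(r)
    return good, failed
```

The `failed` list is what would be mapped onto bad status codes for the individual OPC UA monitored items, while `good` feeds the handle table.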
|
||||
|
||||
If it is not already available, expose **`SubscribeBulk` with an optional update-rate hint**
|
||||
forwarded to `SetBufferedUpdateInterval` so the OPC UA publishing interval
|
||||
becomes a single field on the subscribe call rather than a follow-up RPC.
|
||||
|
||||
### gw-5. Subscription replay snapshot
|
||||
|
||||
Provide an RPC `ReplaySubscriptionsAsync(SessionId, IEnumerable<TagAddress>)`
|
||||
that re-establishes a list of subscriptions after a session reset and returns
|
||||
per-tag results. The client stores its tag list locally (the driver already
|
||||
has it from `Discover`), and the gw worker turns it into one
|
||||
register/add/advise sequence. This is the minimum surface we need; full
|
||||
"reattach to a previous session by id" (gw-3) is a richer version of the
|
||||
same thing.
|
||||
|
||||
### gw-6. Transport-health stream
|
||||
|
||||
The gw already exposes worker / session health on its dashboard. Add a small
|
||||
streaming RPC `StreamSessionHealth(SessionId) → stream SessionHealth` so the
|
||||
OtOpcUa driver can surface "MXAccess transport up/down" to its
|
||||
`IHostConnectivityProbe` without faking it via probe-tag subscriptions.
|
||||
Today `MxAccessClient.ConnectionStateChanged` does this in-process; we want
|
||||
the same signal at the gw boundary.
|
||||
|
||||
### gw-7. Optional .NET 10 client polish
|
||||
|
||||
- Async-disposable session pattern is already there.
|
||||
- Add a **typed `MxValue` ⇄ `object` adapter** for the seven Galaxy types
|
||||
OtOpcUa cares about (Boolean, Int32, Float, Double, String, DateTime,
|
||||
arrays of the same). Today every consumer writes its own `MxValue.From<T>`
|
||||
helpers; this shaves boilerplate from the driver.
|
||||
- Add a **`SubscribeWithCallback`** convenience wrapper that combines
|
||||
`OpenSession` + `SubscribeBulk` + `StreamEvents` and routes events through
|
||||
a delegate per tag. Keeps the OPC UA driver from re-implementing the
|
||||
fan-out / sequencer pattern.
|
||||
|
||||
### gw-8. Auth minimums
|
||||
|
||||
Document API-key scoping as it applies to OtOpcUa: the server identity needs
|
||||
`session`, `invoke`, `event`, and `metadata:read` scopes. Provide a CLI to
|
||||
mint a key bound to those scopes for an OtOpcUa instance.
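
A startup check for the four scopes can be as small as the sketch below. The scope names come from the paragraph above; the `missing_scopes` helper and the idea of checking at driver start are illustrative assumptions.

```python
from typing import Iterable, List

# The four scopes this section says the OtOpcUa identity needs.
REQUIRED_SCOPES = frozenset({"session", "invoke", "event", "metadata:read"})


def missing_scopes(granted: Iterable[str]) -> List[str]:
    """Return the required scopes that the presented key does not carry."""
    return sorted(REQUIRED_SCOPES - set(granted))
```

For example, `missing_scopes(["session", "invoke"])` returns `["event", "metadata:read"]`, which a driver could log before refusing to start.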
|
||||
|
||||
### gw-9. Performance: bulk paths and value coalescing
|
||||
|
||||
- Confirm `SubscribeBulkAsync` is implemented as a single MXAccess
|
||||
`AddItem`+`Advise` loop on the worker, not N pipe round trips. If not, fix
|
||||
before we drive 50k-tag Galaxies through it.
|
||||
- Expose `SetBufferedUpdateInterval` per session so OtOpcUa can request
|
||||
buffered updates at the OPC UA publishing interval and get one batched
|
||||
`OnBufferedDataChange` per tick rather than N `OnDataChange` events.
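
The effect of buffering on event count can be illustrated with a small sketch. It groups individual changes into one batch per interval; whether the real buffered callback keeps every sample or only the latest per tag is not stated here, so this version keeps all samples in arrival order. `batch_by_tick` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Change = Tuple[int, str, object]  # (timestamp in ms, tag, value)


def batch_by_tick(
    changes: Iterable[Change], interval_ms: int
) -> List[List[Tuple[str, object]]]:
    """Group per-change events into one batch per update interval."""
    batches: Dict[int, List[Tuple[str, object]]] = defaultdict(list)
    for ts_ms, tag, value in changes:
        batches[ts_ms // interval_ms].append((tag, value))
    # One list per tick, oldest tick first.
    return [batches[tick] for tick in sorted(batches)]
```

With a 1000 ms interval, four changes at 0, 400, 999, and 1000 ms collapse into two batches, so the consumer handles two callbacks instead of four.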
|
||||
|
||||
These can all ship in mxaccessgw independently and improve every consumer.
|
||||
|
||||
---
|
||||
|
||||
## OtOpcUa-side improvements to land in parallel
|
||||
|
||||
Some are forced by removing `Galaxy.Host`; others are quality-of-life.
|
||||
|
||||
### ot-1. Promote `IHistorianDataSource` to a server-level extension point
|
||||
|
||||
Today `IHistorianDataSource` is a Galaxy-internal abstraction in
|
||||
`Driver.Galaxy.Host`. Lift it to `OtOpcUa.Core.Abstractions` (or a similar
|
||||
home next to `IDriver`) and let the OPC UA HA service consume **any number
|
||||
of registered data sources** keyed by node namespace. Drivers don't own
|
||||
historian access; the server mounts data sources alongside drivers. This is
|
||||
the prerequisite that lets us move Wonderware Historian out of the Galaxy
|
||||
driver without losing the feature.
|
||||
|
||||
### ot-2. Generic alarm condition state machine in the server
|
||||
|
||||
Move the `.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` quartet handling
|
||||
out of `GalaxyAlarmTracker` into a server-level alarm subsystem keyed off the
|
||||
`IsAlarm=true` flag drivers set during discovery. The server subscribes to
|
||||
the four sub-attributes itself and runs the AlarmCondition state machine.
|
||||
Driver only:
|
||||
- declares `IsAlarm=true` in `DriverAttributeInfo`,
|
||||
- forwards plain attribute value changes (already done by `ISubscribable`).
|
||||
|
||||
This is also a precondition for future drivers (Modbus DL205 alarm bits,
|
||||
S7 alarm DBs) to emit alarms without each writing their own tracker.
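
The generic part of that state machine can be sketched from two of the four sub-attributes. This is a simplified Python illustration, not the server's design: the event names are hypothetical, and `.Priority` / `.DescAttrName` handling is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlarmSnapshot:
    """Latest values of the .InAlarm and .Acked sub-attributes."""
    in_alarm: bool
    acked: bool


def transitions(prev: AlarmSnapshot, curr: AlarmSnapshot) -> List[str]:
    """Derive condition events from two consecutive snapshots."""
    events = []
    if not prev.in_alarm and curr.in_alarm:
        events.append("Activated")
    if prev.in_alarm and not curr.in_alarm:
        events.append("Cleared")
    if not prev.acked and curr.acked:
        events.append("Acknowledged")
    return events
```

Because the function only sees plain value changes, any driver that forwards the same sub-attributes through `ISubscribable` could reuse it without its own tracker.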

### ot-3. Driver capabilities trim

After ot-1 and ot-2, `Driver.Galaxy` no longer needs to implement:
- `IHistoryProvider` (server's HA service handles it via Wonderware
  historian data source)
- `IAlarmHistorianWriter` (server's A&E historian, or kept generic — Galaxy
  shouldn't own the SQLite path)
- `IAlarmSource` ack route (server-level alarm subsystem writes back via the
  driver's `IWritable.WriteAsync`, which the gw already supports)

Keep:
- `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`,
  `IRediscoverable`, `IHostConnectivityProbe`.

### ot-4. Treat `time_of_last_deploy` as `IRediscoverable`'s pump

Replace the Host-side change-detection poll with a managed
`GalaxyRepositoryClient.WatchDeployEventsAsync` consumer in the driver.
Each event raises `OnRediscoveryNeeded` with the new deploy time as the
`scopeHint`. No polling code in this repo.

### ot-5. Connection pool at the server, not the driver

If the redundancy pair runs two OtOpcUa instances against one gw, each
should use a single `GrpcChannel` per process (already the gRPC default) but
the two must hold **different sessions** (one MXAccess client identity per
OtOpcUa instance, not one shared session that fights over Wonderware client
state). Encode the per-instance MXAccess client name in driver config — already
partly there (`OTOPCUA_GALAXY_CLIENT_NAME`); make it explicit in the new
driver's `appsettings.json` shape.

---

## Phased implementation

Each phase is a working, mergeable slice. Keep `Galaxy.Host` running
alongside the new driver until phase 7 — gated by a config switch
`Galaxy:Backend = legacy-host | mxgateway`.

### Phase 0 — pre-flight (mxaccessgw repo)

Ship gw-1, gw-2, gw-4, gw-9 (the parity, performance, and contract bits the
plan immediately depends on). gw-3, gw-5, gw-6, gw-7 can come during or
after phase 5.

**Exit:** local OtOpcUa dev box can `MxGatewayClient.Create` a client, open a
session, `SubscribeBulkAsync` 100 tags, and observe `OnDataChange` events at
the configured update rate.

### Phase 1 — server-level historian extension point (ot-1)

1. Extract `IHistorianDataSource` (and its DTOs `HistorianSample`,
   `HistorianAggregateSample`, `HistoricalEvent`) from
   `Driver.Galaxy.Host/Backend/Historian/` into
   `src/ZB.MOM.WW.OtOpcUa.Core/Abstractions/Historian/`.
2. Extend the OPC UA HA service to look up a registered
   `IHistorianDataSource` per namespace and call into it for `HistoryRead`,
   `HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`. Drivers
   stop implementing `IHistoryProvider` directly; the server proxies.
3. Add a no-op default registration so drivers without history keep working.

**Exit:** all current Galaxy history reads route through an
`IHistorianDataSource` registered by `Driver.Galaxy.Host` (still legacy)
without behavior change. Other drivers untouched.

### Phase 2 — server-level alarm subsystem (ot-2)

1. Add an `IAlarmConditionDeclaration` API on the address-space builder so
   discovery can flag a node as alarm-bearing and supply the four
   sub-attribute references.
2. Add a hosted `AlarmConditionService` in the server that, on driver
   `Discover`, subscribes to the four sub-attributes via the driver's own
   `ISubscribable`, runs the state machine, and emits
   `IAlarmSource.OnAlarmEvent` itself. Acks route back through the driver's
   `IWritable.WriteAsync` to the `.AckMsg` attribute.
3. Add Galaxy-specific defaults (sub-attribute naming) as a small adapter
   so the same service can serve future drivers with different conventions.

**Exit:** Galaxy alarms still work end-to-end; the tracker code that runs
inside `Galaxy.Host` is dead but kept for the legacy-host backend path.

### Phase 3 — Wonderware Historian sidecar (`Driver.Historian.Wonderware`)

1. New solution project: `Driver.Historian.Wonderware`, .NET 4.8 x86,
   console app + NSSM (mirrors today's Galaxy.Host packaging exactly,
   minus Galaxy responsibilities).
2. Hosts the existing `HistorianDataSource`, `HistorianClusterEndpointPicker`,
   `HistorianHealthSnapshot` code lifted from `Galaxy.Host/Backend/Historian/`
   and exposes them over a small named-pipe protocol (or local gRPC if
   .NET 4.8 cost is acceptable; named pipe is simpler).
3. Add `Driver.Historian.Wonderware.Client` — .NET 10 — implementing
   `IHistorianDataSource` against the sidecar.
4. Server registers it as a data source for the `Galaxy` namespace.

**Exit:** OPC UA history reads work via the sidecar with the legacy-host
backend still in place. We've decoupled history from MXAccess.

### Phase 4 — new `Driver.Galaxy` against gw

This is the meat. New project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`, .NET 10,
in-process. Capabilities (post ot-3): `IDriver`, `ITagDiscovery`, `IReadable`,
`IWritable`, `ISubscribable`, `IRediscoverable`, `IHostConnectivityProbe`.

Shape:

```
Driver.Galaxy/
  GalaxyDriver.cs                     # IDriver root
  Browse/
    GalaxyDiscoverer.cs               # consumes GalaxyRepositoryClient.DiscoverHierarchyAsync
    DataTypeMap.cs                    # mx_data_type → DriverDataType
    SecurityMap.cs                    # security_classification → SecurityClassification
  Runtime/
    GalaxyMxSession.cs                # owns one MxGatewaySession; Register + map per-driver client name
    SubscriptionRegistry.cs           # tag → server/item handles; persists to memory only
    EventPump.cs                      # consumes session.StreamEventsAsync, fans out to OnDataChange
    ReconnectSupervisor.cs            # gw transport drop / session-lost recovery
    DeployWatcher.cs                  # GalaxyRepositoryClient.WatchDeployEventsAsync → OnRediscoveryNeeded
  Health/
    HostConnectivityForwarder.cs      # gw-6 SessionHealth → IHostConnectivityProbe
  Config/
    GalaxyDriverOptions.cs            # endpoint, ApiKey, ClientName, TLS, retry, intervals
    GalaxyDriverFactoryExtensions.cs  # AddGalaxyDriver(IServiceCollection)
```

Key behaviors:

- **Discovery** calls `GalaxyRepositoryClient.DiscoverHierarchyAsync()`
  once at init and on every `WatchDeployEvents` event, then drives the
  address space builder. Same node naming as today (parent contained-name
  hierarchy + leaf attributes named `tag_name.AttributeName`).
- **Read**: a one-off `AddItem` + `Advise` + read-after-first-callback
  is overkill; instead, use **`Register` + per-call `AddItem`/`Read`** if gw
  exposes a synchronous read, otherwise a short-lived advise. *Action item:*
  confirm gw's read story; if absent, request a synchronous `ReadAsync` RPC
  on top of MXAccess `Read` (which exists in the COM API).
- **Write** maps `WriteRequest.Value` to `MxValue` via gw-7 helpers and
  calls `WriteAsync(serverHandle, itemHandle, value, userId=0)`. Routes
  `WriteSecured` (where `SecurityClassification == SecuredWrite/Verified`)
  to `WriteSecuredAsync` once exposed on `MxGatewaySession`.
- **Subscribe** calls `SubscribeBulkAsync` once per `ISubscribable.Subscribe`
  call. Stores `(tag → itemHandle, sid)` in `SubscriptionRegistry`. The
  single `EventPump` consumes one `StreamEventsAsync` per session and fans
  out per `sid`.
- **Unsubscribe** calls `UnsubscribeBulkAsync` and drops registry entries.
- **Reconnect** — when the gRPC channel drops or `StreamEvents` returns,
  `ReconnectSupervisor` reopens the session and replays subscriptions via
  gw-5 `ReplaySubscriptionsAsync`. The driver flags `DriverState.Degraded`
  during recovery; the server keeps publishing last-good values with
  `Uncertain` quality.
- **Host connectivity** — single synthesized host entry named after
  `OTOPCUA_GALAXY_CLIENT_NAME` driven by gw-6 `SessionHealth` updates
  (or, until gw-6 lands, by transport drops).
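
The Subscribe / Unsubscribe bookkeeping above can be illustrated with a small in-memory registry. This is only a sketch under stated assumptions: the `int` item handle, the `long` subscription id, and the method names `Add`, `RemoveSubscription`, and `Resolve` are hypothetical and do not reflect the actual `MxGatewaySession` types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory only, matching the "persists to memory only" note in the layout.
public sealed class SubscriptionRegistry
{
    private readonly object _gate = new();

    // itemHandle -> (sid -> tag). One item may feed several subscriptions.
    private readonly Dictionary<int, Dictionary<long, string>> _byItem = new();

    public void Add(long sid, string tag, int itemHandle)
    {
        lock (_gate)
        {
            if (!_byItem.TryGetValue(itemHandle, out var subs))
                _byItem[itemHandle] = subs = new Dictionary<long, string>();
            subs[sid] = tag;
        }
    }

    // Drops every entry that belongs to one subscription id.
    public void RemoveSubscription(long sid)
    {
        lock (_gate)
        {
            foreach (var itemHandle in _byItem.Keys.ToList())
            {
                var subs = _byItem[itemHandle];
                subs.Remove(sid);
                if (subs.Count == 0) _byItem.Remove(itemHandle);
            }
        }
    }

    // Used by the event pump: which subscriptions should see this item's event?
    public List<(long Sid, string Tag)> Resolve(int itemHandle)
    {
        lock (_gate)
        {
            return _byItem.TryGetValue(itemHandle, out var subs)
                ? subs.Select(kv => (Sid: kv.Key, Tag: kv.Value)).ToList()
                : new List<(long Sid, string Tag)>();
        }
    }
}
```

A plain lock is used rather than nested concurrent dictionaries because removal has to delete an empty inner map atomically; at 50k tags the contention trade-off would need measuring in the soak test.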

Wire into the server next to other Tier-A drivers in the
`AddDrivers(...)` call site.

**Exit:** flipping `Galaxy:Backend` to `mxgateway` runs the OPC UA server
end-to-end with no `Galaxy.Host` involvement. Live read, live write, live
subscribe pass against the dev Galaxy. Historian + alarms still work via
phases 1–3.

### Phase 5 — parity test matrix

Reuse the existing live-Galaxy integration tests; run each scenario twice:
once with `Galaxy:Backend=legacy-host`, once with `mxgateway`. Compare:

- discovered hierarchy node count + names + datatypes,
- subscribed publish rates (allow ±10% tolerance vs. legacy),
- write success / status codes for each `SecurityClassification`,
- alarm condition transitions (Active / Acked / Inactive) — already
  routed through phase 2's server-level subsystem,
- history reads — phase 3 sidecar, identical results both backends,
- reconnect behavior under gw kill, worker kill, network drop, ZB drop.

Document the matrix; resolve every discrepancy or explicitly accept it.

**Exit:** parity matrix has zero unexplained deltas. Performance budget
agreed: e.g. ≤ 2× per-call latency vs. named-pipe baseline at the 95th
percentile, equal or better throughput in `SubscribeBulk` setup time.
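
Two of the comparisons above (publish-rate tolerance and node-name deltas) reduce to small pure functions. The sketch below is illustrative only; `ParityMatrix`, `RateWithinTolerance`, and `DiffNodeNames` are hypothetical names, not existing test helpers in this repo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParityMatrix
{
    // True when the candidate rate is within +/- tolerance (default 10%)
    // of the legacy rate, measured relative to the legacy value.
    public static bool RateWithinTolerance(double legacy, double candidate, double tolerance = 0.10)
        => Math.Abs(candidate - legacy) <= tolerance * Math.Abs(legacy);

    // Returns node names present on only one side, sorted for stable reports.
    public static (IReadOnlyList<string> MissingInCandidate, IReadOnlyList<string> ExtraInCandidate)
        DiffNodeNames(IEnumerable<string> legacy, IEnumerable<string> candidate)
    {
        var legacySet = new HashSet<string>(legacy, StringComparer.Ordinal);
        var candidateSet = new HashSet<string>(candidate, StringComparer.Ordinal);
        return (
            legacySet.Except(candidateSet).OrderBy(n => n, StringComparer.Ordinal).ToList(),
            candidateSet.Except(legacySet).OrderBy(n => n, StringComparer.Ordinal).ToList());
    }
}
```

Each non-empty delta list would become a row in the documented matrix, to be either fixed or explicitly accepted before the phase exits.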

### Phase 6 — perf + hardening

- Land gw-9 buffered-update intervals.
- Add OpenTelemetry traces from the driver around every gw call,
  correlated via `client_correlation_id`.
- Write soak test: 50k tags subscribed, 24h, count missed events, gw
  restarts, OtOpcUa restarts.
- Tune `MxGatewayClientOptions.MaxGrpcMessageBytes`, retry pipeline,
  call timeouts based on soak results.

**Exit:** production-acceptable perf numbers documented in
`docs/drivers/Galaxy.md`.

### Phase 7 — retirement

1. Default `Galaxy:Backend = mxgateway` everywhere (sample configs,
   install scripts, e2e configs).
2. Delete `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`,
   `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy`,
   `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared`, and matching tests.
3. Remove `OtOpcUaGalaxyHost` NSSM registration from
   `scripts/install/Install-Services.ps1`. Add a registration block for the
   Wonderware historian sidecar from phase 3.
4. Remove every x86 .NET 4.8 reference, build target, and CI step from this
   repo; remove `mxaccess_documentation.md`-driven dependencies that no
   longer apply.
5. Update CLAUDE.md, `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
   `docs/Redundancy.md` to reflect the new topology.
6. Memory housekeeping: retire `project_galaxy_host_service.md` and
   `project_galaxy_host_installed.md`; add a short note about the gw
   dependency.

**Exit:** `git grep -i 'Galaxy\.Host'` returns nothing in source.

---

## Configuration shape (new driver)

```jsonc
"Drivers": {
  "Galaxy": {
    "Type": "Galaxy",
    "InstanceId": "galaxy-prod-1",
    "Gateway": {
      "Endpoint": "https://mxgw.aveva.local:5001",
      "ApiKeySecretRef": "galaxy:apiKey",   // resolved via existing secret store
      "UseTls": true,
      "CaCertificatePath": "C:\\publish\\mxgw\\ca.crt",
      "ConnectTimeoutSeconds": 10,
      "DefaultCallTimeoutSeconds": 5,
      "StreamTimeoutSeconds": 0             // unbounded
    },
    "MxAccess": {
      "ClientName": "OtOpcUa-A",            // unique per OtOpcUa instance
      "PublishingIntervalMs": 1000,         // hint for SetBufferedUpdateInterval
      "WriteUserId": 0
    },
    "Repository": {
      "DiscoverPageSize": 5000,
      "WatchDeployEvents": true
    },
    "Reconnect": {
      "InitialBackoffMs": 500,
      "MaxBackoffMs": 30000,
      "ReplayOnSessionLost": true
    }
  }
}
```

The OtOpcUa secret store already handles DPAPI-protected values for LDAP
binds; reuse it for the gw API key. Never put the key in plaintext in the
sample config.
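
As an illustration of how the `Reconnect` values might be consumed, the sketch below computes a capped, doubling delay per reconnect attempt. The doubling policy and the `ReconnectBackoff.DelayFor` helper are assumptions; the plan only names `InitialBackoffMs` and `MaxBackoffMs` and does not specify the growth curve.

```csharp
using System;

public static class ReconnectBackoff
{
    // attempt is zero-based: attempt 0 waits InitialBackoffMs, each further
    // attempt doubles the wait until MaxBackoffMs caps it.
    public static TimeSpan DelayFor(int attempt, int initialBackoffMs = 500, int maxBackoffMs = 30000)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so very large attempt counts stay finite and cheap.
        double uncapped = initialBackoffMs * Math.Pow(2, Math.Min(attempt, 30));
        return TimeSpan.FromMilliseconds(Math.Min(uncapped, maxBackoffMs));
    }
}
```

With the sample values, the seventh attempt (index 6) would already hit the 30-second cap, which bounds how hard a restarted gw is hammered. Adding jitter is a common refinement when both members of a redundancy pair reconnect at once.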

---

## Risks and mitigations

| Risk | Mitigation |
|---|---|
| gw protocol regression breaks production | Pin gw NuGet to a contract version range; CI runs parity matrix on every gw bump; staged rollout via `Galaxy:Backend` flag. |
| Per-call latency regresses for chatty workloads | Land gw-9 (buffered updates) before phase 5; soak the 95p in phase 6. |
| Reconnect storm after gw restart re-registers 50k tags | Land gw-3 or gw-5 before phase 6; client-side bulk replay throttled by `SubscribeBulkAsync` chunk size. |
| Alarm parity gap from moving tracker server-side | Phase 2 ships before phase 4; parity matrix gates phase 7. |
| Historian sidecar adds a second .NET 4.8 x86 service | Acceptable: it's a *driver-agnostic* component, and it ships only where Wonderware historian access is actually needed. |
| Two OtOpcUa instances both registering as same MXAccess client | `ClientName` is per-instance config (ot-5); install scripts lint that the redundancy pair has distinct names. |
| Cross-machine MXAccess writes traverse plaintext gRPC | Phase 0 enforces `UseTls=true` for any non-loopback `Endpoint`; CI lints the sample configs. |
| gw API key leaked in logs | gw and `MxGatewayClient` already redact `authorization` metadata; phase 6 audit. |
| Memory leak in `EventPump` under high event rate | Bounded channel between `StreamEventsAsync` and per-sub fan-out, drop-newest with a metric counter; soak test catches. |
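
The `EventPump` mitigation in the last row could look roughly like this, using `System.Threading.Channels`. It is a sketch, not the planned implementation: `BoundedEventBuffer<T>` is a hypothetical wrapper, and a plain `long` stands in for whatever metric counter the driver would really publish.

```csharp
using System.Threading;
using System.Threading.Channels;

public sealed class BoundedEventBuffer<T>
{
    private readonly Channel<T> _channel;
    private long _dropped;

    public BoundedEventBuffer(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            // DropWrite discards the item being written when the buffer is
            // full, i.e. the newest event, so memory stays bounded.
            FullMode = BoundedChannelFullMode.DropWrite,
            SingleReader = true,
        };

        // The callback runs once per discarded item; count it for metrics.
        _channel = Channel.CreateBounded<T>(options, _ => Interlocked.Increment(ref _dropped));
    }

    public long Dropped => Interlocked.Read(ref _dropped);

    public ChannelReader<T> Reader => _channel.Reader;

    // Never blocks the gRPC stream reader; overflow is dropped and counted.
    public void Publish(T item) => _channel.Writer.TryWrite(item);
}
```

The stream-reading loop would call `Publish` for every event, while a single fan-out task drains `Reader`. A rising `Dropped` count during the soak test is the signal that fan-out cannot keep up.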

---

## Cross-cutting deliverables

- **Docs:** `docs/drivers/Galaxy.md` (new), updates to
  `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
  `docs/Redundancy.md`, `CLAUDE.md`.
- **Install scripts:** `scripts/install/Install-Services.ps1` removes
  `OtOpcUaGalaxyHost`, adds `OtOpcUaWonderwareHistorian`, no Galaxy
  service registration on the OtOpcUa node.
- **e2e:** `scripts/e2e/e2e-config.sample.json` — drop `OTOPCUA_GALAXY_*`
  pipe vars, add `Drivers:Galaxy:Gateway:Endpoint` etc.
- **Memory:** retire stale Galaxy.Host entries; add gw dependency entry,
  redundancy + client-name guidance.

---

## Order-of-work summary

```
Phase 0 (gw repo):     gw-1, gw-2, gw-4, gw-9
Phase 1 (this):        ot-1 — historian extension point
Phase 2 (this):        ot-2 — alarm subsystem
Phase 3 (this):        Driver.Historian.Wonderware sidecar
Phase 4 (this):        Driver.Galaxy (new) behind backend flag
                       — depends on Phase 0, 1, 2
Phase 5 (this+gw):     parity matrix
                       — drives gw-3 / gw-5 / gw-6 / gw-7 if gaps surface
Phase 6 (this):        perf + hardening
Phase 7 (this):        retire Galaxy.Host / Proxy / Shared
```

Phases 1–3 are independent of each other and can run in parallel. Phase 4
needs all three plus Phase 0. Phase 5 requires Phase 4. Phases 6 and 7 are
sequential after Phase 5.

1062
lmx_mxgw_impl.md
File diff suppressed because it is too large
@@ -1 +0,0 @@
opc.tcp://opcuademo.sterfive.com:26543