diff --git a/.gitignore b/.gitignore
index 98ceb5a..ce9d607 100644
--- a/.gitignore
+++ b/.gitignore
@@ -37,3 +37,6 @@ src/ZB.MOM.WW.OtOpcUa.Server/config_cache.db
 # E2E sidecar config — NodeIds are specific to each dev's local seed (see scripts/e2e/README.md)
 scripts/e2e/e2e-config.json
 config_cache*.db
+
+# Client CLI/UI runtime scratch (last-connected endpoint cache)
+session.dat
diff --git a/CLAUDE.md b/CLAUDE.md
index 6b24ce6..840f3b3 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -16,8 +16,7 @@ in this repo is .NET 10.
 PR 7.2 retired the legacy in-process `Galaxy.Host` /
 `Galaxy.Proxy` / `Galaxy.Shared` projects + the
 `OtOpcUaGalaxyHost` Windows service.
-See `lmx_mxgw.md` for the migration design and
-`docs/v2/Galaxy.Performance.md` for the runtime perf surface
+See `docs/v2/Galaxy.Performance.md` for the runtime perf surface
 (tracing, metrics, soak harness).
 
 ## Architecture Overview
diff --git a/_p54.json b/_p54.json
deleted file mode 100644
index 8120253..0000000
--- a/_p54.json
+++ /dev/null
@@ -1 +0,0 @@
-{"title":"Phase 3 PR 54 -- Siemens S7 Modbus TCP quirks research doc","body":"## Summary\n\nAdds `docs/v2/s7.md` (485 lines) covering Siemens SIMATIC S7 family Modbus TCP behavior. Mirrors the `docs/v2/dl205.md` template for future per-quirk implementation PRs.\n\n## Key findings for the implementation track\n\n- **No fixed memory map** — every S7 Modbus server is user-wired via `MB_SERVER`/`MODBUSCP`/`MODBUSPN` library blocks. Driver must accept per-site config, not assume a vendor layout.\n- **MB_SERVER requires non-optimized DBs** (STATUS `0x8383` if optimized). Most common field bug.\n- **Word order default = ABCD** (opposite of DL260). Driver's S7 profile default must be `ByteOrder.BigEndian`, not `WordSwap`.\n- **One port per MB_SERVER instance** — multi-client requires parallel FBs on 503/504/… Most clients assume port 502 multiplexes (wrong on S7).\n- **CP 343-1 Lean is server-only**, requires the `2XV9450-1MB00` license.\n- **FC20/21/22/23/43 all return Illegal Function** on every S7 variant — driver must not attempt FC23 bulk-read optimization for S7.\n- **STOP-mode behavior non-deterministic** across firmware bands — treat both read/write STOP-mode responses as unavailable.\n\nTwo items flagged as unconfirmed rumour (V2.0+ float byte-order claim, STOP-mode caching location).\n\nNo code, no tests — implementation lands in PRs 56+.\n\n## Test plan\n- [x] Doc renders as markdown\n- [x] 31 citations present\n- [x] Section structure matches dl205.md template","head":"phase-3-pr54-s7-research-doc","base":"v2"}
diff --git a/_p55.json b/_p55.json
deleted file mode 100644
index 00c2066..0000000
--- a/_p55.json
+++ /dev/null
@@ -1 +0,0 @@
-{"title":"Phase 3 PR 55 -- Mitsubishi MELSEC Modbus TCP quirks research doc","body":"## Summary\n\nAdds `docs/v2/mitsubishi.md` (451 lines) covering MELSEC Q/L/iQ-R/iQ-F/FX3U Modbus TCP behavior. Mirrors `docs/v2/dl205.md` template for per-quirk implementation PRs.\n\n## Key findings for the implementation track\n\n- **Module naming trap** — `QJ71MB91` is SERIAL RTU, not TCP. TCP module is `QJ71MT91`. Surface clearly in driver docs.\n- **No canonical mapping** — per-site 'Modbus Device Assignment Parameter' block (up to 16 entries). Treat mapping as runtime config.\n- **X/Y hex vs octal depends on family** — Q/L/iQ-R use HEX (X20 = decimal 32); FX/iQ-F use OCTAL (X20 = decimal 16). Helper must take a family selector.\n- **Word order CDAB default** across all MELSEC families (opposite of Siemens S7). Driver Mitsubishi profile default: `ByteOrder.WordSwap`.\n- **D-registers binary by default** (opposite of DL205's BCD default). Caller opts in to `Bcd16`/`Bcd32` when ladder uses BCD.\n- **FX5U needs firmware ≥ 1.060** for Modbus TCP server — older is client-only.\n- **FX3U-ENET vs FX3U-ENET-P502 vs FX3U-ENET-ADP** — only the middle one binds port 502; the last has no Modbus at all. Common operator mis-purchase.\n- **QJ71MT91 does NOT support FC22 / FC23** — iQ-R / iQ-F do. Bulk-read optimization must gate on capability.\n- **STOP-mode writes configurable** on Q/L/iQ-R/iQ-F (default accept), always rejected on FX3U-ENET.\n\nThree unconfirmed rumours flagged separately.\n\nNo code, no tests — implementation lands in PRs 58+.\n\n## Test plan\n- [x] Doc renders as markdown\n- [x] 17 citations present\n- [x] Per-model test naming matrix included (`Mitsubishi_QJ71MT91_*`, `Mitsubishi_FX5U_*`, `Mitsubishi_FX3U_ENET_*`, shared `Mitsubishi_Common_*`)","head":"phase-3-pr55-mitsubishi-research-doc","base":"v2"}
diff --git a/docs/v1/README.md b/docs/v1/README.md
index 069ef9d..00ea7d6 100644
--- a/docs/v1/README.md
+++ b/docs/v1/README.md
@@ -15,7 +15,6 @@ For current architecture see:
 - `docs/drivers/Galaxy.md` — current Galaxy driver doc
 - `docs/v2/Galaxy.ParityRig.md` — current testing setup
 - `docs/v2/Galaxy.Performance.md` — observability + perf
-- `lmx_mxgw.md` (in repo root) — design rationale for the migration
 
 | File | What it covered |
 |---|---|
diff --git a/lmx_backend.md b/lmx_backend.md
deleted file mode 100644
index e33cba1..0000000
--- a/lmx_backend.md
+++ /dev/null
@@ -1,282 +0,0 @@
-> **✅ Completed 2026-04-30 — historical record of the v2-mxgw backend-options decision.**
->
-> This document evaluated alternative backend topologies before the
-> v2-mxgw migration. **Option 1 (in-process driver + gRPC gateway) was
-> selected and implemented**; see `lmx_mxgw.md` for the design and
-> `lmx_mxgw_impl.md` for the implementation plan. Both shipped at
-> commit `ae7106d` (2026-04-30). Preserved here as the audit trail.
-
-# Galaxy / LMX Backend — Restructuring Options
-
-## Context
-
-Today the Galaxy driver is structured very differently from every other driver
-in this repo:
-
-- **Galaxy.Proxy** (.NET 10, in-process): tiny shim that frames IPC to the host.
-- **Galaxy.Host** (.NET Framework 4.8 **x86**, NSSM-wrapped Windows service):
-  owns MXAccess COM, the STA pump, the ZB Galaxy Repository SQL queries, the
-  Wonderware Historian SDK plugin, the per-platform `ScanState` probe manager,
-  the alarm tracker (`.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` state
-  machine + ack writer), recycle policy, and post-mortem MMF.
-
-Other drivers (Modbus, S7, AB CIP, OpcUaClient, TwinCAT, FOCAS Tier-C) are
-**in-process Tier-A drivers** in the .NET 10 server. They do data + browse
-only; historian and alarming are driver-agnostic concerns at the server layer.
-
-A sibling project, **mxaccessgw**
-(`C:\Users\dohertj2\Desktop\mxaccessgw`), already provides:
-
-- A .NET 10 x64 gRPC gateway in front of per-session .NET 4.8 x86 worker
-  processes that own MXAccess COM, the STA, and event sinks
-  (`MxGateway.Server` + `MxGateway.Worker`).
-- A full MXAccess command + event surface (`Register`, `AddItem`, `Advise`,
-  `Write`, `WriteSecured`, `OnDataChange`, `OnWriteComplete`, etc.).
-- A cached, deploy-gated, paged **Galaxy Repository browse** RPC
-  (`galaxy_repository.v1`) reading the same ZB tables we read today, with the
-  query bodies kept byte-identical to OtOpcUa.
-- A .NET client library (`clients/dotnet/MxGateway.Client`).
-- API-key auth, Blazor dashboard, structured logs, metrics, watchdog/recycle.
-
-The proposal is to **strip Galaxy down to data + browse** — push historian and
-alarming out to server-level subsystems where they live for every other driver
-— and pick how the slimmed-down driver talks to MXAccess.
-
----
-
-## What "push historian and alarming out" means
-
-Both options below assume the same scope reduction; they only differ in how
-the driver reaches MXAccess.
-
-| Concern | Today (Galaxy.Host) | After |
-|---|---|---|
-| Galaxy hierarchy browse | `GalaxyRepository` (SQL) inside Host | Driver (Option 1: via gw browse RPC; Option 2: own SQL or worker) |
-| Live read / write / subscribe | `MxAccessClient` + STA pump in Host | gw (Option 1) or embedded worker (Option 2) |
-| Wonderware Historian SDK | `HistorianDataSource` in Host (x86) | Separate Historian data source plugged into the server's HA service. Likely stays its own .NET 4.8 x86 sidecar because the SDK is x86-only; **independent of the Galaxy driver lifecycle**. |
-| Alarm state machine (`.InAlarm`/`.Acked` quartet, transitions, ack writer) | `GalaxyAlarmTracker` in Host | Server-level A&E subsystem subscribes to alarm-bearing attributes the driver advertises and runs the AlarmCondition state machine generically. Driver only flags `IsAlarm=true` in node metadata. |
-| `ScanState` per-platform probes | `GalaxyRuntimeProbeManager` in Host | Driver-side: ScanState is just another tag subscription; the driver re-advises one per discovered `$WinPlatform`/`$AppEngine` and reports `HostConnectivityStatus` from the value stream. No special host-side machinery. |
-
-After the strip-down, the Galaxy driver looks like Modbus or OpcUaClient: it
-discovers nodes, reads/writes/subscribes, and reports per-host transport
-health. Everything else is the server's problem.
-
----
-
-## Option 1 — Tier-A driver against the MxAccess Gateway
-
-`Driver.Galaxy` becomes a regular **in-process .NET 10 driver** in the OtOpcUa
-server (no `.Host`, no `.Proxy` split, no x86). It talks to a separately
-deployed `MxGateway.Server` over gRPC using `MxGateway.Client`. Browse comes
-from `galaxy_repository.v1.DiscoverHierarchy`. Live data comes from
-`MxAccessGateway.OpenSession`/`AddItem`/`Advise`/`StreamEvents`.
-
-```
-OtOpcUa.Server (.NET 10 x64)
-  └── Driver.Galaxy (in-proc, .NET 10)
-        └── gRPC ──► MxGateway.Server (.NET 10 x64)
-                       └── pipe ──► MxGateway.Worker (.NET 4.8 x86)
-                                      └── MXAccess COM (STA)
-```
-
-### Pros
-
-- **Architectural parity with other drivers.** No bespoke `Host` service, no
-  x86 build target, no NSSM wrapper, no STA pump in this repo, no
-  `PostMortemMmf`/`RecyclePolicy` we maintain ourselves.
-- **OtOpcUa server stops needing AVEVA installed on its own host.** The
-  gateway runs where MXAccess lives; the OPC UA server can live on a different
-  box, in a container, or on a hardened jump host.
-- **One canonical MXAccess surface across the org.** Any future tool — a
-  diagnostic CLI, a Historian replacement, an integration harness — talks to
-  the same gw with the same parity guarantees we get.
-- **Multi-instance friendly.** Two OtOpcUa servers (warm/hot redundancy) share
-  one gw and one MXAccess footprint instead of each running their own
-  `Galaxy.Host` with duplicate Wonderware client identities.
-- **Browse + cache for free.** `galaxy_repository.v1` already implements the
-  hierarchy cache, deploy-time gating, paging, and `WatchDeployEvents` — we
-  delete `GalaxyRepository.cs`, `GalaxyHierarchyRow.cs`, the change-detection
-  poll loop, and the matching SQL plumbing.
-- **Operability for free.** API-key auth, Blazor dashboard at `/dashboard`,
-  metrics via `Meter`, structured logs with redaction. We currently have
-  none of that in `Galaxy.Host`.
-- **Future backend swap.** When AVEVA exposes managed NMX or another modern
-  path, gw routes to it without OtOpcUa changes (gw's stated roadmap).
-- **Tighter blast radius.** A hung COM event, a leaking COM object, a
-  crashing worker — all owned by gw's session/worker isolation, not the
-  OPC UA server process.
-- **Simpler version story for OtOpcUa.** Driver is plain .NET 10; the
-  bitness/runtime split lives entirely in mxaccessgw's repo.
-
-### Cons
-
-- **Extra deployment dependency.** mxaccessgw is now a service that has to be
-  installed, monitored, and kept on a compatible protocol version. For a
-  single-box install this is one more moving piece.
-- **Two hops on every call** (driver→gw, gw→worker) instead of one
-  (proxy→host). Today's hop is MessagePack over a named pipe; the new outer
-  hop is gRPC over TCP. Per-call overhead is a few hundred microseconds, not
-  a regression for OPC UA workloads but measurable for very chatty bursts.
-- **Auth/secret surface added.** OtOpcUa now holds an API key for gw and
-  rotates it; gw's SQLite-backed key store has to be managed.
-- **Failure model spans two processes we don't own** — gw + worker. Reconnect
-  logic in our driver has to ride both: gw transport drop, gw session lease
-  expiry, gw-detected worker crash, plus the worker's own MXAccess reconnect.
-  All of it is exposed in the gRPC contract, but it's still surface area.
-- **Cross-repo protocol coupling.** Bumping `mxaccessgw` major version (gRPC
-  contract changes, session shape changes) ripples into OtOpcUa releases.
-  Mitigated by versioned contracts; not free.
-- **Galaxy redundancy still has to think about gw.** A redundancy fail-over of
-  OtOpcUa is independent of the gw's session lifecycle. Need to decide whether
-  the standby holds an open session or only opens it on takeover.
-- **Sensitive writes (`WriteSecured`, `AuthenticateUser`) cross the network**
-  if gw is remote. TLS + mTLS solves it but adds setup.
-
----
-
-## Option 2 — Embed mxaccessgw worker, no gateway
-
-`Driver.Galaxy` is still in-process .NET 10, but instead of speaking gRPC to a
-gateway service, it directly **launches and supervises one (or more)
-`MxGateway.Worker` processes** and talks to them over the same named-pipe
-worker protocol gw uses internally
-(`docs/WorkerFrameProtocol.md`, `docs/WorkerProcessLauncher.md`). Browse stays
-local — driver runs the SQL queries against ZB itself.
-
-```
-OtOpcUa.Server (.NET 10 x64)
-  └── Driver.Galaxy (in-proc, .NET 10)
-        ├── ZB SQL (local, in-proc)
-        └── pipe ──► MxGateway.Worker (.NET 4.8 x86, child process)
-                       └── MXAccess COM (STA)
-```
-
-### Pros
-
-- **One hop, not two.** Driver → worker pipe is the same shape as today's
-  Proxy → Host pipe. Latency is on par with the current implementation.
-- **No new service to deploy.** Worker is launched as a child process the
-  same way `Galaxy.Host` is launched today (just with mxaccessgw's worker
-  binary). Single-machine install story stays simple.
-- **Keeps the trust boundary local.** No API keys, no TLS, no exposed gRPC
-  port on the OtOpcUa box.
-- **Reuses mxaccessgw's parity-tested worker code** — STA pump, COM lifetime,
-  event conversion, fault model — without inheriting gw's ASP.NET Core /
-  Blazor / SQLite footprint.
-- **Tighter ownership.** OtOpcUa owns the worker lifecycle; recycle, kill,
-  restart, post-mortem all decided by the driver, not by an external service
-  we don't control.
-- **Easier to reason about during integration tests.** No second service to
-  spin up in CI; just a child process per test fixture.
-
-### Cons
-
-- **OtOpcUa server box must still have AVEVA + MXAccess installed**, since
-  the worker runs locally. The major deployment win of Option 1
-  (separating where MXAccess runs from where OtOpcUa runs) is lost.
-- **OtOpcUa still ships an x86 .NET 4.8 binary alongside it.** Even if we
-  vendor mxaccessgw's worker rather than write our own, installer complexity
-  and bitness considerations remain.
-- **We re-implement everything gw already gives.** Process supervision,
-  watchdog, recycle policy, heartbeat, post-mortem — these are exactly what
-  `Galaxy.Host` does today, and they'd live in our repo again, just calling a
-  different worker binary.
-- **No browse cache, no deploy gating, no `WatchDeployEvents`** — we keep
-  running our own ZB queries and our own `time_of_last_deploy` poll, or we
-  port gw's cache code into the driver. Either way it's duplicated logic.
-- **No auth, no dashboard, no metrics.** Operability stays where it is today
-  (i.e., minimal). Adding it ourselves is a separate project.
-- **Multiple OtOpcUa instances multiply MXAccess sessions.** Redundancy pair
-  → two MXAccess clients on the Galaxy from the same software, vs. Option 1
-  where one gw arbitrates.
-- **Worker protocol coupling without the contract surface.** We depend on
-  mxaccessgw's worker IPC frame format — a surface that mxaccessgw treats as
-  *internal* to its own gw↔worker boundary. If they refactor it, we have to
-  follow. The public gRPC contract (Option 1) is more stable by design.
-- **Loses the "common MXAccess access point" benefit.** Other consumers
-  (CLI, integration harnesses, future tools) can't share state with our
-  embedded worker.
-
----
-
-## Status quo (for comparison)
-
-Keep `Galaxy.Host` as today, and in-place rip out historian + alarming +
-probe manager. End state: the Host shrinks to `MxAccessClient` + `GalaxyRepository`,
-which is roughly what Option 2 ends up looking like — but with our hand-rolled
-COM bridge instead of mxaccessgw's worker. Not a serious option once
-mxaccessgw exists; we'd be maintaining a parallel implementation of the same
-thing.
-
----
-
-## Recommendation (effort-agnostic)
-
-**Go with Option 1 — Tier-A driver against the MxAccess Gateway.**
-
-The decisive arguments:
-
-1. **It's the only option that aligns Galaxy with how every other driver in
-   this repo is structured.** The user's stated goal — "keep lmx to data +
-   browsing, similar to other drivers" — only fully resolves if there is no
-   `.Host` and no x86 build artifact in this repo at all. Option 2 still has
-   an x86 child process and supervisor code; it's `Galaxy.Host` with a
-   different worker binary inside.
-
-2. **It separates *where MXAccess runs* from *where OtOpcUa runs*.** That is
-   a strategically larger win than a few hundred microseconds of per-call
-   latency. The OPC UA server stops being chained to AVEVA install footprint,
-   bitness, and Wonderware client identity — which removes a class of
-   deployment, redundancy, and CI problems we hit today (e.g., the
-   `DESKTOP-6JL3KKO` Hyper-V/Docker conflict, the `dohertj2`-only pipe ACL,
-   the live-Galaxy smoke test prerequisites).
-
-3. **It collapses scope.** A non-trivial fraction of `Galaxy.Host` (browse
-   cache, deploy-event watch, worker supervision, COM bridge, post-mortem,
-   recycle, ACL hardening) is reproduced *better* in mxaccessgw. Option 1
-   deletes our copy. Option 2 keeps it.
-
-4. **It positions historian and alarming for the right home.** Once the
-   Galaxy driver is "just another driver", historian becomes a server-level
-   data source (one that can also feed Modbus/S7 history if we ever want it),
-   and alarming becomes a server-level A&E subsystem. Option 2 nominally
-   allows the same move, but the temptation to keep them in `Galaxy.Host`
-   "while we're already there" is real.
-
-5. **It future-proofs against AVEVA's roadmap.** Managed NMX, ASB, or any
-   replacement that shows up over the next few years gets adopted in
-   mxaccessgw without a release in this repo.
-
-The case for Option 2 is real but narrow: it's the right call **only** if we
-commit to single-box deployments forever, refuse to take a gRPC dependency,
-and value local-trust simplicity over the consolidation/operability benefits
-gw provides. None of those constraints hold here.
-
-### What flips the recommendation
-
-- If the gw protocol is unstable or perf-tested under our subscription
-  patterns turns out worse than expected → revisit Option 2.
-- If org-policy forbids running an MXAccess gateway as its own service →
-  Option 2.
-- If Galaxy goes from one of several drivers to *the* primary driver and
-  raw call-rate matters more than architectural fit → revisit.
-
-Otherwise: Option 1.
-
----
-
-## Out-of-scope follow-ups (don't decide here, but flag them)
-
-- **Where does the Wonderware Historian SDK live?** Likely its own
-  .NET 4.8 x86 sidecar exposing a small `IHistorianDataSource` over a pipe or
-  gRPC, plugged into the OPC UA server's HA service alongside any future
-  historian sources. Independent of which option above is chosen.
-- **Alarm subsystem ownership.** Decide whether the server hosts a generic
-  AlarmCondition state machine driven by driver-advertised alarm metadata, or
-  whether each driver continues to emit pre-shaped alarm transitions. Galaxy's
-  4-attr quartet is a strong forcing function for the generic approach.
-- **Redundancy + gw sessions.** Standby OtOpcUa holds an open gw session
-  (warm) vs. opens on takeover (cold). Affects gw worker count and Galaxy
-  client-identity collisions.
-- **Auth between OtOpcUa and gw.** API key in DPAPI-protected secret file vs.
-  Windows-auth gRPC. Both supported by gw; pick before rollout.
diff --git a/lmx_mxgw.md b/lmx_mxgw.md
deleted file mode 100644
index bdb4457..0000000
--- a/lmx_mxgw.md
+++ /dev/null
@@ -1,486 +0,0 @@
-> **✅ Completed 2026-04-30 — historical record of the v2-mxgw migration design.**
->
-> This document is the design doc that drove the migration from the
-> legacy out-of-process Galaxy.Host topology to the in-process
-> GalaxyDriver + mxaccessgw architecture. Option 1 (the in-process
-> driver path) was selected and implemented across 39 PRs spanning
-> phases 0–7, merged to master at commit `ae7106d`. For current
-> architecture see `CLAUDE.md`, `docs/drivers/Galaxy.md`, and
-> `docs/v2/Galaxy.Performance.md`.
-
-# Galaxy → MxAccessGateway Migration Plan
-
-Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host`
-+ `Galaxy.Proxy` IPC pair with an **in-process Tier-A** `Driver.Galaxy` running
-in the .NET 10 OtOpcUa server, talking to a separately-deployed
-`MxGateway.Server` (mxaccessgw repo) over gRPC for live MXAccess work and
-Galaxy Repository browse.
-
-## Outcome
-
-After this work:
-
-- `OtOpcUa.Server` is fully .NET 10 x64 — no x86 build artifacts in this repo.
-- `Driver.Galaxy.Host` (Windows service, NSSM-wrapped, .NET 4.8 x86) is
-  retired. `Driver.Galaxy.Proxy` and `Driver.Galaxy.Shared` are deleted.
-  AVEVA platform is no longer required on the OtOpcUa box.
-- A new in-process `Driver.Galaxy` lives next to `Driver.Modbus`,
-  `Driver.OpcUaClient`, etc. It implements the same `IDriver` capability set
-  the proxy implements today, but its body calls `MxGateway.Client`
-  (`MxGatewayClient`, `MxGatewaySession`, `GalaxyRepositoryClient`).
-- Wonderware Historian SDK access moves out of the Galaxy driver into a
-  driver-agnostic historian data source (`Driver.Historian.Wonderware`,
-  separate sidecar, .NET 4.8 x86). The OPC UA HA service plugs into it the
-  same way it would plug into any future historian.
-- Alarm condition tracking moves out of the driver into the OPC UA server's
-  generic A&E subsystem. The driver only flags `IsAlarm=true` on attribute
-  metadata and forwards live `.InAlarm`/`.Acked`/etc value changes; the
-  server runs the AlarmCondition state machine.
-- Per-platform `ScanState` probes degrade to plain attribute subscriptions —
-  no special probe manager.
-
----
-
-## Pre-flight: improvements to land in mxaccessgw first
-
-These are **integration-quality changes** in the mxaccessgw repo that make
-the OtOpcUa side dramatically simpler / faster / more robust. They aren't
-strictly required to start, but ship enough of them before phase 3 that we're
-not designing around gaps.
-
-### gw-1. Galaxy attribute metadata parity
-
-**What's there:** `galaxy_repository.v1.DiscoverHierarchy` returns
-`GalaxyObject` with name, parent, category, and dynamic attributes.
-
-**What's missing for OtOpcUa:** every field today's `MxAccessGalaxyBackend`
-copies into `GalaxyAttributeInfo` — confirm gw's `Attribute` proto carries:
-- `mx_data_type` (int)
-- `is_array` (bool)
-- `array_dimension` (uint, optional)
-- `security_classification` (int)
-- `is_historized` (bool, from `HistorizedExtension` primitive)
-- `is_alarm` (bool, from `AlarmExtension` primitive)
-
-If any are missing, add them to the proto and the server-side query mapper.
-Without `IsAlarm` and `IsHistorized` the OPC UA server can't decide which
-nodes get HasHistoricalConfiguration / which become AlarmConditions.
-
-### gw-2. Stable, documented event-stream resume semantics
-
-**What's needed:** the OtOpcUa driver must survive a transient gw transport
-drop without losing subscription state or duplicating change events. gw's
-`StreamEventsAsync(afterWorkerSequence)` already exposes resumption.
-Document the per-session retention window (how long does the worker buffer
-events the gateway hasn't acked?) and the "events were dropped, you must
-re-subscribe" signal. If retention is bounded by count rather than time,
-expose the bound in `OpenSessionReply` so the client can size its own buffer.
-
-### gw-3. Reconnectable sessions
-
-Listed under "post-v1 revisit" in `gateway.md`. Without it, every gw or
-OtOpcUa restart re-`Register`s, re-`AddItem`s, re-`Advise`s the entire
-address space — for a 50k-tag Galaxy that's a non-trivial cold-start. With
-reconnectable sessions, the driver presents its `SessionId` after a restart
-and the worker keeps its handles.
-
-If full reconnection is too large, ship a **bulk replay** instead: a single
-RPC that takes the full subscription set and the worker performs the
-register/add/advise inside one round trip. We can drive it from a
-client-side cache rather than gw state. See gw-5 below.
-
-### gw-4. Driver-shaped subscribe primitive
-
-`MxGatewaySession` already has `SubscribeBulkAsync` (one RPC: `Register`
-implicit + `AddItem` + `Advise` for a list of tag addresses, returning
-per-tag `SubscribeResult`). That's exactly what `ISubscribable.SubscribeAsync`
-wants. Confirm it returns enough per-tag detail to surface a partial-failure
-list to OPC UA monitored items (good handle, status code, error text).
-
-If not already, expose **`SubscribeBulk` with optional update-rate hint**
-forwarded to `SetBufferedUpdateInterval` so the OPC UA publishing interval
-becomes a single field on the subscribe call rather than a follow-up RPC.
-
-### gw-5. Subscription replay snapshot
-
-Provide an RPC `ReplaySubscriptionsAsync(SessionId, IEnumerable)`
-that re-establishes a list of subscriptions after a session reset and returns
-per-tag results. The client stores its tag list locally (the driver already
-has it from `Discover`), and the gw worker turns it into one
-register/add/advise sequence. This is the minimum surface we need; full
-"reattach to a previous session by id" (gw-3) is a richer version of the
-same thing.
-
-### gw-6. Transport-health stream
-
-The gw already exposes worker / session health on its dashboard. Add a small
-streaming RPC `StreamSessionHealth(SessionId) → stream SessionHealth` so the
-OtOpcUa driver can surface "MXAccess transport up/down" to its
-`IHostConnectivityProbe` without faking it via probe-tag subscriptions.
-Today `MxAccessClient.ConnectionStateChanged` does this in-process; we want
-the same signal at the gw boundary.
-
-### gw-7. Optional .NET 10 client polish
-
-- Async-disposable session pattern is already there.
-- Add a **typed `MxValue` ⇄ `object` adapter** for the seven Galaxy types
-  OtOpcUa cares about (Boolean, Int32, Float, Double, String, DateTime,
-  arrays of the same). Today every consumer writes its own `MxValue.From`
-  helpers; this shaves boilerplate from the driver.
-- Add a **`SubscribeWithCallback`** convenience wrapper that combines
-  `OpenSession` + `SubscribeBulk` + `StreamEvents` and routes events through
-  a delegate per tag. Keeps the OPC UA driver from re-implementing the
-  fan-out / sequencer pattern.
-
-### gw-8. Auth minimums
-
-Document API-key scoping as it applies to OtOpcUa: the server identity needs
-`session`, `invoke`, `event`, and `metadata:read` scopes. Provide a CLI to
-mint a key bound to those scopes for an OtOpcUa instance.
-
-### gw-9. Performance: bulk paths and value coalescing
-
-- Confirm `SubscribeBulkAsync` is implemented as a single MXAccess
-  `AddItem`+`Advise` loop on the worker, not N pipe round trips. If not, fix
-  before we drive 50k-tag Galaxies through it.
-- Expose `SetBufferedUpdateInterval` per session so OtOpcUa can request
-  buffered updates at the OPC UA publishing interval and get one batched
-  `OnBufferedDataChange` per tick rather than N `OnDataChange` events.
-
-These can all ship in mxaccessgw independently and improve every consumer.
-
----
-
-## OtOpcUa-side improvements to land in parallel
-
-Some are forced by removing `Galaxy.Host`; others are quality-of-life.
-
-### ot-1. Promote `IHistorianDataSource` to a server-level extension point
-
-Today `IHistorianDataSource` is a Galaxy-internal abstraction in
-`Driver.Galaxy.Host`. Lift it to `OtOpcUa.Core.Abstractions` (or a similar
-home next to `IDriver`) and let the OPC UA HA service consume **any number
-of registered data sources** keyed by node namespace. Drivers don't own
-historian access; the server mounts data sources alongside drivers. This is
-the prerequisite that lets us move Wonderware Historian out of the Galaxy
-driver without losing the feature.
-
-### ot-2. Generic alarm condition state machine in the server
-
-Move the `.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` quartet handling
-out of `GalaxyAlarmTracker` into a server-level alarm subsystem keyed off the
-`IsAlarm=true` flag drivers set during discovery. The server subscribes to
-the four sub-attributes itself and runs the AlarmCondition state machine.
-Driver only:
-- declares `IsAlarm=true` in `DriverAttributeInfo`,
-- forwards plain attribute value changes (already done by `ISubscribable`).
-
-This is also a precondition for future drivers (Modbus DL205 alarm bits,
-S7 alarm DBs) to emit alarms without each writing their own tracker.
-
-### ot-3. Driver capabilities trim
-
-After ot-1 and ot-2, `Driver.Galaxy` no longer needs to implement:
-- `IHistoryProvider` (server's HA service handles it via Wonderware
-  historian data source)
-- `IAlarmHistorianWriter` (server's A&E historian, or kept generic — Galaxy
-  shouldn't own the SQLite path)
-- `IAlarmSource` ack route (server-level alarm subsystem writes back via the
-  driver's `IWritable.WriteAsync`, which the gw already supports)
-
-Keep:
-- `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`,
-  `IRediscoverable`, `IHostConnectivityProbe`.
-
-### ot-4. Treat `time_of_last_deploy` as `IRediscoverable`'s pump
-
-Replace the Host-side change-detection poll with a managed
-`GalaxyRepositoryClient.WatchDeployEventsAsync` consumer in the driver.
-Each event raises `OnRediscoveryNeeded` with the new deploy time as the
-`scopeHint`. No polling code in this repo.
-
-### ot-5. Connection pool at the server, not the driver
-
-If the redundancy pair runs two OtOpcUa instances against one gw, both
-should share a single `GrpcChannel` per process (already gRPC default) but
-**different sessions** (one MXAccess client identity per OtOpcUa instance,
-not one shared session that fights over Wonderware client state). Encode
-the per-instance MXAccess client name in driver config — already partly
-there (`OTOPCUA_GALAXY_CLIENT_NAME`); make it explicit in the new driver's
-`appsettings.json` shape.
-
----
-
-## Phased implementation
-
-Each phase is a working, mergeable slice. Keep `Galaxy.Host` running
-alongside the new driver until phase 7 — gated by a config switch
-`Galaxy:Backend = legacy-host | mxgateway`.
-
-### Phase 0 — pre-flight (mxaccessgw repo)
-
-Ship gw-1, gw-2, gw-4, gw-9 (the parity, performance, and contract bits the
-plan immediately depends on). gw-3, gw-5, gw-6, gw-7 can come during or
-after phase 5.
-
-**Exit:** local OtOpcUa dev box can `MxGatewayClient.Create` a client, open a
-session, `SubscribeBulkAsync` 100 tags, and observe `OnDataChange` events at
-the configured update rate.
-
-### Phase 1 — server-level historian extension point (ot-1)
-
-1. Extract `IHistorianDataSource` (and its DTOs `HistorianSample`,
-   `HistorianAggregateSample`, `HistoricalEvent`) from
-   `Driver.Galaxy.Host/Backend/Historian/` into
-   `src/ZB.MOM.WW.OtOpcUa.Core/Abstractions/Historian/`.
-2. Extend the OPC UA HA service to look up a registered
-   `IHistorianDataSource` per namespace and call into it for `HistoryRead`,
-   `HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`. Drivers
-   stop implementing `IHistoryProvider` directly; the server proxies.
-3. Add a no-op default registration so drivers without history keep working.
-
-**Exit:** all current Galaxy history reads route through an
-`IHistorianDataSource` registered by `Driver.Galaxy.Host` (still legacy)
-without behavior change. Other drivers untouched.
-
-### Phase 2 — server-level alarm subsystem (ot-2)
-
-1. Add an `IAlarmConditionDeclaration` API on the address-space builder so
-   discovery can flag a node as alarm-bearing and supply the four
-   sub-attribute references.
-2. Add a hosted `AlarmConditionService` in the server that, on driver
-   `Discover`, subscribes to the four sub-attributes via the driver's own
-   `ISubscribable`, runs the state machine, and emits
-   `IAlarmSource.OnAlarmEvent` itself. Acks route back through the driver's
-   `IWritable.WriteAsync` to the `.AckMsg` attribute.
-3. Add Galaxy-specific defaults (sub-attribute naming) as a small adapter
-   so the same service can serve future drivers with different conventions.
-
-**Exit:** Galaxy alarms still work end-to-end; the tracker code that runs
-inside `Galaxy.Host` is dead but kept for the legacy-host backend path.
-
-### Phase 3 — Wonderware Historian sidecar (`Driver.Historian.Wonderware`)
-
-1. New solution project: `Driver.Historian.Wonderware`, .NET 4.8 x86,
-   console app + NSSM (mirrors today's Galaxy.Host packaging exactly,
-   minus Galaxy responsibilities).
-2. Hosts the existing `HistorianDataSource`, `HistorianClusterEndpointPicker`,
-   `HistorianHealthSnapshot` code lifted from `Galaxy.Host/Backend/Historian/`
-   and exposes them over a small named-pipe protocol (or local gRPC if
-   .NET 4.8 cost is acceptable; named pipe is simpler).
-3. Add `Driver.Historian.Wonderware.Client` — .NET 10 — implementing
-   `IHistorianDataSource` against the sidecar.
-4. Server registers it as a data source for the `Galaxy` namespace.
-
-**Exit:** OPC UA history reads work via the sidecar with the legacy-host
-backend still in place. We've decoupled history from MXAccess.
-
-### Phase 4 — new `Driver.Galaxy` against gw
-
-This is the meat. New project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`, .NET 10,
-in-process. Capabilities (post ot-3): `IDriver`, `ITagDiscovery`, `IReadable`,
-`IWritable`, `ISubscribable`, `IRediscoverable`, `IHostConnectivityProbe`.
 - -Shape: - -``` -Driver.Galaxy/ - GalaxyDriver.cs # IDriver root - Browse/ - GalaxyDiscoverer.cs # consumes GalaxyRepositoryClient.DiscoverHierarchyAsync - DataTypeMap.cs # mx_data_type → DriverDataType - SecurityMap.cs # security_classification → SecurityClassification - Runtime/ - GalaxyMxSession.cs # owns one MxGatewaySession; Register + map per-driver client name - SubscriptionRegistry.cs # tag → server/item handles; persists to memory only - EventPump.cs # consumes session.StreamEventsAsync, fans out to OnDataChange - ReconnectSupervisor.cs # gw transport drop / session-lost recovery - DeployWatcher.cs # GalaxyRepositoryClient.WatchDeployEventsAsync → OnRediscoveryNeeded - Health/ - HostConnectivityForwarder.cs # gw-6 SessionHealth → IHostConnectivityProbe - Config/ - GalaxyDriverOptions.cs # endpoint, ApiKey, ClientName, TLS, retry, intervals - GalaxyDriverFactoryExtensions.cs # AddGalaxyDriver(IServiceCollection) -``` - -Key behaviors: - -- **Discovery** calls `GalaxyRepositoryClient.DiscoverHierarchyAsync()` - once at init and on every `WatchDeployEvents` event, then drives the - address space builder. Same node naming as today (parent contained-name - hierarchy + leaf attributes named `tag_name.AttributeName`). -- **Read**: a one-off `AddItem` + `Advise` + read-after-first-callback - is overkill; instead, use **`Register` + per-call `AddItem`/`Read`** if gw - exposes a synchronous read, otherwise fall back to a short-lived advise. *Action item:* - confirm gw's read story; if absent, request a synchronous `ReadAsync` RPC - on top of MXAccess `Read` (which exists in the COM API). -- **Write** maps `WriteRequest.Value` to `MxValue` via gw-7 helpers and - calls `WriteAsync(serverHandle, itemHandle, value, userId=0)`. Routes - `WriteSecured` (where `SecurityClassification == SecuredWrite/Verified`) - to `WriteSecuredAsync` once exposed on `MxGatewaySession`. -- **Subscribe** calls `SubscribeBulkAsync` once per `ISubscribable.Subscribe` - call.
Stores `(tag → itemHandle, sid)` in `SubscriptionRegistry`. The - single `EventPump` consumes one `StreamEventsAsync` per session and fans - out per `sid`. -- **Unsubscribe** calls `UnsubscribeBulkAsync` and drops registry entries. -- **Reconnect** — when the gRPC channel drops or `StreamEvents` returns, - `ReconnectSupervisor` reopens the session and replays subscriptions via - gw-5 `ReplaySubscriptionsAsync`. The driver flags `DriverState.Degraded` - during recovery; the server keeps publishing last-good values with - `Uncertain` quality. -- **Host connectivity** — single synthesized host entry named after - `OTOPCUA_GALAXY_CLIENT_NAME` driven by gw-6 `SessionHealth` updates - (or, until gw-6 lands, by transport drops). - -Wire into the server next to other Tier-A drivers in the -`AddDrivers(...)` call site. - -**Exit:** flipping `Galaxy:Backend` to `mxgateway` runs the OPC UA server -end-to-end with no `Galaxy.Host` involvement. Live read, live write, live -subscribe pass against the dev Galaxy. Historian + alarms still work via -phases 1–3. - -### Phase 5 — parity test matrix - -Reuse the existing live-Galaxy integration tests; run each scenario twice: -once with `Galaxy:Backend=legacy-host`, once with `mxgateway`. Compare: - -- discovered hierarchy node count + names + datatypes, -- subscribed publish rates (allow ±10% tolerance vs. legacy), -- write success / status codes for each `SecurityClassification`, -- alarm condition transitions (Active / Acked / Inactive) — already - routed through phase 2's server-level subsystem, -- history reads — phase 3 sidecar, identical results both backends, -- reconnect behavior under gw kill, worker kill, network drop, ZB drop. - -Document the matrix; resolve every discrepancy or explicitly accept it. - -**Exit:** parity matrix has zero unexplained deltas. Performance budget -agreed: e.g. ≤ 2× per-call latency vs. named-pipe baseline at the 95th -percentile, equal or better throughput in `SubscribeBulk` setup time. 
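The two numeric gates in the Phase 5 comparison (publish rates within ±10% of legacy, and per-call latency at most 2× the named-pipe baseline at the 95th percentile) could be checked roughly as follows. This is a minimal sketch, not an existing helper in this repo: `ParityBudget`, its method names, and the nearest-rank percentile definition are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the parity matrix; none of these names exist in the repo.
public static class ParityBudget
{
    // True when the mxgateway publish rate is within +/- tolerance of the legacy rate.
    public static bool RateWithinTolerance(double legacyHz, double gatewayHz, double tolerance = 0.10)
        => Math.Abs(gatewayHz - legacyHz) <= tolerance * legacyHz;

    // Nearest-rank percentile (p in (0, 100]) over a non-empty sample set.
    public static double Percentile(IReadOnlyCollection<double> samples, double p)
    {
        if (samples.Count == 0) throw new ArgumentException("No samples.", nameof(samples));
        var sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }

    // True when the gateway p95 latency is at most `factor` times the baseline p95.
    public static bool LatencyWithinBudget(
        IReadOnlyCollection<double> baselineMs,
        IReadOnlyCollection<double> gatewayMs,
        double factor = 2.0)
        => Percentile(gatewayMs, 95) <= factor * Percentile(baselineMs, 95);
}
```

A parity test would feed both backends' measurements through these checks and record any failure as a delta to resolve or explicitly accept.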
- -### Phase 6 — perf + hardening - -- Land gw-9 buffered-update intervals. -- Add OpenTelemetry traces from the driver around every gw call, - correlated via `client_correlation_id`. -- Write soak test: 50k tags subscribed, 24h, count missed events, gw - restarts, OtOpcUa restarts. -- Tune `MxGatewayClientOptions.MaxGrpcMessageBytes`, retry pipeline, - call timeouts based on soak results. - -**Exit:** production-acceptable perf numbers documented in -`docs/drivers/Galaxy.md`. - -### Phase 7 — retirement - -1. Default `Galaxy:Backend = mxgateway` everywhere (sample configs, - install scripts, e2e configs). -2. Delete `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`, - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy`, - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared`, and matching tests. -3. Remove `OtOpcUaGalaxyHost` NSSM registration from - `scripts/install/Install-Services.ps1`. Add a registration block for the - Wonderware historian sidecar from phase 3. -4. Remove every x86 .NET 4.8 reference, build target, and CI step from this - repo; remove `mxaccess_documentation.md`-driven dependencies that no - longer apply. -5. Update CLAUDE.md, `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`, - `docs/Redundancy.md` to reflect the new topology. -6. Memory housekeeping: retire `project_galaxy_host_service.md` and - `project_galaxy_host_installed.md`; add a short note about the gw - dependency. - -**Exit:** `git grep -i 'Galaxy\.Host'` returns nothing in source. 
- ---- - -## Configuration shape (new driver) - -```jsonc -"Drivers": { - "Galaxy": { - "Type": "Galaxy", - "InstanceId": "galaxy-prod-1", - "Gateway": { - "Endpoint": "https://mxgw.aveva.local:5001", - "ApiKeySecretRef": "galaxy:apiKey", // resolved via existing secret store - "UseTls": true, - "CaCertificatePath": "C:\\publish\\mxgw\\ca.crt", - "ConnectTimeoutSeconds": 10, - "DefaultCallTimeoutSeconds": 5, - "StreamTimeoutSeconds": 0 // unbounded - }, - "MxAccess": { - "ClientName": "OtOpcUa-A", // unique per OtOpcUa instance - "PublishingIntervalMs": 1000, // hint for SetBufferedUpdateInterval - "WriteUserId": 0 - }, - "Repository": { - "DiscoverPageSize": 5000, - "WatchDeployEvents": true - }, - "Reconnect": { - "InitialBackoffMs": 500, - "MaxBackoffMs": 30000, - "ReplayOnSessionLost": true - } - } -} -``` - -The OtOpcUa secret store already handles DPAPI-protected values for LDAP -binds; reuse it for the gw API key. Never put the key in plaintext in the -sample config. - ---- - -## Risks and mitigations - -| Risk | Mitigation | -|---|---| -| gw protocol regression breaks production | Pin gw NuGet to a contract version range; CI runs parity matrix on every gw bump; staged rollout via `Galaxy:Backend` flag. | -| Per-call latency regresses for chatty workloads | Land gw-9 (buffered updates) before phase 5; soak the 95p in phase 6. | -| Reconnect storm after gw restart re-registers 50k tags | Land gw-3 or gw-5 before phase 6; client-side bulk replay throttled by `SubscribeBulkAsync` chunk size. | -| Alarm parity gap from moving tracker server-side | Phase 2 ships before phase 4; parity matrix gates phase 7. | -| Historian sidecar adds a second .NET 4.8 x86 service | Acceptable: it's a *driver-agnostic* component, and it ships only where Wonderware historian access is actually needed. | -| Two OtOpcUa instances both registering as same MXAccess client | `ClientName` is per-instance config (ot-5); install scripts lint that the redundancy pair has distinct names. 
| -| Cross-machine MXAccess writes traverse plaintext gRPC | Phase 0 enforces `UseTls=true` for any non-loopback `Endpoint`; CI lints the sample configs. | -| gw API key leaked in logs | gw and `MxGatewayClient` already redact `authorization` metadata; phase 6 audit. | -| Memory leak in `EventPump` under high event rate | Bounded channel between `StreamEventsAsync` and per-sub fan-out, drop-newest with a metric counter; soak test catches. | - ---- - -## Cross-cutting deliverables - -- **Docs:** `docs/drivers/Galaxy.md` (new), updates to - `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`, - `docs/Redundancy.md`, `CLAUDE.md`. -- **Install scripts:** `scripts/install/Install-Services.ps1` removes - `OtOpcUaGalaxyHost`, adds `OtOpcUaWonderwareHistorian`, no Galaxy - service registration on the OtOpcUa node. -- **e2e:** `scripts/e2e/e2e-config.sample.json` — drop `OTOPCUA_GALAXY_*` - pipe vars, add `Drivers:Galaxy:Gateway:Endpoint` etc. -- **Memory:** retire stale Galaxy.Host entries; add gw dependency entry, - redundancy + client-name guidance. - ---- - -## Order-of-work summary - -``` -Phase 0 (gw repo): gw-1, gw-2, gw-4, gw-9 -Phase 1 (this): ot-1 — historian extension point -Phase 2 (this): ot-2 — alarm subsystem -Phase 3 (this): Driver.Historian.Wonderware sidecar -Phase 4 (this): Driver.Galaxy (new) behind backend flag - — depends on Phase 0, 1, 2 -Phase 5 (this+gw): parity matrix - — drives gw-3 / gw-5 / gw-6 / gw-7 if gaps surface -Phase 6 (this): perf + hardening -Phase 7 (this): retire Galaxy.Host / Proxy / Shared -``` - -Phases 1–3 are independent of each other and can run in parallel. Phase 4 -needs all three plus Phase 0. Phase 5 requires Phase 4. Phases 6 and 7 are -sequential after Phase 5. 
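The ordering rules in the summary can be expressed as a small dependency graph and resolved into "waves" of phases that may run concurrently. The sketch below is illustrative only: `PhaseOrdering` is a hypothetical name, and the dependency table follows the prose (Phase 4 needs Phases 0 through 3; Phases 5, 6, and 7 are strictly sequential).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planning helper; not part of the repo.
public static class PhaseOrdering
{
    // Groups phases into waves: every phase in a wave has all of its
    // prerequisites in earlier waves, so phases within a wave can run in parallel.
    public static List<List<int>> Waves(IReadOnlyDictionary<int, int[]> dependsOn)
    {
        var done = new HashSet<int>();
        var waves = new List<List<int>>();
        while (done.Count < dependsOn.Count)
        {
            var ready = dependsOn
                .Where(kv => !done.Contains(kv.Key) && kv.Value.All(d => done.Contains(d)))
                .Select(kv => kv.Key)
                .OrderBy(p => p)
                .ToList();
            if (ready.Count == 0)
                throw new InvalidOperationException("Dependency cycle or unknown prerequisite.");
            waves.Add(ready);
            done.UnionWith(ready);
        }
        return waves;
    }

    // Dependencies as stated in the prose of the order-of-work summary.
    public static readonly Dictionary<int, int[]> Plan = new()
    {
        [0] = Array.Empty<int>(),
        [1] = Array.Empty<int>(),
        [2] = Array.Empty<int>(),
        [3] = Array.Empty<int>(),
        [4] = new[] { 0, 1, 2, 3 },
        [5] = new[] { 4 },
        [6] = new[] { 5 },
        [7] = new[] { 6 },
    };
}
```

Calling `PhaseOrdering.Waves(PhaseOrdering.Plan)` groups Phases 0 through 3 into the first wave and then yields Phases 4, 5, 6, and 7 as one wave each.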
diff --git a/lmx_mxgw_impl.md b/lmx_mxgw_impl.md deleted file mode 100644 index 2739351..0000000 --- a/lmx_mxgw_impl.md +++ /dev/null @@ -1,1062 +0,0 @@ -> **✅ Completed 2026-04-30 — historical record of the v2-mxgw implementation plan.** -> -> All 39 PRs across 7 phases (1.1–1.3 + 2.1–2.3 + 1+2.W + 3.1–3.W + -> 4.0–4.W + 5.1–5.W + 6.1–6.W + 7.1–7.3) shipped and merged to master -> at commit `ae7106d`. Per-phase status tracking below is preserved as -> the historical PR-execution log; phase descriptions are -> retrospective, not pending. Parity matrix verified green on the dev -> rig 2026-04-30 (14 passed / 1 skipped / 0 failed — -> see `docs/v2/Galaxy.ParityMatrix.md`). - -# Galaxy → MxGateway Migration — Detailed Implementation Plan - -Companion to `lmx_mxgw.md` (design plan). This document breaks the plan into -PR-sized tasks with concrete file paths, acceptance checks, test deltas, and -explicit parallel-safety analysis for subagent execution. - -Cross-repo scope: -- **`lmxopcua`** (this repo) — drivers, server, install scripts, e2e, docs. -- **`mxaccessgw`** (`C:\Users\dohertj2\Desktop\mxaccessgw`) — gRPC gateway, - worker, .NET client. - ---- - -## How to use parallel subagents safely - -The plan lists each task with a `parallel-key`. Two tasks share a key when -they touch the same file(s); tasks with **disjoint keys are safe to run in -parallel**. Tasks within the same phase that share a key MUST run -sequentially. - -### Subagent execution rules - -1. **One git worktree per parallel subagent.** Spawn each parallel agent - with `Agent({ isolation: "worktree", ... })` so they never collide on the - working tree. Merge back to a shared integration branch after each - parallel batch completes. -2. **Interface-defining tasks run first, then their consumers.** Anywhere - the plan says "PR X.0: define interface", that PR must merge to the - integration branch before its consumers fan out in parallel. -3. 
**Shared-file edits serialize.** Files touched by more than one PR in a - batch — `ZB.MOM.WW.OtOpcUa.slnx`, `Install-Services.ps1`, - `appsettings.json`, `CLAUDE.md`, `MEMORY.md` — get a single dedicated - "wire-up" PR at the end of the batch that ingests every parallel branch's - needed line. Don't let parallel agents edit them. -4. **Test fixtures own their fixture file.** When two PRs both need a - `FakeMxGatewayClient`, the first PR creates it and exposes the contract; - subsequent PRs add cases to the same file or extend it via partial class - in their own test files. -5. **Subagent prompt must include the parallel-key and disallowed paths.** - Any agent prompt must say "you may NOT edit ``, - ``, or files outside ``. If you discover a - needed change there, surface it as a task for the wire-up PR; do not - make it yourself." This prevents merge conflicts at integration time. -6. **Choose the right subagent type.** - - `Explore` — read-only research/locate. Cheap. Use before any PR that - needs to learn the surrounding code. - - `Plan` — produce a step-by-step PR plan from a brief; no code writes. - Use when a task description below is too coarse for a fresh agent. - - `general-purpose` — code-writing. Use for PRs that create/modify - source. - - `code-simplifier` — post-PR cleanup pass on the same files. - - `codex:rescue` — a stuck PR; use sparingly. -7. **Foreground vs. background.** Run one PR foreground if its result - gates the rest of your work this turn. Run the rest in background and - read results when they complete. -8. **Trust but verify.** After every subagent claims completion, the - parent runs the build (`dotnet build ZB.MOM.WW.OtOpcUa.slnx`) and the - target tests. The agent's report is hearsay until the build is green. -9. **Worktree cleanup.** When `isolation: "worktree"` returns no path, - nothing was changed; if it returns a path, integrate by cherry-picking - or fast-forwarding into the integration branch, then prune the worktree. 
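The parallel-key rule (same key means sequential, disjoint keys mean parallel-safe) amounts to grouping PRs by key. A minimal sketch of that grouping, assuming each PR carries exactly one key; `PlannedPr` and `ParallelBatcher` are hypothetical names introduced here, not tooling that exists in either repo.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical planning types; illustration only.
public sealed record PlannedPr(string Id, string ParallelKey);

public static class ParallelBatcher
{
    // One chain per parallel-key. PRs inside a chain keep plan order and run
    // sequentially; distinct chains touch disjoint files and may run in parallel,
    // each in its own worktree.
    public static IReadOnlyList<IReadOnlyList<PlannedPr>> Chains(IEnumerable<PlannedPr> prs)
        => prs.GroupBy(p => p.ParallelKey)
              .Select(g => (IReadOnlyList<PlannedPr>)g.ToList())
              .ToList();
}
```

Fed the Phase 0 PRs listed later in this plan, this would produce five chains: one three-PR chain for `gw-proto-mxaccess` (0.2, 0.3, 0.4) and four single-PR chains. Locked files are outside this model and still go through the wire-up PR.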
- -### Locked files (never edit from a parallel batch) - -These get a dedicated wire-up PR at the **end** of each phase's parallel -fanout: - -| File | Why locked | -|---|---| -| `ZB.MOM.WW.OtOpcUa.slnx` | New project additions stack and conflict | -| `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` | Config schema additions stack | -| `src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` (or `Startup.cs`) | DI registrations stack | -| `scripts/install/Install-Services.ps1` | Service registrations stack | -| `scripts/e2e/e2e-config.sample.json` | E2E config stacks | -| `CLAUDE.md`, `docs/v2/dev-environment.md` | Doc edits stack | -| `MEMORY.md` (auto-memory index) | One line per change; conflicts often | -| `mxaccessgw/MxGateway.sln` | Same reason as our slnx | -| `mxaccessgw/clients/proto/*.proto` files | Proto edits stack and reorder field numbers | - ---- - -## Phase 0 — mxaccessgw foundation work - -Repo: `C:\Users\dohertj2\Desktop\mxaccessgw`. Branch off `main` per task. - -| PR | Title | Parallel-key | Files | -|----|-------|--------------|-------| -| 0.1 | Galaxy attribute metadata parity | `gw-proto-galaxy` | `clients/proto/galaxy_repository.proto`, `src/MxGateway.Server/Galaxy/AttributeMapper.cs`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`, `gr/`-equivalent SQL in `src/MxGateway.Server/Galaxy/Sql/`, contract tests | -| 0.2 | Bulk subscribe with publishing-interval hint | `gw-proto-mxaccess` | `clients/proto/mxaccess_gateway.proto` (extend `SubscribeBulkCommand` with `optional uint32 buffered_update_interval_ms`), `src/MxGateway.Worker/MxAccess/Commands/SubscribeBulkHandler.cs`, `src/MxGateway.Server/Sessions/Mappers.cs`, worker tests | -| 0.3 | Subscription replay RPC | `gw-proto-mxaccess` | Same proto file as 0.2 (add `ReplaySubscriptionsCommand`), `src/MxGateway.Worker/MxAccess/Commands/ReplaySubscriptionsHandler.cs`, gateway forwarder, tests | -| 0.4 | Session health stream | `gw-proto-mxaccess` | Same proto (add `StreamSessionHealth(SessionId) returns 
(stream SessionHealth)`), `src/MxGateway.Server/Sessions/SessionHealthService.cs`, dashboard projection, tests | -| 0.5 | Document event-stream resume contract | `gw-docs` | `docs/Sessions.md`, `docs/gateway-process-design.md` — define retention bound, `events_lost` signal in `MxEvent` envelope | -| 0.6 | .NET client `MxValue` adapter + `SubscribeWithCallback` | `gw-dotnet-client` | `clients/dotnet/MxGateway.Client/MxValueAdapter.cs` (new), `clients/dotnet/MxGateway.Client/MxGatewaySession.cs` (extend with `SubscribeWithCallbackAsync`), `clients/dotnet/MxGateway.Client.Tests/` | -| 0.7 | API key scopes + `mxgw-key` minting CLI | `gw-auth` | `src/MxGateway.Server/Auth/`, `src/MxGateway.Cli/`, `docs/Authentication.md` | - -### Phase 0 parallel batches - -- **Batch 0a (parallel):** 0.1 (`gw-proto-galaxy`), 0.5 (`gw-docs`), - 0.6 (`gw-dotnet-client`), 0.7 (`gw-auth`). Four worktrees, four - `general-purpose` agents. -- **Batch 0b (sequential within key, parallel across keys):** 0.2 → 0.3 → - 0.4 all share `gw-proto-mxaccess`. Land them in order on the same agent - (or three sequential calls). Field number assignment must be coordinated - through the wire-up PR. -- **Wire-up 0.W:** integrate proto-generated descriptors, regenerate - `clients/proto/descriptors`, run cross-language smoke matrix. - -**Phase 0 exit:** mxaccessgw `main` carries all seven PRs. Tag the gw NuGet -release. Bump `MxGateway.Client` consumed by lmxopcua. - ---- - -## Phase 1 — Server-level historian extension point (lmxopcua) - -Goal: detach `IHistorianDataSource` from the Galaxy driver. Server's -`HistoryRead*` operations call into a registered data source by namespace, -not into `IHistoryProvider` on the driver. - -### Tasks - -#### PR 1.1 — Lift `IHistorianDataSource` to `Core.Abstractions` - -**Parallel-key:** `core-abs-historian` (locks files in -`Core.Abstractions/Historian/`). 
- -**Files** -- Create: - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs` - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianSample.cs` - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianAggregateSample.cs` - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianEvent.cs` - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs` -- Move-from (Galaxy.Host originals stay until phase 7; new copies live in - Core.Abstractions and are pure POCO): - - source bodies in `src/.../Driver.Galaxy.Host/Backend/Historian/` -- Modify: - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj` (no change if files auto-included) -- Tests: - - `tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs` — - contract documentation tests (null arg behavior, time-range ordering). - -**Acceptance** -- `dotnet build` clean. -- New tests run and pass. -- Galaxy.Host still compiles (it keeps its own copies until phase 7). - -**Subagent prompt boilerplate** (template — re-use this shape for each PR): -> You are working in worktree ``. Create the files listed in PR 1.1 of -> `lmx_mxgw_impl.md`. Do NOT edit any file under `Driver.Galaxy.Host/`, -> `appsettings.json`, the `.slnx`, or `Program.cs`. The DTOs are pure value -> records — do not import OPC UA types or COM types. Run -> `dotnet build src/ZB.MOM.WW.OtOpcUa.Core.Abstractions` before reporting. - -#### PR 1.2 — `IHistoryService` plugin host on the server - -**Parallel-key:** `server-history`. - -**Files** -- Create: - - `src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs` — namespace → `IHistorianDataSource`. - - `src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs` — registry impl. - - `src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryServiceAdapter.cs` — - bridges OPC UA `HistoryRead`/`HistoryReadProcessed`/`HistoryReadAtTime`/ - `HistoryReadEvents` to the router. 
-- Modify: - - `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — register - `HistoryServiceAdapter`. *Locked file* — defer to wire-up PR 1.W. -- Tests: - - `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs`. - -**Acceptance** -- Router resolves data source by namespace prefix. -- Unknown namespace returns `BadHistoryOperationUnsupported` (or current - status used for that case — verify against existing server behavior in - `OpcUaServerService.cs` before coding). - -**Depends on:** 1.1 merged. - -#### PR 1.3 — Driver capability shrink: drop `IHistoryProvider` requirement - -**Parallel-key:** `server-history`. - -**Files** -- Modify: - - `src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs` (or wherever - `IHistoryProvider` is consumed; locate via `Grep "IHistoryProvider"`). - Replace direct calls with `IHistoryRouter.Resolve(...)`. -- Tests: - - Update any test that exercised `IHistoryProvider` paths to register a - fake data source via the router. - -**Depends on:** 1.2 merged. - -#### PR 1.W — Phase 1 wire-up - -**Parallel-key:** locked-files. - -**Files** -- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — DI registration of - `HistoryRouter` + the legacy Galaxy.Host historian adapter. -- `ZB.MOM.WW.OtOpcUa.slnx` — no change unless a new project was added; if - PR 1.1 went into the existing `Core.Abstractions` project, no slnx edit. - -### Phase 1 parallel batches - -- **Batch 1a (sequential):** 1.1 → 1.2 → 1.3 → 1.W. Each blocks the next. -- Total: one foreground sequence; no parallelism in Phase 1. Use one - `general-purpose` agent across all four PRs, or one PR per agent in - order. - ---- - -## Phase 2 — Server-level alarm condition subsystem (lmxopcua) - -Goal: drop `GalaxyAlarmTracker` from the driver's responsibilities; the -server runs the AlarmCondition state machine driven by `IsAlarm=true` -attribute metadata. - -### Tasks - -#### PR 2.1 — Address-space builder alarm-declaration API - -**Parallel-key:** `core-abs-alarms`. 
- -**Files** -- Modify: - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs` — - add `IAlarmConditionDeclaration MarkAsAlarmCondition(...)` (the - method already exists per `GalaxyProxyDriver.cs:146`; verify shape and - extend with the four sub-attribute references). - - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Alarms/AlarmConditionInfo.cs` - — add `InAlarmRef`, `PriorityRef`, `DescAttrNameRef`, `AckedRef`, - `AckMsgWriteRef` fields. -- Tests: - - `tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs`. - -**Acceptance** -- Existing call sites (`GalaxyProxyDriver.DiscoverAsync`) still compile — - add the new fields with safe defaults. - -#### PR 2.2 — `AlarmConditionService` (state machine) - -**Parallel-key:** `server-alarms`. - -**Files** -- Create: - - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs` - - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionState.cs` - - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs` -- Reference impl to **port** (do not duplicate — read it for invariants): - - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs` -- Tests: - - `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs` — - port the existing tracker tests (`tests/.../Galaxy.Host.Tests/`). - -**Subagent guidance** -- **Two-step.** First a `Plan` agent: read `GalaxyAlarmTracker.cs` and - produce a state-transition table + a list of tests to port. Then a - `general-purpose` agent: implement `AlarmConditionService` against that - table. - -**Depends on:** 2.1 merged. - -#### PR 2.3 — Wire alarm service into `DriverNodeManager` - -**Parallel-key:** `server-alarms`. - -**Files** -- Modify: - - `src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs` — on each driver's - discovery, collect alarm declarations and hand to `AlarmConditionService` - along with the driver's `ISubscribable` and `IWritable` for sub-attribute - advise + ack writes. 
-- Tests: - - extend `DriverNodeManagerTests` with a fake driver that declares one - alarm-bearing node. - -**Depends on:** 2.2 merged. - -#### PR 2.W — Phase 2 wire-up - -DI registration of `AlarmConditionService` in `OpcUaServerService.cs`. - -### Phase 2 parallel batches - -- **Batch 2a (sequential):** 2.1 → 2.2 → 2.3 → 2.W. - -### Phases 1 + 2 cross-batch parallelism - -PR 1.1 and PR 2.1 touch **different files** in `Core.Abstractions/` (one -under `Historian/`, one in `IAddressSpaceBuilder.cs` + `Alarms/`). They are -**parallel-safe**. - -PR 1.2/1.3 and PR 2.2/2.3 both modify `OpcUaServerService.cs` and -`DriverNodeManager.cs`. They share **two locked files** — but only at the -DI-registration level. If we split the `OpcUaServerService.cs` edits into a -single combined wire-up PR (1+2.W), the body PRs 1.2/1.3 and 2.2/2.3 don't -touch them. Then the body PRs *can* run in parallel batches across -phase 1 and phase 2. - -**Recommended Phase 1+2 plan** (parallel): -1. Run **PR 1.1 and PR 2.1 in parallel** (two worktrees, two - `general-purpose` agents). Both target `Core.Abstractions` only. -2. Merge both to integration branch. -3. Run **PR 1.2/1.3 and PR 2.2/2.3 in parallel**, each as a sequential - 2-PR chain on its own worktree. Constraint: neither chain edits - `OpcUaServerService.cs` or `DriverNodeManager.cs` — defer all DI/wiring - to the combined wire-up. -4. Merge both chains. -5. **Combined wire-up PR 1+2.W** edits `OpcUaServerService.cs` and - `DriverNodeManager.cs` once. - ---- - -## Phase 3 — `Driver.Historian.Wonderware` sidecar - -Goal: house the existing `HistorianDataSource` code in its own .NET 4.8 x86 -service, exposed over named pipe; ship a .NET 10 client implementing -`IHistorianDataSource`. - -### Tasks - -#### PR 3.1 — Create the sidecar shell project - -**Parallel-key:** `historian-sidecar-host`. - -**Files** -- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/` - - `Driver.Historian.Wonderware.csproj` (`net48`, - `x86`). 
- - `Program.cs` — Serilog + console host + named pipe server (mirror - `Driver.Galaxy.Host/Program.cs` shape: env-driven pipe name, allowed SID, - shared secret). -- Create test project: - - `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` -- *Locked:* `.slnx`, `Install-Services.ps1` (wire-up). - -#### PR 3.2 — Lift `HistorianDataSource` & friends - -**Parallel-key:** `historian-sidecar-host`. - -**Files** -- Move (preserve git history with `git mv`): - - `src/.../Driver.Galaxy.Host/Backend/Historian/HistorianDataSource.cs` - → `src/.../Driver.Historian.Wonderware/Backend/HistorianDataSource.cs` - - `HistorianClusterEndpointPicker.cs` - - `HistorianClusterNodeState.cs` - - `HistorianConfiguration.cs` - - `HistorianEventDto.cs` - - `HistorianHealthSnapshot.cs` - - `HistorianQualityMapper.cs` - - `HistorianSample.cs` - - `IHistorianConnectionFactory.cs` -- Add a thin `IHistorianDataSource` shim in the sidecar that re-implements - the **interface from `Core.Abstractions/Historian/`** (after PR 1.1). -- Galaxy.Host needs to keep building until phase 7. Either: - - Add `Driver.Historian.Wonderware` ProjectReference from - `Driver.Galaxy.Host` and re-use the moved code, OR - - Leave a stub copy in Galaxy.Host that delegates to the sidecar via the - new client. Pick option 1 (cleaner). -- Tests: - - `git mv` matching test files from - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/Backend/Historian/` - to `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/`. - -**Depends on:** PR 1.1 merged (interface lives in Core.Abstractions). - -#### PR 3.3 — Pipe contract + handler - -**Parallel-key:** `historian-sidecar-pipe`. 
- -**Files** -- Create: - - `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Ipc/Contracts.cs` - (MessagePack DTOs: `ReadRawRequest/Reply`, `ReadProcessedRequest/Reply`, - `ReadAtTimeRequest/Reply`, `ReadEventsRequest/Reply`, - **`WriteAlarmEventsRequest/Reply`** — alarm-event persistence write - path; mirror today's `GalaxyHistorianWriter.WriteBatchAsync` payload - so the SQLite store-and-forward sink in `Core.AlarmHistorian` can - drain into the Wonderware historian event store after Galaxy.Proxy is - deleted). - - `Ipc/PipeServer.cs` — copy + adapt - `Driver.Galaxy.Host/Ipc/PipeServer.cs` (same ACL/secret model). - - `Ipc/HistorianFrameHandler.cs` — handles all five contract pairs - above. -- Tests: - - `tests/.../Driver.Historian.Wonderware.Tests/Ipc/PipeRoundTripTests.cs` - — round-trip every contract pair including `WriteAlarmEvents`. - -#### PR 3.4 — .NET 10 client - -**Parallel-key:** `historian-sidecar-client`. - -**Files** -- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/` - (.NET 10 x64). Implements: - - `IHistorianDataSource` (read path: raw / processed / at-time / events) - against the sidecar pipe. - - `IAlarmHistorianWriter` (write path: alarm-event persistence) against - the sidecar pipe `WriteAlarmEvents` contract from PR 3.3. -- Tests: - - `tests/.../Driver.Historian.Wonderware.Client.Tests/` against an - in-proc fake pipe server. Cover both the read interface and the - alarm-event write interface; verify the SQLite store-and-forward sink - (`Core.AlarmHistorian.SqliteStoreAndForwardSink`) drains successfully - when the client is plugged in as its target. - -**Depends on:** PR 3.3 merged (contracts published). - -#### PR 3.W — Phase 3 wire-up - -**Files** -- `ZB.MOM.WW.OtOpcUa.slnx` — register three new projects + two new test - projects. -- `scripts/install/Install-Services.ps1` — register - `OtOpcUaWonderwareHistorian` NSSM service. 
-- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — register the - client as both an `IHistorianDataSource` for the Galaxy namespace **and** - the `IAlarmHistorianWriter` target for the SQLite store-and-forward - sink, replacing today's `GalaxyProxyDriver.WriteBatchAsync` route. -- `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` — `Historian:Wonderware` - block. - -### Phase 3 parallel batches - -- **Batch 3a (sequential):** 3.1 (shell) → 3.2 (lift code). -- **Batch 3b (sequential after 3.2):** 3.3 (pipe) → 3.4 (client). - 3.4 depends on 3.3's contracts, so there is no parallelism inside Phase 3. -- **Batch 3c:** 3.W. - -But Phase 3 is **fully independent of PR 1.1's downstream work** once -1.1 has merged. Phase 3 can run in parallel with PR 1.2/1.3 and all of -Phase 2. - -**Recommended phasing**: kick off Phase 3 in parallel with Phase 2, both -gated only on PR 1.1's merge. - ---- - -## Phase 4 — New `Driver.Galaxy` (Tier-A, .NET 10) against gw - -This is the bulk of the work. Each PR adds one capability to the new driver. -The driver builds and links from PR 4.0 onward; capabilities arrive as -incremental green bars. - -The driver lives at `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (note: same -short name as the old `.Proxy`, but new project. The `.Host`, `.Proxy`, -`.Shared` projects continue to coexist until phase 7). - -### Tasks - -#### PR 4.0 — Project skeleton, options, factory - -**Parallel-key:** `galaxy-shell`. - -**Files** -- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` - - `Driver.Galaxy.csproj` (.NET 10 x64), references - `Core.Abstractions`, `Core`, `MxGateway.Client` (NuGet from gw repo). - - `GalaxyDriver.cs` — `IDriver` + `IDisposable` skeleton; `Initialize` - creates `MxGatewayClient` and opens a session; `Shutdown` disposes. - - `Config/GalaxyDriverOptions.cs` — POCO matching the JSON shape in - `lmx_mxgw.md`. - - `GalaxyDriverFactoryExtensions.cs` — `AddGalaxyDriver(IServiceCollection)`.
-- Tests: - - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/` (new project) - - `Tests/GalaxyDriverInitializationTests.cs` — uses a fake - `IMxGatewayClientTransport` to verify open-session behavior. -- *Locked:* `.slnx` (wire-up PR 4.W). - -**Acceptance** -- Driver builds, `Initialize` opens a session against a fake transport, - `Shutdown` closes it. -- `IDriver.RecycleAsync` (if present in the interface today) returns the - same stub shape as the legacy backend — `{Accepted = true, GraceSeconds - = 15}` — and is documented in the file as intentionally a no-op until a - future PR wires it through gw. Today's `MxAccessGalaxyBackend.RecycleAsync` - is itself a stub, so this preserves behavior exactly. - -#### PR 4.1 — `ITagDiscovery` via `GalaxyRepositoryClient` - -**Parallel-key:** `galaxy-discover`. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Browse/GalaxyDiscoverer.cs` - - `src/.../Driver.Galaxy/Browse/DataTypeMap.cs` — - `mx_data_type → DriverDataType`. Port table from - `GalaxyProxyDriver.MapDataType` (lines 523–532) and verify against - `gr/data_type_mapping.md`. - - `src/.../Driver.Galaxy/Browse/SecurityMap.cs` — port - `GalaxyProxyDriver.MapSecurity` (lines 534–544). - - `src/.../Driver.Galaxy/Browse/AlarmRefBuilder.cs` — for any attribute - where `IsAlarm=true`, compute the five sub-attribute references by - Galaxy naming convention (`..InAlarm`, - `..Priority`, `..DescAttrName`, - `..Acked`, `..AckMsg`) and populate - `AlarmConditionInfo.{InAlarmRef, PriorityRef, DescAttrNameRef, - AckedRef, AckMsgWriteRef}` before passing to `MarkAsAlarmCondition`. - Mirrors today's behavior in - `MxAccessGalaxyBackend.SubscribeAlarmsAsync` so the server-level - `AlarmConditionService` (Phase 2) has every ref it needs. -- Modify: - - `GalaxyDriver.cs` — implement `ITagDiscovery.DiscoverAsync` calling - discoverer. -- Tests: - - `Tests/Browse/GalaxyDiscovererTests.cs` — fake - `IGalaxyRepositoryClientTransport` with canned `GalaxyObject` list. 
- - `Tests/Browse/AlarmRefBuilderTests.cs` — for an alarm-bearing - attribute, verify all five refs match the `..{...}` shape - and round-trip cleanly through `MarkAsAlarmCondition`. - -**Acceptance** -- Discovered nodes carry `mx_data_type`, `IsArray`, `ArrayDim`, - `SecurityClassification`, `IsHistorized`, `IsAlarm` matching what the - legacy backend produces (snapshot-compared in Phase 5). -- Every `IsAlarm=true` attribute calls `MarkAsAlarmCondition` with all - five sub-attribute refs populated. The `AlarmConditionService` from - Phase 2 must be able to subscribe and ack without further help from - the driver. - -**Subagent guidance** -- Use an `Explore` agent first: "find every place in - `Driver.Galaxy.Proxy/GalaxyProxyDriver.cs` that consumes - `DiscoverHierarchyResponse` and list every wire field it reads, so we - know what gw's proto must surface." - -**Depends on:** PR 4.0 merged + PR 0.1 (gw attribute parity) NuGet bumped. - -#### PR 4.2 — `IReadable` (one-shot read path) - -**Parallel-key:** `galaxy-read`. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Runtime/GalaxyMxSession.cs` — owns - `MxGatewaySession`, `Register` server handle, in-memory - `tag → itemHandle` registry. - - `src/.../Driver.Galaxy/Runtime/MxValueDecoder.cs` — - `MxValue → object` (boolean/int32/float/double/string/datetime, plus - array variants). - - `src/.../Driver.Galaxy/Runtime/StatusCodeMap.cs` — explicit - `MxStatusProxy → uint OPC UA StatusCode` mapping table. Today's - coarse `vtq.Quality >= 192 ? Good : Uncertain_Placeholder` becomes a - full mapping covering at minimum: - `Good (0x0)`, `Uncertain (0x40000000)`, `Uncertain_LastUsableValue - (0x40A40000)`, `Bad (0x80000000)`, `Bad_NotConnected (0x808A0000)`, - `Bad_NoCommunication (0x80310000)`, `Bad_OutOfService (0x808D0000)`. - Document any unmapped category as `Bad_InternalError` and log once - with the raw `MxStatusProxy` so the matrix can be extended from - field data. 
-- Modify: - - `GalaxyDriver.cs` — implement `IReadable.ReadAsync`: per tag, - `AddItem` → short-lived `Advise` → first `OnDataChange`. (If - Phase 0 added a synchronous `ReadAsync` RPC, use that; flag a follow-up - if missing.) -- Tests: - - `Tests/Runtime/GalaxyReadTests.cs` — fake transport with scripted - `OnDataChange` responses. - - `Tests/Runtime/StatusCodeMapTests.cs` — exhaustive mapping cases plus - "unknown category falls back to Bad_InternalError and emits a single - diagnostic log" assertion. - -**Depends on:** PR 4.0. - -#### PR 4.3 — `IWritable` + secured-write routing - -**Parallel-key:** `galaxy-write`. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Runtime/MxValueEncoder.cs` — - `object → MxValue` (the inverse of 4.2's decoder; unify into one type - if simpler). -- Modify: - - `GalaxyDriver.cs` — implement `IWritable.WriteAsync`. - Route writes whose attribute carries - `SecurityClassification.SecuredWrite` / `VerifiedWrite` through - `WriteSecuredAsync` (mxaccessgw exposes this in `MxGatewaySession`). -- Tests: - - `Tests/Runtime/GalaxyWriteTests.cs` — verify the routing decision - given each `SecurityClassification` value. - -**Depends on:** PR 4.2 merged (shares `GalaxyMxSession` + value type code). - -#### PR 4.4 — `ISubscribable` + `EventPump` - -**Parallel-key:** `galaxy-subscribe`. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Runtime/SubscriptionRegistry.cs` — - `(driverSubId → list)` and reverse map. - - `src/.../Driver.Galaxy/Runtime/EventPump.cs` — single consumer of - `MxGatewaySession.StreamEventsAsync`. Maps each `OnDataChange` to a - `DataChangeEventArgs` per registered driver subscription. - - `src/.../Driver.Galaxy/Runtime/GalaxySubscriptionHandle.cs` (port from - Proxy). -- Modify: - - `GalaxyDriver.cs` — implement `ISubscribable.SubscribeAsync` using - `SubscribeBulkAsync` with the `buffered_update_interval_ms` hint - from PR 0.2. 
-- Tests: - - `Tests/Runtime/EventPumpFanoutTests.cs` — one item → multiple driver - subscriptions → one event per driver subscription. - - `Tests/Runtime/SubscribeBulkTests.cs` — partial failures. - -**Depends on:** PR 4.3. - -#### PR 4.5 — `ReconnectSupervisor` - -**Parallel-key:** `galaxy-reconnect`. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Runtime/ReconnectSupervisor.cs` — state machine - `(Healthy → TransportLost → ReopeningSession → ReplayingSubscriptions - → Healthy)`. Surfaces `DriverState.Degraded` while not Healthy. -- Modify: - - `GalaxyDriver.cs` + `GalaxyMxSession.cs` — wire transport-error - callbacks into the supervisor; replay subscriptions via - `ReplaySubscriptionsCommand` (PR 0.3). -- Tests: - - `Tests/Runtime/ReconnectSupervisorTests.cs` with simulated drops. - -**Depends on:** PR 4.4. Strong recommend Phase 0.3 (replay RPC) merged. - -#### PR 4.6 — `IRediscoverable` via `WatchDeployEvents` - -**Parallel-key:** `galaxy-deploy`. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Browse/DeployWatcher.cs` — long-lived consumer - of `GalaxyRepositoryClient.WatchDeployEventsAsync`. -- Modify: - - `GalaxyDriver.cs` — start watcher on Initialize; raise - `OnRediscoveryNeeded` per event. -- Tests: - - `Tests/Browse/DeployWatcherTests.cs`. - -**Depends on:** PR 4.0. **Independent of PR 4.2–4.5** — can run in -parallel with all of them. - -#### PR 4.7 — `IHostConnectivityProbe` (transport health + per-platform probes) - -**Parallel-key:** `galaxy-health`. - -The current driver reports two flavors of host connectivity: - -1. **Top-level transport health** — flips `Running`/`Stopped` on the - synthetic host named after `OTOPCUA_GALAXY_CLIENT_NAME` whenever the - MXAccess COM proxy connects/disconnects. -2. **Per-platform `ScanState` probes** — for each discovered - `$WinPlatform` and `$AppEngine` gobject, advise its `ScanState` - attribute and translate value transitions into per-host - `Running`/`Stopped`/`Unknown`. 
Lives in - `Driver.Galaxy.Host/Backend/Stability/GalaxyRuntimeProbeManager.cs`. - -This PR ports both. - -**Files** -- Create: - - `src/.../Driver.Galaxy/Health/HostConnectivityForwarder.cs` — - consumes PR 0.4 `StreamSessionHealth` and surfaces the synthetic - top-level host entry (named after the configured MXAccess - `ClientName`). - - `src/.../Driver.Galaxy/Health/PerPlatformProbeWatcher.cs` — port of - `GalaxyRuntimeProbeManager`. On `Discover`, takes the list of - discovered `$WinPlatform`/`$AppEngine` tag names, subscribes their - `ScanState` via the driver's own `GalaxyMxSession.SubscribeBulkAsync` - (or directly through the gw session), runs the same state machine - (`OnProbeCallback` interpretation logic — port verbatim with tests), - and raises per-host `HostStatusChangedEventArgs` through the - aggregator below. - - `src/.../Driver.Galaxy/Health/HostStatusAggregator.cs` — single - sink that merges the forwarder's transport entry with the watcher's - per-platform entries into the `IReadOnlyList` - surfaced by `IHostConnectivityProbe.GetHostStatuses()`. Owns the - de-dup + diff logic that today lives in - `GalaxyProxyDriver.OnHostConnectivityUpdate`. -- Modify: - - `GalaxyDriver.cs` — wire forwarder + watcher + aggregator into - Initialize. On every `ITagDiscovery.DiscoverAsync` completion (incl. - re-discovery from PR 4.6), feed the watcher the fresh platform list - so probe subscriptions follow Galaxy redeploys. -- Tests: - - `Tests/Health/HostConnectivityForwarderTests.cs`. - - `Tests/Health/PerPlatformProbeWatcherTests.cs` — port the existing - `GalaxyRuntimeProbeManagerTests` (or whatever covers - `OnProbeCallback`) verbatim. Cover: initial subscribe on Discover, - re-subscribe after Rediscover, value-transition state machine, - cleanup on Shutdown. - - `Tests/Health/HostStatusAggregatorTests.cs` — transport entry plus - multiple per-platform entries, transitions, aggregator emits - `OnHostStatusChanged` only on actual state change. 
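The de-dup rule described for `HostStatusAggregator` above (emit a change only on an actual state transition) can be sketched as follows. This is a minimal illustration in Python, not the C# the driver would use. The class shape, the `update` method, the callback signature, and the host names `GalaxyClient` and `Platform_001` are all assumptions made for the example, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class HostStatusAggregator:
    """Hypothetical sketch: merges transport and per-platform host states."""

    # Called as on_changed(host, old_state, new_state).
    # old_state is None the first time a host is reported.
    on_changed: Callable[[str, Optional[str], str], None]
    _states: Dict[str, str] = field(default_factory=dict)

    def update(self, host: str, state: str) -> bool:
        """Record a state report; notify only when the state actually changes."""
        previous = self._states.get(host)
        if previous == state:
            return False  # duplicate report, suppressed
        self._states[host] = state
        self.on_changed(host, previous, state)
        return True

    def get_host_statuses(self) -> List[Tuple[str, str]]:
        """Snapshot of every known host, sorted by name for stable output."""
        return sorted(self._states.items())


events = []
agg = HostStatusAggregator(on_changed=lambda h, old, new: events.append((h, old, new)))
agg.update("GalaxyClient", "Running")   # transport entry (forwarder side)
agg.update("GalaxyClient", "Running")   # duplicate report: no event
agg.update("Platform_001", "Stopped")   # per-platform entry (watcher side)
agg.update("Platform_001", "Running")   # real transition: event
```

In this sketch the forwarder and the watcher would both call `update`, so one dictionary keyed by host name is enough to keep the transport entry and the per-platform entries in a single list. Whether the first report for a host counts as a change is a design choice; here it does.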
- -**Acceptance** -- Top-level transport up/down reflected within 1s of gw `SessionHealth` - flip. -- Each `$WinPlatform` / `$AppEngine` gobject in the discovered hierarchy - produces exactly one entry in `GetHostStatuses()`, transitioning on - `ScanState` changes. -- After a redeploy that adds a new platform, the watcher subscribes its - `ScanState` without restarting the driver. - -**Depends on:** PR 4.0 + PR 4.1 (needs the discoverer's platform list). -**Independent of PR 4.2–4.6** — parallel-safe with the runtime track. - -#### PR 4.W — Backend-flag wiring - -**Parallel-key:** locked-files. - -**Files** -- `src/.../Server/Configuration/DriverFactoryRegistry.cs` (or wherever - drivers are wired) — add a `Galaxy:Backend` switch: - - `legacy-host` → existing `GalaxyProxyDriver` registration (untouched). - - `mxgateway` → new `GalaxyDriver` registration via PR 4.0's extension. -- `src/.../Server/appsettings.json` — sample new config block. -- `ZB.MOM.WW.OtOpcUa.slnx` — register `Driver.Galaxy` and its tests. -- `CLAUDE.md` — note new driver, retain old driver pointers. - -**Acceptance** -- With `Galaxy:Backend=legacy-host` (default), unchanged behavior. -- With `Galaxy:Backend=mxgateway`, server boots against the new driver and - passes a smoke test against the dev gw. - -### Phase 4 parallel batches - -Dependency graph: - -``` -4.0 (shell) ──┬── 4.1 (discover) ──┬── 4.6 (deploy) - │ └── 4.7 (health: needs platform list) - ├── 4.2 (read) ── 4.3 (write) ── 4.4 (subscribe) ── 4.5 (reconnect) - │ \ - │ → 4.W (wire-up) - └── (no longer parallel-with-4.1: 4.7 moved under 4.1) -``` - -- After 4.0 merges, **4.1 and the 4.2-chain head** can run in two parallel - worktrees. -- After 4.1 merges, **4.6 and 4.7** can run in two parallel worktrees. -- 4.2 → 4.3 → 4.4 → 4.5 is one sequential chain on its own worktree - (they all touch `GalaxyDriver.cs` and `GalaxyMxSession.cs`) and runs - alongside the discover/deploy/health track. -- 4.W gathers everything. 
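As a cross-check on the batches above, the dependency graph can be layered mechanically. The sketch below uses Python's standard-library `graphlib` (Python 3.9+). The edge list is read off the graph as drawn, which is an assumption: it places 4.6 under 4.1 and makes 4.W wait for the end of all three tracks. The names `DEPENDS_ON` and `parallel_waves` are illustrative only.

```python
from graphlib import TopologicalSorter

# PR -> set of PRs it depends on, read off the dependency graph above.
# Assumption: 4.6 sits under 4.1 as drawn, and 4.W waits for 4.5, 4.6 and 4.7.
DEPENDS_ON = {
    "4.0": set(),
    "4.1": {"4.0"},
    "4.2": {"4.0"},
    "4.3": {"4.2"},
    "4.4": {"4.3"},
    "4.5": {"4.4"},
    "4.6": {"4.1"},
    "4.7": {"4.0", "4.1"},
    "4.W": {"4.5", "4.6", "4.7"},
}


def parallel_waves(depends_on):
    """Group PRs into waves; every PR in a wave has all its dependencies merged."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(DEPENDS_ON)
for number, wave in enumerate(waves):
    print(f"wave {number}: {', '.join(wave)}")
```

The waves give the earliest point at which each PR could start, not a worktree assignment. They agree with the plan in that 4.1 and 4.2 can start together once 4.0 merges, and that 4.6 and 4.7 become available while the 4.2 → 4.5 chain is still running.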
- -**Recommended Phase 4 plan:** -- Stage 1 (after 4.0): two worktrees — W1: 4.1; W2: 4.2 → 4.3 → 4.4 → 4.5. -- Stage 2 (after 4.1 merges, W2 still running): three worktrees — - W1: 4.6; W3: 4.7; W2: continues runtime chain. -- Stage 3: 4.W wire-up. - ---- - -## Phase 5 — Parity test matrix - -### Tasks - -#### PR 5.1 — `Driver.Galaxy.ParityTests` project - -**Parallel-key:** `parity-shell`. - -**Files** -- Create: `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/` - - `ParityHarness.cs` — boots the OtOpcUa server twice with each backend, - drives the same OPC UA scenarios, captures structured snapshots. - - Theory data per scenario (browse, subscribe, alarm transition, write - by classification, history read). -- Reuses existing live-Galaxy fixtures from - `tests/.../Driver.Galaxy.E2E/`. - -#### PR 5.2 — Browse + read parity scenarios - -**Parallel-key:** `parity-browse`. - -#### PR 5.3 — Subscribe + event-rate parity scenarios - -**Parallel-key:** `parity-subscribe`. - -#### PR 5.4 — Write-by-classification parity scenarios - -**Parallel-key:** `parity-write`. - -#### PR 5.5 — Alarm-transition parity scenarios - -**Parallel-key:** `parity-alarms`. - -Cover both: -- **Live transitions:** Active / Acknowledged / Inactive sequences against - `.InAlarm` / `.Acked` value flips on the dev Galaxy. Must match - legacy-host event ordering and severity mapping. -- **Alarm-event persistence:** trigger N alarm transitions, then verify - the SQLite store-and-forward sink drains them into the Wonderware - historian event store via the new sidecar's `WriteAlarmEvents` - contract (PR 3.3). Compare the persisted rows to those produced by the - legacy `GalaxyHistorianWriter` path. - -#### PR 5.6 — History-read parity scenarios - -**Parallel-key:** `parity-history`. - -#### PR 5.7 — Reconnect/disruption scenarios - -**Parallel-key:** `parity-reconnect`. - -#### PR 5.8 — Per-platform `ScanState` probe parity - -**Parallel-key:** `parity-probes`. 
- -Verify the new `PerPlatformProbeWatcher` (PR 4.7) produces the same -per-host `HostConnectivityStatus` stream as the legacy -`GalaxyRuntimeProbeManager`: -- Initial state on Discover for each `$WinPlatform` / `$AppEngine`. -- Transition events when a runtime is stopped/started on the dev Galaxy. -- Re-subscription after a redeploy that adds/removes a platform. -- Cleanup of probe subscriptions on Shutdown (no leaked advises in gw). - -#### PR 5.W — Parity matrix doc - -**Files** -- `docs/v2/Galaxy.ParityMatrix.md` — table of scenario × result for both - backends. Resolved deltas marked, accepted deltas justified. - -### Phase 5 parallel batches - -After 5.1 lands, scenarios 5.2–5.8 are **fully parallel** — they each add -a separate test class file. Seven worktrees, seven `general-purpose` agents. - -5.W runs after all scenarios merge and pass. - ---- - -## Phase 6 — Performance + hardening - -### Tasks - -#### PR 6.1 — OpenTelemetry traces - -**Parallel-key:** `perf-otel`. - -#### PR 6.2 — Bounded channel + drop-newest metrics - -**Parallel-key:** `perf-eventpump`. - -#### PR 6.3 — Buffered update interval landing - -**Parallel-key:** `perf-buffered`. -Wire `MxAccess:PublishingIntervalMs` → `SetBufferedUpdateInterval` once -gw exposes it. - -#### PR 6.4 — Soak test scenario - -**Parallel-key:** `perf-soak`. -50k tags, 24h, automated metric collection. - -#### PR 6.5 — Tune `MxGatewayClientOptions` defaults - -**Parallel-key:** `perf-tuning`. -Based on soak data. - -#### PR 6.W — Performance doc - -`docs/v2/Galaxy.Performance.md`. - -### Phase 6 parallel batches - -6.1, 6.2, 6.3 all touch `Driver.Galaxy/Runtime/`. Serialize them, OR split -files explicitly: -- 6.1 owns a new `Runtime/Tracing.cs` injected via decorator. Parallel-safe. -- 6.2 owns `Runtime/EventPump.cs`. Conflicts with PR 4.4 only if reordered; - not in parallel with 6.1 if 6.1 also wraps EventPump. Decide upfront: - PR 6.1 wraps at the gateway-client boundary, PR 6.2 owns EventPump - internals. 
Parallel-safe. -- 6.3 modifies `GalaxyDriver.SubscribeAsync` only. Parallel-safe. - -So 6.1, 6.2, 6.3 parallel, then 6.4 (depends on all three). 6.5 sequential -after 6.4 (uses its data). 6.W last. - ---- - -## Phase 7 — Retire legacy - -### Tasks - -#### PR 7.1 — Default flip - -**Parallel-key:** `retire-defaults`. - -**Files** -- `src/.../Server/appsettings.json` → `Galaxy:Backend = mxgateway`. -- `scripts/e2e/e2e-config.sample.json` → drop `OTOPCUA_GALAXY_*` pipe vars, - add gw endpoint. -- `scripts/install/Install-Services.ps1` → remove - `OtOpcUaGalaxyHost` registration; keep `OtOpcUaWonderwareHistorian` from - PR 3.W. - -#### PR 7.2 — Delete legacy projects - -**Parallel-key:** `retire-delete`. - -**Files** -- Delete: - - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/` - - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/` - - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/` - - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/` - - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/` - - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests/` -- Modify: - - `ZB.MOM.WW.OtOpcUa.slnx` — remove the six entries. - - `Server/Configuration/DriverFactoryRegistry.cs` — remove the - `legacy-host` switch arm. - -**Depends on:** parity matrix in `docs/v2/Galaxy.ParityMatrix.md` is -fully green or carries documented accepted-deltas (verified -2026-04-30 on the dev rig: 14 passed / 1 skipped / 0 failed). - -#### PR 7.3 — Doc + memory housekeeping - -**Parallel-key:** `retire-docs`. - -**Files** -- `CLAUDE.md` — rewrite Galaxy section. -- `docs/v2/dev-environment.md` — drop `OtOpcUaGalaxyHost` references. -- `docs/ServiceHosting.md`, `docs/Redundancy.md`, `docs/security.md` — - scrub `Galaxy.Host`/`Galaxy.Proxy` mentions. 
-- `~/.claude/projects/.../memory/MEMORY.md` — retire entries: - - `project_galaxy_host_service.md` - - `project_galaxy_host_installed.md` - - `project_aveva_platform_installed.md` (revise — server box no longer - needs AVEVA; gw box does) -- Delete: - - `mxaccess_documentation.md` (no longer consumed by this repo). -- Add memory entry: `project_galaxy_via_mxgateway.md`. - -### Phase 7 parallel batches - -- **Batch 7a (sequential, gated by phase 6 production soak):** 7.1. -- **Batch 7b (parallel after 7.1):** 7.2 (`retire-delete`) and 7.3 - (`retire-docs`) — disjoint files. - ---- - -## Cross-phase dependency graph - -``` -Phase 0 (gw repo) ────────────────────────────────────┐ - │ -Phase 1.1 (Core.Abs/Historian) ──┐ │ - ├── Phase 1.2/1.3 │ - │ (server History)│ -Phase 2.1 (Core.Abs/Alarms) ──────┤ │ - ├── Phase 2.2/2.3 │ - │ (server Alarms) │ - │ │ - └── Phase 3 (sidecar host + client) - │ │ - └─────────┴── Phase 4 (Driver.Galaxy) - │ - Phase 5 (parity) - │ - Phase 6 (perf) - │ - Phase 7 (retire) -``` - -### Maximum-parallelism rollout (one possible execution) - -- **Day 0–N (mxaccessgw):** Phase 0 batches 0a + 0b + 0.W in parallel - worktrees, separate repo from this one — runs in parallel with everything - below until consumers need the gw bump. -- **Day 0–N (this repo):** Phases 1.1 and 2.1 in parallel (two worktrees). - Merge. -- **Day N+:** Phases 1.2/1.3, 2.2/2.3, 3.1+3.2+3.3+3.4 in parallel (three - worktrees, each a sequential chain). -- **Day M:** combined wire-up PR 1+2.W, then PR 3.W. Server passes existing - e2e against legacy backend. -- **Day M+:** Phase 4.0 lands. Phase 4 fan-out (four worktrees) starts. -- **Day P:** Phase 4 wire-up. Phase 5 fan-out (six worktrees) starts. -- **Day Q:** Phase 5 wire-up. Phase 6 fan-out (three worktrees + sequential). -- **Day R:** Phase 7. Done. - ---- - -## Subagent prompt template - -Re-use this shell when launching any of the parallel coding tasks. Replace -`` parts. 
- -``` -You are implementing PR from lmx_mxgw_impl.md (""). -Repo: <C:\Users\dohertj2\Desktop\lmxopcua | C:\Users\dohertj2\Desktop\mxaccessgw>. -Worktree: <path>. - -Scope (you may create/edit only these files): -<list> - -DO NOT edit: -- Any file outside the scope above -- ZB.MOM.WW.OtOpcUa.slnx / mxaccessgw/MxGateway.sln -- src/.../Server/Program.cs, OpcUaServerService.cs, appsettings.json -- scripts/install/Install-Services.ps1 -- scripts/e2e/e2e-config.sample.json -- CLAUDE.md, docs/**, MEMORY.md, mxaccess_documentation.md - -Acceptance: -<list> - -Tests: -<list> - -If you find a needed change outside scope, STOP and surface it as a -finding rather than editing — it will be picked up by the wire-up PR. - -Before reporting completion: -1. Run `dotnet build <smallest project tree that covers your scope>`. -2. Run the new/changed tests. -3. Report: files changed, test command + result, any out-of-scope - findings. -``` - ---- - -## Risk register (operational) - -| Risk | When it bites | Mitigation | -|---|---|---| -| Phase 0 gw bump breaks existing mxaccessgw consumers | Phase 0 wire-up | Cross-language smoke matrix in mxaccessgw must run before merge | -| Two parallel agents both edit `OpcUaServerService.cs` despite the rule | Phases 1+2 parallel | Wire-up convention + grep-based pre-merge check (`git diff --stat origin/main` of locked files in the integration branch must be empty until the wire-up PR) | -| Subagent silently adds a stray `using` to a locked file | Anytime | The build-and-test step in the prompt will fail if the locked file changed and broke compile; a `git diff --name-only` whitelist check at integration-branch merge time enforces it | -| Galaxy.Host can't build during phase 3.2 because lifted files vanished | Phase 3 mid-flight | PR 3.2 adds a ProjectReference from Galaxy.Host to Driver.Historian.Wonderware so the moved files remain reachable; tests cover both call sites | -| Phase 4 chain stalls because gw exposes no synchronous read | PR 4.2 | 
Surface as a Phase 0 finding immediately — add a `ReadCommand` to gw or accept short-lived advise as the read mechanism (document as a perf accepted delta in 5.W) | -| Phase 5 parity matrix exposes a delta no one wants to fix | Phase 5 | Phase 7 gating: `Galaxy:Backend=mxgateway` does not become default until every parity delta is either resolved or has a written acceptance from the user | -| Soak test in 6.4 finds a memory leak in `EventPump` | Phase 6 | EventPump bounded-channel design (PR 6.2) is shipped before soak so the leak is bounded by design | -| Stale memory file references retired code after phase 7 | Phase 7 | PR 7.3 explicitly retires `project_galaxy_host_*` entries; add a memory-audit step to phase-close checklist | - ---- - -## Phase-close checklist (apply at the end of each phase) - -Before declaring a phase done: -1. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` clean on integration branch. -2. `dotnet test ZB.MOM.WW.OtOpcUa.slnx` clean (or all-but-known-skipped). -3. Live-Galaxy smoke (when applicable) green on dev box. -4. No locked files modified outside their wire-up PR - (`git log --name-only origin/main..HEAD -- <locked-paths>` shows only - the wire-up commit). -5. `MEMORY.md` updated for any persistent context this phase introduced. -6. Doc updates limited to the phase's scope (no doc edits sprinkled across - non-doc PRs). diff --git a/session.dat b/session.dat deleted file mode 100644 index 1651a03..0000000 --- a/session.dat +++ /dev/null @@ -1 +0,0 @@ -opc.tcp://opcuademo.sterfive.com:26543 \ No newline at end of file