# Galaxy → MxGateway Migration — Detailed Implementation Plan

Companion to `lmx_mxgw.md` (design plan). This document breaks the plan into PR-sized tasks with concrete file paths, acceptance checks, test deltas, and explicit parallel-safety analysis for subagent execution.

Cross-repo scope:

- **`lmxopcua`** (this repo) — drivers, server, install scripts, e2e, docs.
- **`mxaccessgw`** (`C:\Users\dohertj2\Desktop\mxaccessgw`) — gRPC gateway, worker, .NET client.

---

## How to use parallel subagents safely

The plan lists each task with a `parallel-key`. Two tasks share a key when they touch the same file(s); tasks with **disjoint keys are safe to run in parallel**. Tasks within the same phase that share a key MUST run sequentially.

### Subagent execution rules

1. **One git worktree per parallel subagent.** Spawn each parallel agent with `Agent({ isolation: "worktree", ... })` so they never collide on the working tree. Merge back to a shared integration branch after each parallel batch completes.
2. **Interface-defining tasks run first, then their consumers.** Anywhere the plan says "PR X.0: define interface", that PR must merge to the integration branch before its consumers fan out in parallel.
3. **Shared-file edits serialize.** Files touched by more than one PR in a batch — `ZB.MOM.WW.OtOpcUa.slnx`, `Install-Services.ps1`, `appsettings.json`, `CLAUDE.md`, `MEMORY.md` — get a single dedicated "wire-up" PR at the end of the batch that ingests every parallel branch's needed line. Don't let parallel agents edit them.
4. **Test fixtures own their fixture file.** When two PRs both need a `FakeMxGatewayClient`, the first PR creates it and exposes the contract; subsequent PRs add cases to the same file or extend it via partial class in their own test files.
5. **Subagent prompt must include the parallel-key and disallowed paths.** Any agent prompt must say "you may NOT edit ``, ``, or files outside ``.
   If you discover a needed change there, surface it as a task for the wire-up PR; do not make it yourself." This prevents merge conflicts at integration time.
6. **Choose the right subagent type.**
   - `Explore` — read-only research/locate. Cheap. Use before any PR that needs to learn the surrounding code.
   - `Plan` — produce a step-by-step PR plan from a brief; no code writes. Use when a task description below is too coarse for a fresh agent.
   - `general-purpose` — code-writing. Use for PRs that create/modify source.
   - `code-simplifier` — post-PR cleanup pass on the same files.
   - `codex:rescue` — a stuck PR; use sparingly.
7. **Foreground vs. background.** Run one PR foreground if its result gates the rest of your work this turn. Run the rest in background and read results when they complete.
8. **Trust but verify.** After every subagent claims completion, the parent runs the build (`dotnet build ZB.MOM.WW.OtOpcUa.slnx`) and the target tests. The agent's report is hearsay until the build is green.
9. **Worktree cleanup.** When `isolation: "worktree"` returns no path, nothing was changed; if it returns a path, integrate by cherry-picking or fast-forwarding into the integration branch, then prune the worktree.
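The parallel-key rule can be checked mechanically before fanning out agents. The sketch below is illustrative Python, not code from either repo: `plan_batches` is a hypothetical helper name, and the task list is copied from the Phase 0 table later in this plan. It groups tasks by key; each group is one sequential chain, and distinct groups may run in separate worktrees.

```python
from collections import OrderedDict


def plan_batches(tasks):
    """Group (pr_id, parallel_key) pairs into chains keyed by parallel-key.

    Tasks in the same chain share a key and must run in listed order;
    different chains have disjoint keys and may run in parallel.
    """
    chains = OrderedDict()
    for pr_id, key in tasks:
        chains.setdefault(key, []).append(pr_id)
    return chains


# Phase 0 tasks and keys, copied from the Phase 0 table in this plan.
PHASE_0 = [
    ("0.1", "gw-proto-galaxy"),
    ("0.2", "gw-proto-mxaccess"),
    ("0.3", "gw-proto-mxaccess"),
    ("0.4", "gw-proto-mxaccess"),
    ("0.5", "gw-docs"),
    ("0.6", "gw-dotnet-client"),
    ("0.7", "gw-auth"),
]

chains = plan_batches(PHASE_0)
for key, prs in chains.items():
    mode = "sequential chain" if len(prs) > 1 else "standalone"
    print(f"{key}: {' -> '.join(prs)} ({mode})")
```

Locked files are outside the scope of this check; they still go through the wire-up PR regardless of key.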
### Locked files (never edit from a parallel batch)

These get a dedicated wire-up PR at the **end** of each phase's parallel fanout:

| File | Why locked |
|---|---|
| `ZB.MOM.WW.OtOpcUa.slnx` | New project additions stack and conflict |
| `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` | Config schema additions stack |
| `src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` (or `Startup.cs`) | DI registrations stack |
| `scripts/install/Install-Services.ps1` | Service registrations stack |
| `scripts/e2e/e2e-config.sample.json` | E2E config stacks |
| `CLAUDE.md`, `docs/v2/dev-environment.md` | Doc edits stack |
| `MEMORY.md` (auto-memory index) | One line per change; conflicts often |
| `mxaccessgw/MxGateway.sln` | Same reason as our slnx |
| `mxaccessgw/clients/proto/*.proto` files | Proto edits stack and reorder field numbers |

---

## Phase 0 — mxaccessgw foundation work

Repo: `C:\Users\dohertj2\Desktop\mxaccessgw`. Branch off `main` per task.

| PR | Title | Parallel-key | Files |
|----|-------|--------------|-------|
| 0.1 | Galaxy attribute metadata parity | `gw-proto-galaxy` | `clients/proto/galaxy_repository.proto`, `src/MxGateway.Server/Galaxy/AttributeMapper.cs`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`, `gr/`-equivalent SQL in `src/MxGateway.Server/Galaxy/Sql/`, contract tests |
| 0.2 | Bulk subscribe with publishing-interval hint | `gw-proto-mxaccess` | `clients/proto/mxaccess_gateway.proto` (extend `SubscribeBulkCommand` with `optional uint32 buffered_update_interval_ms`), `src/MxGateway.Worker/MxAccess/Commands/SubscribeBulkHandler.cs`, `src/MxGateway.Server/Sessions/Mappers.cs`, worker tests |
| 0.3 | Subscription replay RPC | `gw-proto-mxaccess` | Same proto file as 0.2 (add `ReplaySubscriptionsCommand`), `src/MxGateway.Worker/MxAccess/Commands/ReplaySubscriptionsHandler.cs`, gateway forwarder, tests |
| 0.4 | Session health stream | `gw-proto-mxaccess` | Same proto (add `StreamSessionHealth(SessionId) returns (stream SessionHealth)`), `src/MxGateway.Server/Sessions/SessionHealthService.cs`, dashboard projection, tests |
| 0.5 | Document event-stream resume contract | `gw-docs` | `docs/Sessions.md`, `docs/gateway-process-design.md` — define retention bound, `events_lost` signal in `MxEvent` envelope |
| 0.6 | .NET client `MxValue` adapter + `SubscribeWithCallback` | `gw-dotnet-client` | `clients/dotnet/MxGateway.Client/MxValueAdapter.cs` (new), `clients/dotnet/MxGateway.Client/MxGatewaySession.cs` (extend with `SubscribeWithCallbackAsync`), `clients/dotnet/MxGateway.Client.Tests/` |
| 0.7 | API key scopes + `mxgw-key` minting CLI | `gw-auth` | `src/MxGateway.Server/Auth/`, `src/MxGateway.Cli/`, `docs/Authentication.md` |

### Phase 0 parallel batches

- **Batch 0a (parallel):** 0.1 (`gw-proto-galaxy`), 0.5 (`gw-docs`), 0.6 (`gw-dotnet-client`), 0.7 (`gw-auth`). Four worktrees, four `general-purpose` agents.
- **Batch 0b (sequential within key, parallel across keys):** 0.2 → 0.3 → 0.4 all share `gw-proto-mxaccess`. Land them in order on the same agent (or three sequential calls). Field number assignment must be coordinated through the wire-up PR.
- **Wire-up 0.W:** integrate proto-generated descriptors, regenerate `clients/proto/descriptors`, run cross-language smoke matrix.

**Phase 0 exit:** mxaccessgw `main` carries all seven PRs. Tag the gw NuGet release. Bump `MxGateway.Client` consumed by lmxopcua.

---

## Phase 1 — Server-level historian extension point (lmxopcua)

Goal: detach `IHistorianDataSource` from the Galaxy driver. Server's `HistoryRead*` operations call into a registered data source by namespace, not into `IHistoryProvider` on the driver.

### Tasks

#### PR 1.1 — Lift `IHistorianDataSource` to `Core.Abstractions`

**Parallel-key:** `core-abs-historian` (locks files in `Core.Abstractions/Historian/`).
**Files**

- Create:
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs`
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianSample.cs`
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianAggregateSample.cs`
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianEvent.cs`
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs`
- Move-from (Galaxy.Host originals stay until phase 7; new copies live in Core.Abstractions and are pure POCO):
  - source bodies in `src/.../Driver.Galaxy.Host/Backend/Historian/`
- Modify:
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj` (no change if files auto-included)
- Tests:
  - `tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs` — contract documentation tests (null arg behavior, time-range ordering).

**Acceptance**

- `dotnet build` clean.
- New tests run and pass.
- Galaxy.Host still compiles (it keeps its own copies until phase 7).

**Subagent prompt boilerplate** (template — re-use this shape for each PR):

> You are working in worktree ``. Create the files listed in PR 1.1 of
> `lmx_mxgw_impl.md`. Do NOT edit any file under `Driver.Galaxy.Host/`,
> `appsettings.json`, the `.slnx`, or `Program.cs`. The DTOs are pure value
> records — do not import OPC UA types or COM types. Run
> `dotnet build src/ZB.MOM.WW.OtOpcUa.Core.Abstractions` before reporting.

#### PR 1.2 — `IHistoryService` plugin host on the server

**Parallel-key:** `server-history`.

**Files**

- Create:
  - `src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs` — namespace → `IHistorianDataSource`.
  - `src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs` — registry impl.
  - `src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryServiceAdapter.cs` — bridges OPC UA `HistoryRead`/`HistoryReadProcessed`/`HistoryReadAtTime`/`HistoryReadEvents` to the router.
- Modify:
  - `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — register `HistoryServiceAdapter`. *Locked file* — defer to wire-up PR 1.W.
- Tests:
  - `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs`.

**Acceptance**

- Router resolves data source by namespace prefix.
- Unknown namespace returns `BadHistoryOperationUnsupported` (or current status used for that case — verify against existing server behavior in `OpcUaServerService.cs` before coding).

**Depends on:** 1.1 merged.

#### PR 1.3 — Driver capability shrink: drop `IHistoryProvider` requirement

**Parallel-key:** `server-history`.

**Files**

- Modify:
  - `src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs` (or wherever `IHistoryProvider` is consumed; locate via `Grep "IHistoryProvider"`). Replace direct calls with `IHistoryRouter.Resolve(...)`.
- Tests:
  - Update any test that exercised `IHistoryProvider` paths to register a fake data source via the router.

**Depends on:** 1.2 merged.

#### PR 1.W — Phase 1 wire-up

**Parallel-key:** locked-files.

**Files**

- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — DI registration of `HistoryRouter` + the legacy Galaxy.Host historian adapter.
- `ZB.MOM.WW.OtOpcUa.slnx` — no change unless a new project was added; if PR 1.1 went into the existing `Core.Abstractions` project, no slnx edit.

### Phase 1 parallel batches

- **Batch 1a (sequential):** 1.1 → 1.2 → 1.3 → 1.W. Each blocks the next.
- Total: one foreground sequence; no parallelism in Phase 1. Use one `general-purpose` agent across all four PRs, or one PR per agent in order.

---

## Phase 2 — Server-level alarm condition subsystem (lmxopcua)

Goal: drop `GalaxyAlarmTracker` from the driver's responsibilities; the server runs the AlarmCondition state machine driven by `IsAlarm=true` attribute metadata.

### Tasks

#### PR 2.1 — Address-space builder alarm-declaration API

**Parallel-key:** `core-abs-alarms`.
**Files**

- Modify:
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs` — add `IAlarmConditionDeclaration MarkAsAlarmCondition(...)` (the method already exists per `GalaxyProxyDriver.cs:146`; verify shape and extend with the five sub-attribute references).
  - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Alarms/AlarmConditionInfo.cs` — add `InAlarmRef`, `PriorityRef`, `DescAttrNameRef`, `AckedRef`, `AckMsgWriteRef` fields.
- Tests:
  - `tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs`.

**Acceptance**

- Existing call sites (`GalaxyProxyDriver.DiscoverAsync`) still compile — add the new fields with safe defaults.

#### PR 2.2 — `AlarmConditionService` (state machine)

**Parallel-key:** `server-alarms`.

**Files**

- Create:
  - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs`
  - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionState.cs`
  - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs`
- Reference impl to **port** (do not duplicate — read it for invariants):
  - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs`
- Tests:
  - `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs` — port the existing tracker tests (`tests/.../Galaxy.Host.Tests/`).

**Subagent guidance**

- **Two-step.** First a `Plan` agent: read `GalaxyAlarmTracker.cs` and produce a state-transition table + a list of tests to port. Then a `general-purpose` agent: implement `AlarmConditionService` against that table.

**Depends on:** 2.1 merged.

#### PR 2.3 — Wire alarm service into `DriverNodeManager`

**Parallel-key:** `server-alarms`.

**Files**

- Modify:
  - `src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs` — on each driver's discovery, collect alarm declarations and hand to `AlarmConditionService` along with the driver's `ISubscribable` and `IWritable` for sub-attribute advise + ack writes.
- Tests:
  - extend `DriverNodeManagerTests` with a fake driver that declares one alarm-bearing node.
**Depends on:** 2.2 merged.

#### PR 2.W — Phase 2 wire-up

DI registration of `AlarmConditionService` in `OpcUaServerService.cs`.

### Phase 2 parallel batches

- **Batch 2a (sequential):** 2.1 → 2.2 → 2.3 → 2.W.

### Phases 1 + 2 cross-batch parallelism

PR 1.1 and PR 2.1 touch **different files** in `Core.Abstractions/` (one under `Historian/`, one in `IAddressSpaceBuilder.cs` + `Alarms/`). They are **parallel-safe**.

PR 1.2/1.3 and PR 2.2/2.3 both modify `OpcUaServerService.cs` and `DriverNodeManager.cs`. They share **two locked files** — but only at the DI-registration level. If we split the `OpcUaServerService.cs` edits into a single combined wire-up PR (1+2.W), the body PRs 1.2/1.3 and 2.2/2.3 don't touch them. Then the body PRs *can* run in parallel batches across phase 1 and phase 2.

**Recommended Phase 1+2 plan** (parallel):

1. Run **PR 1.1 and PR 2.1 in parallel** (two worktrees, two `general-purpose` agents). Both target `Core.Abstractions` only.
2. Merge both to integration branch.
3. Run **PR 1.2/1.3 and PR 2.2/2.3 in parallel**, each as a sequential 2-PR chain on its own worktree. Constraint: neither chain edits `OpcUaServerService.cs` or `DriverNodeManager.cs` — defer all DI/wiring to the combined wire-up.
4. Merge both chains.
5. **Combined wire-up PR 1+2.W** edits `OpcUaServerService.cs` and `DriverNodeManager.cs` once.

---

## Phase 3 — `Driver.Historian.Wonderware` sidecar

Goal: house the existing `HistorianDataSource` code in its own .NET 4.8 x86 service, exposed over named pipe; ship a .NET 10 client implementing `IHistorianDataSource`.

### Tasks

#### PR 3.1 — Create the sidecar shell project

**Parallel-key:** `historian-sidecar-host`.

**Files**

- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/`
  - `Driver.Historian.Wonderware.csproj` (`net48`, `x86`).
  - `Program.cs` — Serilog + console host + named pipe server (mirror `Driver.Galaxy.Host/Program.cs` shape: env-driven pipe name, allowed SID, shared secret).
- Create test project:
  - `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/`
- *Locked:* `.slnx`, `Install-Services.ps1` (wire-up).

#### PR 3.2 — Lift `HistorianDataSource` & friends

**Parallel-key:** `historian-sidecar-host`.

**Files**

- Move (preserve git history with `git mv`):
  - `src/.../Driver.Galaxy.Host/Backend/Historian/HistorianDataSource.cs` → `src/.../Driver.Historian.Wonderware/Backend/HistorianDataSource.cs`
  - `HistorianClusterEndpointPicker.cs`
  - `HistorianClusterNodeState.cs`
  - `HistorianConfiguration.cs`
  - `HistorianEventDto.cs`
  - `HistorianHealthSnapshot.cs`
  - `HistorianQualityMapper.cs`
  - `HistorianSample.cs`
  - `IHistorianConnectionFactory.cs`
- Add a thin `IHistorianDataSource` shim in the sidecar that re-implements the **interface from `Core.Abstractions/Historian/`** (after PR 1.1).
- Galaxy.Host needs to keep building until phase 7. Either:
  - Add `Driver.Historian.Wonderware` ProjectReference from `Driver.Galaxy.Host` and re-use the moved code, OR
  - Leave a stub copy in Galaxy.Host that delegates to the sidecar via the new client.

  Pick option 1 (cleaner).
- Tests:
  - `git mv` matching test files from `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/Backend/Historian/` to `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/`.

**Depends on:** PR 1.1 merged (interface lives in Core.Abstractions).

#### PR 3.3 — Pipe contract + handler

**Parallel-key:** `historian-sidecar-pipe`.

**Files**

- Create:
  - `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Ipc/Contracts.cs` (MessagePack DTOs: `ReadRawRequest/Reply`, `ReadProcessedRequest/Reply`, `ReadAtTimeRequest/Reply`, `ReadEventsRequest/Reply`, **`WriteAlarmEventsRequest/Reply`** — alarm-event persistence write path; mirror today's `GalaxyHistorianWriter.WriteBatchAsync` payload so the SQLite store-and-forward sink in `Core.AlarmHistorian` can drain into the Wonderware historian event store after Galaxy.Proxy is deleted).
  - `Ipc/PipeServer.cs` — copy + adapt `Driver.Galaxy.Host/Ipc/PipeServer.cs` (same ACL/secret model).
  - `Ipc/HistorianFrameHandler.cs` — handles all five contract pairs above.
- Tests:
  - `tests/.../Driver.Historian.Wonderware.Tests/Ipc/PipeRoundTripTests.cs` — round-trip every contract pair including `WriteAlarmEvents`.

#### PR 3.4 — .NET 10 client

**Parallel-key:** `historian-sidecar-client`.

**Files**

- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/` (.NET 10 x64). Implements:
  - `IHistorianDataSource` (read path: raw / processed / at-time / events) against the sidecar pipe.
  - `IAlarmHistorianWriter` (write path: alarm-event persistence) against the sidecar pipe `WriteAlarmEvents` contract from PR 3.3.
- Tests:
  - `tests/.../Driver.Historian.Wonderware.Client.Tests/` against an in-proc fake pipe server. Cover both the read interface and the alarm-event write interface; verify the SQLite store-and-forward sink (`Core.AlarmHistorian.SqliteStoreAndForwardSink`) drains successfully when the client is plugged in as its target.

**Depends on:** PR 3.3 merged (contracts published).

#### PR 3.W — Phase 3 wire-up

**Files**

- `ZB.MOM.WW.OtOpcUa.slnx` — register three new projects + two new test projects.
- `scripts/install/Install-Services.ps1` — register `OtOpcUaWonderwareHistorian` NSSM service.
- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — register the client as both an `IHistorianDataSource` for the Galaxy namespace **and** the `IAlarmHistorianWriter` target for the SQLite store-and-forward sink, replacing today's `GalaxyProxyDriver.WriteBatchAsync` route.
- `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` — `Historian:Wonderware` block.

### Phase 3 parallel batches

- **Batch 3a (sequential):** 3.1 (shell) → 3.2 (lift code).
- **Batch 3b (parallel after 3.2):** 3.3 (pipe) and 3.4 (client) — but 3.4 depends on 3.3's contracts. So sequential within Phase 3.
- **Batch 3c:** 3.W.
But Phase 3 is **fully independent of Phase 1.1's downstream work** once 1.1 has merged. Phase 3 can run in parallel with Phase 1.2/1.3 and all of Phase 2.

**Recommended phasing**: kick off Phase 3 in parallel with Phase 2, both gated only on Phase 1.1's merge.

---

## Phase 4 — New `Driver.Galaxy` (Tier-A, .NET 10) against gw

This is the bulk of the work. Each PR adds one capability to the new driver. The driver builds and links from PR 4.0 onward; capabilities arrive as incremental green bars.

The driver lives at `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (note: same short name as the old `.Proxy`, but new project. The `.Host`, `.Proxy`, `.Shared` projects continue to coexist until phase 7).

### Tasks

#### PR 4.0 — Project skeleton, options, factory

**Parallel-key:** `galaxy-shell`.

**Files**

- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`
  - `Driver.Galaxy.csproj` (.NET 10 x64), references `Core.Abstractions`, `Core`, `MxGateway.Client` (NuGet from gw repo).
  - `GalaxyDriver.cs` — `IDriver` + `IDisposable` skeleton; `Initialize` creates `MxGatewayClient` and opens a session; `Shutdown` disposes.
  - `Config/GalaxyDriverOptions.cs` — POCO matching the JSON shape in `lmx_mxgw.md`.
  - `GalaxyDriverFactoryExtensions.cs` — `AddGalaxyDriver(IServiceCollection)`.
- Tests:
  - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/` (new project)
  - `Tests/GalaxyDriverInitializationTests.cs` — uses a fake `IMxGatewayClientTransport` to verify open-session behavior.
- *Locked:* `.slnx` (wire-up PR 4.W).

**Acceptance**

- Driver builds, `Initialize` opens a session against a fake transport, `Shutdown` closes it.
- `IDriver.RecycleAsync` (if present in the interface today) returns the same stub shape as the legacy backend — `{Accepted = true, GraceSeconds = 15}` — and is documented in the file as intentionally a no-op until a future PR wires it through gw. Today's `MxAccessGalaxyBackend.RecycleAsync` is itself a stub, so this preserves behavior exactly.
#### PR 4.1 — `ITagDiscovery` via `GalaxyRepositoryClient`

**Parallel-key:** `galaxy-discover`.

**Files**

- Create:
  - `src/.../Driver.Galaxy/Browse/GalaxyDiscoverer.cs`
  - `src/.../Driver.Galaxy/Browse/DataTypeMap.cs` — `mx_data_type → DriverDataType`. Port table from `GalaxyProxyDriver.MapDataType` (lines 523–532) and verify against `gr/data_type_mapping.md`.
  - `src/.../Driver.Galaxy/Browse/SecurityMap.cs` — port `GalaxyProxyDriver.MapSecurity` (lines 534–544).
  - `src/.../Driver.Galaxy/Browse/AlarmRefBuilder.cs` — for any attribute where `IsAlarm=true`, compute the five sub-attribute references by Galaxy naming convention (`..InAlarm`, `..Priority`, `..DescAttrName`, `..Acked`, `..AckMsg`) and populate `AlarmConditionInfo.{InAlarmRef, PriorityRef, DescAttrNameRef, AckedRef, AckMsgWriteRef}` before passing to `MarkAsAlarmCondition`. Mirrors today's behavior in `MxAccessGalaxyBackend.SubscribeAlarmsAsync` so the server-level `AlarmConditionService` (Phase 2) has every ref it needs.
- Modify:
  - `GalaxyDriver.cs` — implement `ITagDiscovery.DiscoverAsync` calling discoverer.
- Tests:
  - `Tests/Browse/GalaxyDiscovererTests.cs` — fake `IGalaxyRepositoryClientTransport` with canned `GalaxyObject` list.
  - `Tests/Browse/AlarmRefBuilderTests.cs` — for an alarm-bearing attribute, verify all five refs match the `..{...}` shape and round-trip cleanly through `MarkAsAlarmCondition`.

**Acceptance**

- Discovered nodes carry `mx_data_type`, `IsArray`, `ArrayDim`, `SecurityClassification`, `IsHistorized`, `IsAlarm` matching what the legacy backend produces (snapshot-compared in Phase 5).
- Every `IsAlarm=true` attribute calls `MarkAsAlarmCondition` with all five sub-attribute refs populated. The `AlarmConditionService` from Phase 2 must be able to subscribe and ack without further help from the driver.
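The sub-attribute naming that `AlarmRefBuilder.cs` is meant to apply can be illustrated with a short sketch. This is Python for brevity rather than the driver's C#, and `build_alarm_refs` is a hypothetical name. The reference prefix is elided in the convention above, so the sketch treats it as an opaque caller-supplied string and only assumes a `.` separator before each suffix.

```python
# AlarmConditionInfo field -> Galaxy sub-attribute suffix, as listed in PR 4.1.
ALARM_SUFFIXES = {
    "InAlarmRef": "InAlarm",
    "PriorityRef": "Priority",
    "DescAttrNameRef": "DescAttrName",
    "AckedRef": "Acked",
    "AckMsgWriteRef": "AckMsg",
}


def build_alarm_refs(prefix):
    """Return the five sub-attribute references for one alarm-bearing attribute.

    `prefix` is whatever reference string identifies the alarm attribute;
    its exact shape is not specified here and is passed through untouched.
    """
    if not prefix:
        raise ValueError("prefix must be a non-empty reference string")
    return {field: f"{prefix}.{suffix}" for field, suffix in ALARM_SUFFIXES.items()}


refs = build_alarm_refs("ExamplePrefix")  # placeholder, not a real Galaxy reference
for field, ref in refs.items():
    print(f"{field} = {ref}")
```

A table-driven shape like this keeps the five suffixes in one place, which is roughly what `AlarmRefBuilderTests.cs` would want to assert against.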
**Subagent guidance**

- Use an `Explore` agent first: "find every place in `Driver.Galaxy.Proxy/GalaxyProxyDriver.cs` that consumes `DiscoverHierarchyResponse` and list every wire field it reads, so we know what gw's proto must surface."

**Depends on:** PR 4.0 merged + PR 0.1 (gw attribute parity) NuGet bumped.

#### PR 4.2 — `IReadable` (one-shot read path)

**Parallel-key:** `galaxy-read`.

**Files**

- Create:
  - `src/.../Driver.Galaxy/Runtime/GalaxyMxSession.cs` — owns `MxGatewaySession`, `Register` server handle, in-memory `tag → itemHandle` registry.
  - `src/.../Driver.Galaxy/Runtime/MxValueDecoder.cs` — `MxValue → object` (boolean/int32/float/double/string/datetime, plus array variants).
  - `src/.../Driver.Galaxy/Runtime/StatusCodeMap.cs` — explicit `MxStatusProxy → uint OPC UA StatusCode` mapping table. Today's coarse `vtq.Quality >= 192 ? Good : Uncertain_Placeholder` becomes a full mapping covering at minimum: `Good (0x0)`, `Uncertain (0x40000000)`, `Uncertain_LastUsableValue (0x40900000)`, `Bad (0x80000000)`, `Bad_NotConnected (0x808A0000)`, `Bad_NoCommunication (0x80310000)`, `Bad_OutOfService (0x808D0000)`. Document any unmapped category as `Bad_InternalError` and log once with the raw `MxStatusProxy` so the matrix can be extended from field data.
- Modify:
  - `GalaxyDriver.cs` — implement `IReadable.ReadAsync`: per tag, `AddItem` → short-lived `Advise` → first `OnDataChange`. (If Phase 0 added a synchronous `ReadAsync` RPC, use that; flag a follow-up if missing.)
- Tests:
  - `Tests/Runtime/GalaxyReadTests.cs` — fake transport with scripted `OnDataChange` responses.
  - `Tests/Runtime/StatusCodeMapTests.cs` — exhaustive mapping cases plus "unknown category falls back to Bad_InternalError and emits a single diagnostic log" assertion.

**Depends on:** PR 4.0.

#### PR 4.3 — `IWritable` + secured-write routing

**Parallel-key:** `galaxy-write`.
**Files**

- Create:
  - `src/.../Driver.Galaxy/Runtime/MxValueEncoder.cs` — `object → MxValue` (the inverse of 4.2's decoder; unify into one type if simpler).
- Modify:
  - `GalaxyDriver.cs` — implement `IWritable.WriteAsync`. Route writes whose attribute carries `SecurityClassification.SecuredWrite` / `VerifiedWrite` through `WriteSecuredAsync` (mxaccessgw exposes this in `MxGatewaySession`).
- Tests:
  - `Tests/Runtime/GalaxyWriteTests.cs` — verify the routing decision given each `SecurityClassification` value.

**Depends on:** PR 4.2 merged (shares `GalaxyMxSession` + value type code).

#### PR 4.4 — `ISubscribable` + `EventPump`

**Parallel-key:** `galaxy-subscribe`.

**Files**

- Create:
  - `src/.../Driver.Galaxy/Runtime/SubscriptionRegistry.cs` — `(driverSubId → list)` and reverse map.
  - `src/.../Driver.Galaxy/Runtime/EventPump.cs` — single consumer of `MxGatewaySession.StreamEventsAsync`. Maps each `OnDataChange` to a `DataChangeEventArgs` per registered driver subscription.
  - `src/.../Driver.Galaxy/Runtime/GalaxySubscriptionHandle.cs` (port from Proxy).
- Modify:
  - `GalaxyDriver.cs` — implement `ISubscribable.SubscribeAsync` using `SubscribeBulkAsync` with the `buffered_update_interval_ms` hint from PR 0.2.
- Tests:
  - `Tests/Runtime/EventPumpFanoutTests.cs` — one item → multiple driver subscriptions → one event per driver subscription.
  - `Tests/Runtime/SubscribeBulkTests.cs` — partial failures.

**Depends on:** PR 4.3.

#### PR 4.5 — `ReconnectSupervisor`

**Parallel-key:** `galaxy-reconnect`.

**Files**

- Create:
  - `src/.../Driver.Galaxy/Runtime/ReconnectSupervisor.cs` — state machine `(Healthy → TransportLost → ReopeningSession → ReplayingSubscriptions → Healthy)`. Surfaces `DriverState.Degraded` while not Healthy.
- Modify:
  - `GalaxyDriver.cs` + `GalaxyMxSession.cs` — wire transport-error callbacks into the supervisor; replay subscriptions via `ReplaySubscriptionsCommand` (PR 0.3).
- Tests:
  - `Tests/Runtime/ReconnectSupervisorTests.cs` with simulated drops.
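The supervisor's happy-path cycle can be sketched as a small state machine. The Python below is illustrative only (the real class would be C#). The four state names come from the file list above; the `advance` method, the `"Healthy"` driver-state label, and the absence of failure edges (for example a failed session reopen) are assumptions, since the plan does not specify them.

```python
from enum import Enum


class SupervisorState(Enum):
    HEALTHY = "Healthy"
    TRANSPORT_LOST = "TransportLost"
    REOPENING_SESSION = "ReopeningSession"
    REPLAYING_SUBSCRIPTIONS = "ReplayingSubscriptions"


# Happy-path cycle from PR 4.5; failure edges are intentionally left out.
NEXT_STATE = {
    SupervisorState.HEALTHY: SupervisorState.TRANSPORT_LOST,
    SupervisorState.TRANSPORT_LOST: SupervisorState.REOPENING_SESSION,
    SupervisorState.REOPENING_SESSION: SupervisorState.REPLAYING_SUBSCRIPTIONS,
    SupervisorState.REPLAYING_SUBSCRIPTIONS: SupervisorState.HEALTHY,
}


class ReconnectSupervisorSketch:
    def __init__(self):
        self.state = SupervisorState.HEALTHY

    def advance(self):
        """Move to the next state in the cycle and return it."""
        self.state = NEXT_STATE[self.state]
        return self.state

    @property
    def driver_state(self):
        # Degraded whenever the supervisor is anywhere but Healthy.
        return "Healthy" if self.state is SupervisorState.HEALTHY else "Degraded"


supervisor = ReconnectSupervisorSketch()
trace = [(supervisor.state.value, supervisor.driver_state)]
for _ in range(4):
    supervisor.advance()
    trace.append((supervisor.state.value, supervisor.driver_state))
for name, driver_state in trace:
    print(f"{name}: {driver_state}")
```

A transition table of this kind is also a convenient fixture for `ReconnectSupervisorTests.cs`: each simulated drop should walk the full cycle and report Degraded for the three intermediate states.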
**Depends on:** PR 4.4. Strongly recommended: merge PR 0.3 (replay RPC) first.

#### PR 4.6 — `IRediscoverable` via `WatchDeployEvents`

**Parallel-key:** `galaxy-deploy`.

**Files**

- Create:
  - `src/.../Driver.Galaxy/Browse/DeployWatcher.cs` — long-lived consumer of `GalaxyRepositoryClient.WatchDeployEventsAsync`.
- Modify:
  - `GalaxyDriver.cs` — start watcher on Initialize; raise `OnRediscoveryNeeded` per event.
- Tests:
  - `Tests/Browse/DeployWatcherTests.cs`.

**Depends on:** PR 4.0. **Independent of PR 4.2–4.5** — can run in parallel with all of them.

#### PR 4.7 — `IHostConnectivityProbe` (transport health + per-platform probes)

**Parallel-key:** `galaxy-health`.

The current driver reports two flavors of host connectivity:

1. **Top-level transport health** — flips `Running`/`Stopped` on the synthetic host named after `OTOPCUA_GALAXY_CLIENT_NAME` whenever the MXAccess COM proxy connects/disconnects.
2. **Per-platform `ScanState` probes** — for each discovered `$WinPlatform` and `$AppEngine` gobject, advise its `ScanState` attribute and translate value transitions into per-host `Running`/`Stopped`/`Unknown`. Lives in `Driver.Galaxy.Host/Backend/Stability/GalaxyRuntimeProbeManager.cs`.

This PR ports both.

**Files**

- Create:
  - `src/.../Driver.Galaxy/Health/HostConnectivityForwarder.cs` — consumes PR 0.4 `StreamSessionHealth` and surfaces the synthetic top-level host entry (named after the configured MXAccess `ClientName`).
  - `src/.../Driver.Galaxy/Health/PerPlatformProbeWatcher.cs` — port of `GalaxyRuntimeProbeManager`. On `Discover`, takes the list of discovered `$WinPlatform`/`$AppEngine` tag names, subscribes their `ScanState` via the driver's own `GalaxyMxSession.SubscribeBulkAsync` (or directly through the gw session), runs the same state machine (`OnProbeCallback` interpretation logic — port verbatim with tests), and raises per-host `HostStatusChangedEventArgs` through the aggregator below.
  - `src/.../Driver.Galaxy/Health/HostStatusAggregator.cs` — single sink that merges the forwarder's transport entry with the watcher's per-platform entries into the `IReadOnlyList` surfaced by `IHostConnectivityProbe.GetHostStatuses()`. Owns the de-dup + diff logic that today lives in `GalaxyProxyDriver.OnHostConnectivityUpdate`.
- Modify:
  - `GalaxyDriver.cs` — wire forwarder + watcher + aggregator into Initialize. On every `ITagDiscovery.DiscoverAsync` completion (incl. re-discovery from PR 4.6), feed the watcher the fresh platform list so probe subscriptions follow Galaxy redeploys.
- Tests:
  - `Tests/Health/HostConnectivityForwarderTests.cs`.
  - `Tests/Health/PerPlatformProbeWatcherTests.cs` — port the existing `GalaxyRuntimeProbeManagerTests` (or whatever covers `OnProbeCallback`) verbatim. Cover: initial subscribe on Discover, re-subscribe after Rediscover, value-transition state machine, cleanup on Shutdown.
  - `Tests/Health/HostStatusAggregatorTests.cs` — transport entry plus multiple per-platform entries, transitions, aggregator emits `OnHostStatusChanged` only on actual state change.

**Acceptance**

- Top-level transport up/down reflected within 1s of gw `SessionHealth` flip.
- Each `$WinPlatform` / `$AppEngine` gobject in the discovered hierarchy produces exactly one entry in `GetHostStatuses()`, transitioning on `ScanState` changes.
- After a redeploy that adds a new platform, the watcher subscribes its `ScanState` without restarting the driver.

**Depends on:** PR 4.0 + PR 4.1 (needs the discoverer's platform list). **Independent of PR 4.2–4.6** — parallel-safe with the runtime track.

#### PR 4.W — Backend-flag wiring

**Parallel-key:** locked-files.

**Files**

- `src/.../Server/Configuration/DriverFactoryRegistry.cs` (or wherever drivers are wired) — add a `Galaxy:Backend` switch:
  - `legacy-host` → existing `GalaxyProxyDriver` registration (untouched).
  - `mxgateway` → new `GalaxyDriver` registration via PR 4.0's extension.
- `src/.../Server/appsettings.json` — sample new config block.
- `ZB.MOM.WW.OtOpcUa.slnx` — register `Driver.Galaxy` and its tests.
- `CLAUDE.md` — note new driver, retain old driver pointers.

**Acceptance**

- With `Galaxy:Backend=legacy-host` (default), unchanged behavior.
- With `Galaxy:Backend=mxgateway`, server boots against the new driver and passes a smoke test against the dev gw.

### Phase 4 parallel batches

Dependency graph:

```
4.0 (shell) ──┬── 4.1 (discover) ──┬── 4.6 (deploy)
              │                    └── 4.7 (health: needs platform list)
              ├── 4.2 (read) ── 4.3 (write) ── 4.4 (subscribe) ── 4.5 (reconnect)
              │                                                     \
              │                                                      → 4.W (wire-up)
              └── (no longer parallel-with-4.1: 4.7 moved under 4.1)
```

- After 4.0 merges, **4.1 and the 4.2-chain head** can run in two parallel worktrees.
- After 4.1 merges, **4.6 and 4.7** can run in two parallel worktrees.
- 4.2 → 4.3 → 4.4 → 4.5 is one sequential chain on its own worktree (they all touch `GalaxyDriver.cs` and `GalaxyMxSession.cs`) and runs alongside the discover/deploy/health track.
- 4.W gathers everything.

**Recommended Phase 4 plan:**

- Stage 1 (after 4.0): two worktrees — W1: 4.1; W2: 4.2 → 4.3 → 4.4 → 4.5.
- Stage 2 (after 4.1 merges, W2 still running): three worktrees — W1: 4.6; W3: 4.7; W2: continues runtime chain.
- Stage 3: 4.W wire-up.

---

## Phase 5 — Parity test matrix

### Tasks

#### PR 5.1 — `Driver.Galaxy.ParityTests` project

**Parallel-key:** `parity-shell`.

**Files**

- Create: `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/`
  - `ParityHarness.cs` — boots the OtOpcUa server twice with each backend, drives the same OPC UA scenarios, captures structured snapshots.
  - Theory data per scenario (browse, subscribe, alarm transition, write by classification, history read).
  - Reuses existing live-Galaxy fixtures from `tests/.../Driver.Galaxy.E2E/`.

#### PR 5.2 — Browse + read parity scenarios

**Parallel-key:** `parity-browse`.

#### PR 5.3 — Subscribe + event-rate parity scenarios

**Parallel-key:** `parity-subscribe`.

#### PR 5.4 — Write-by-classification parity scenarios

**Parallel-key:** `parity-write`.

#### PR 5.5 — Alarm-transition parity scenarios

**Parallel-key:** `parity-alarms`.

Cover both:

- **Live transitions:** Active / Acknowledged / Inactive sequences against `.InAlarm` / `.Acked` value flips on the dev Galaxy. Must match legacy-host event ordering and severity mapping.
- **Alarm-event persistence:** trigger N alarm transitions, then verify the SQLite store-and-forward sink drains them into the Wonderware historian event store via the new sidecar's `WriteAlarmEvents` contract (PR 3.3). Compare the persisted rows to those produced by the legacy `GalaxyHistorianWriter` path.

#### PR 5.6 — History-read parity scenarios

**Parallel-key:** `parity-history`.

#### PR 5.7 — Reconnect/disruption scenarios

**Parallel-key:** `parity-reconnect`.

#### PR 5.8 — Per-platform `ScanState` probe parity

**Parallel-key:** `parity-probes`.

Verify the new `PerPlatformProbeWatcher` (PR 4.7) produces the same per-host `HostConnectivityStatus` stream as the legacy `GalaxyRuntimeProbeManager`:

- Initial state on Discover for each `$WinPlatform` / `$AppEngine`.
- Transition events when a runtime is stopped/started on the dev Galaxy.
- Re-subscription after a redeploy that adds/removes a platform.
- Cleanup of probe subscriptions on Shutdown (no leaked advises in gw).

#### PR 5.W — Parity matrix doc

**Files**

- `docs/v2/Galaxy.ParityMatrix.md` — table of scenario × result for both backends. Resolved deltas marked, accepted deltas justified.

### Phase 5 parallel batches

After 5.1 lands, scenarios 5.2–5.8 are **fully parallel** — they each add a separate test class file. Seven worktrees, seven `general-purpose` agents.

5.W runs after all scenarios merge and pass.

---

## Phase 6 — Performance + hardening

### Tasks

#### PR 6.1 — OpenTelemetry traces

**Parallel-key:** `perf-otel`.

#### PR 6.2 — Bounded channel + drop-newest metrics

**Parallel-key:** `perf-eventpump`.
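
PR 6.2 names a bounded channel with drop-newest metrics without spelling out the policy. As a language-neutral illustration (Python here, not the repo's C#), the sketch below shows one possible reading: a fixed-capacity buffer that rejects the incoming event when full and counts every rejection so a metric can report it. The class `BoundedEventBuffer`, its methods, and the `accepted` / `dropped` counters are hypothetical names for this example and are not taken from `EventPump`. Treating "drop-newest" as "discard the event being written" is also an assumption.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class BoundedEventBuffer(Generic[T]):
    """Hypothetical fixed-capacity FIFO that drops the incoming item when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: Deque[T] = deque()
        self.accepted = 0  # events enqueued successfully
        self.dropped = 0   # events rejected because the buffer was full

    def try_write(self, item: T) -> bool:
        """Enqueue item; return False (and count a drop) if the buffer is full."""
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        self.accepted += 1
        return True

    def try_read(self) -> Optional[T]:
        """Dequeue the oldest item, or return None when the buffer is empty."""
        return self._items.popleft() if self._items else None


# Capacity 2: the third write is rejected and shows up in the drop counter.
buf: BoundedEventBuffer[int] = BoundedEventBuffer(capacity=2)
results = [buf.try_write(n) for n in (1, 2, 3)]
```

In .NET, `System.Threading.Channels` expresses the same choice through `BoundedChannelFullMode`: `DropWrite` discards the item being written, while `DropNewest` evicts the most recently queued item to make room. Which of the two PR 6.2 intends is worth stating explicitly in that PR, since they lose different events under load.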

#### PR 6.3 — Buffered update interval landing

**Parallel-key:** `perf-buffered`.

Wire `MxAccess:PublishingIntervalMs` → `SetBufferedUpdateInterval` once gw exposes it.

#### PR 6.4 — Soak test scenario

**Parallel-key:** `perf-soak`.

50k tags, 24h, automated metric collection.

#### PR 6.5 — Tune `MxGatewayClientOptions` defaults

**Parallel-key:** `perf-tuning`.

Based on soak data.

#### PR 6.W — Performance doc

`docs/v2/Galaxy.Performance.md`.

### Phase 6 parallel batches

6.1, 6.2, 6.3 all touch `Driver.Galaxy/Runtime/`. Serialize them, OR split files explicitly:

- 6.1 owns a new `Runtime/Tracing.cs` injected via decorator. Parallel-safe.
- 6.2 owns `Runtime/EventPump.cs`. Conflicts with PR 4.4 only if reordered; not in parallel with 6.1 if 6.1 also wraps EventPump. Decide upfront: PR 6.1 wraps at the gateway-client boundary, PR 6.2 owns EventPump internals. Parallel-safe.
- 6.3 modifies `GalaxyDriver.SubscribeAsync` only. Parallel-safe.

So 6.1, 6.2, 6.3 parallel, then 6.4 (depends on all three). 6.5 sequential after 6.4 (uses its data). 6.W last.

---

## Phase 7 — Retire legacy

### Tasks

#### PR 7.1 — Default flip

**Parallel-key:** `retire-defaults`.

**Files**

- `src/.../Server/appsettings.json` → `Galaxy:Backend = mxgateway`.
- `scripts/e2e/e2e-config.sample.json` → drop `OTOPCUA_GALAXY_*` pipe vars, add gw endpoint.
- `scripts/install/Install-Services.ps1` → remove `OtOpcUaGalaxyHost` registration; keep `OtOpcUaWonderwareHistorian` from PR 3.W.

#### PR 7.2 — Delete legacy projects

**Parallel-key:** `retire-delete`.

**Files**

- Delete:
  - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/`
  - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/`
  - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/`
  - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/`
  - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/`
  - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests/`
- Modify:
  - `ZB.MOM.WW.OtOpcUa.slnx` — remove the six entries.
  - `Server/Configuration/DriverFactoryRegistry.cs` — remove the `legacy-host` switch arm.

**Depends on:** PR 7.1 fully soaked (no rollback risk).

#### PR 7.3 — Doc + memory housekeeping

**Parallel-key:** `retire-docs`.

**Files**

- `CLAUDE.md` — rewrite Galaxy section.
- `docs/v2/dev-environment.md` — drop `OtOpcUaGalaxyHost` references.
- `docs/ServiceHosting.md`, `docs/Redundancy.md`, `docs/security.md` — scrub `Galaxy.Host`/`Galaxy.Proxy` mentions.
- `~/.claude/projects/.../memory/MEMORY.md` — retire entries:
  - `project_galaxy_host_service.md`
  - `project_galaxy_host_installed.md`
  - `project_aveva_platform_installed.md` (revise — server box no longer needs AVEVA; gw box does)
- Delete:
  - `mxaccess_documentation.md` (no longer consumed by this repo).
- Add memory entry: `project_galaxy_via_mxgateway.md`.

### Phase 7 parallel batches

- **Batch 7a (sequential, gated by phase 6 production soak):** 7.1.
- **Batch 7b (parallel after 7.1):** 7.2 (`retire-delete`) and 7.3 (`retire-docs`) — disjoint files.

---

## Cross-phase dependency graph

```
Phase 0 (gw repo) ───────────────────────────────────┐
                                                     │
Phase 1.1 (Core.Abs/Historian) ──┐                   │
                                 ├── Phase 1.2/1.3   │
                                 │   (server History)│
Phase 2.1 (Core.Abs/Alarms) ─────┤                   │
                                 ├── Phase 2.2/2.3   │
                                 │   (server Alarms) │
                                 │                   │
                                 └── Phase 3 (sidecar host + client)
                                           │         │
                                           └─────────┴── Phase 4 (Driver.Galaxy)
                                                              │
                                                       Phase 5 (parity)
                                                              │
                                                       Phase 6 (perf)
                                                              │
                                                       Phase 7 (retire)
```

### Maximum-parallelism rollout (one possible execution)

- **Day 0–N (mxaccessgw):** Phase 0 batches 0a + 0b + 0.W in parallel worktrees, separate repo from this one — runs in parallel with everything below until consumers need the gw bump.
- **Day 0–N (this repo):** Phases 1.1 and 2.1 in parallel (two worktrees). Merge.
- **Day N+:** Phases 1.2/1.3, 2.2/2.3, 3.1+3.2+3.3+3.4 in parallel (three worktrees, each a sequential chain).
- **Day M:** combined wire-up PR 1+2.W, then PR 3.W. Server passes existing e2e against legacy backend.
- **Day M+:** Phase 4.0 lands.
  Phase 4 fan-out (four worktrees) starts.
- **Day P:** Phase 4 wire-up. Phase 5 fan-out (seven worktrees) starts.
- **Day Q:** Phase 5 wire-up. Phase 6 fan-out (three worktrees + sequential).
- **Day R:** Phase 7. Done.

---

## Subagent prompt template

Re-use this shell when launching any of the parallel coding tasks. Replace the `<...>` parts.

```
You are implementing PR <PR id> from lmx_mxgw_impl.md ("<PR title>").

Repo: <C:\Users\dohertj2\Desktop\lmxopcua | C:\Users\dohertj2\Desktop\mxaccessgw>.
Worktree: <path>.

Scope (you may create/edit only these files):
<list>

DO NOT edit:
- Any file outside the scope above
- ZB.MOM.WW.OtOpcUa.slnx / mxaccessgw/MxGateway.sln
- src/.../Server/Program.cs, OpcUaServerService.cs, appsettings.json
- scripts/install/Install-Services.ps1
- scripts/e2e/e2e-config.sample.json
- CLAUDE.md, docs/**, MEMORY.md, mxaccess_documentation.md

Acceptance:
<list>

Tests:
<list>

If you find a needed change outside scope, STOP and surface it as a finding
rather than editing — it will be picked up by the wire-up PR.

Before reporting completion:
1. Run `dotnet build <smallest project tree that covers your scope>`.
2. Run the new/changed tests.
3. Report: files changed, test command + result, any out-of-scope findings.
```

---

## Risk register (operational)

| Risk | When it bites | Mitigation |
|---|---|---|
| Phase 0 gw bump breaks existing mxaccessgw consumers | Phase 0 wire-up | Cross-language smoke matrix in mxaccessgw must run before merge |
| Two parallel agents both edit `OpcUaServerService.cs` despite the rule | Phases 1+2 parallel | Wire-up convention + grep-based pre-merge check (`git diff --stat origin/main` of locked files in the integration branch must be empty until the wire-up PR) |
| Subagent silently adds a stray `using` to a locked file | Anytime | The build-and-test step in the prompt will fail if the locked file changed and broke compile; a `git diff --name-only` whitelist check at integration-branch merge time enforces it |
| Galaxy.Host can't build during phase 3.2 because lifted files vanished | Phase 3 mid-flight | PR 3.2 adds a ProjectReference from Galaxy.Host to Driver.Historian.Wonderware so the moved files remain reachable; tests cover both call sites |
| Phase 4 chain stalls because gw exposes no synchronous read | PR 4.2 | Surface as a Phase 0 finding immediately — add a `ReadCommand` to gw or accept short-lived advise as the read mechanism (document as a perf accepted delta in 5.W) |
| Phase 5 parity matrix exposes a delta no one wants to fix | Phase 5 | Phase 7 gating: `Galaxy:Backend=mxgateway` does not become default until every parity delta is either resolved or has a written acceptance from the user |
| Soak test in 6.4 finds a memory leak in `EventPump` | Phase 6 | EventPump bounded-channel design (PR 6.2) is shipped before soak so the leak is bounded by design |
| Stale memory file references retired code after phase 7 | Phase 7 | PR 7.3 explicitly retires `project_galaxy_host_*` entries; add a memory-audit step to phase-close checklist |

---

## Phase-close checklist (apply at the end of each phase)

Before declaring a phase done:

1. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` clean on integration branch.
2.
   `dotnet test ZB.MOM.WW.OtOpcUa.slnx` clean (or all-but-known-skipped).
3. Live-Galaxy smoke (when applicable) green on dev box.
4. No locked files modified outside their wire-up PR (`git log --name-only origin/main..HEAD -- <locked-paths>` shows only the wire-up commit).
5. `MEMORY.md` updated for any persistent context this phase introduced.
6. Doc updates limited to the phase's scope (no doc edits sprinkled across non-doc PRs).
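
The locked-file check in the phase-close checklist can be automated rather than read by eye. The sketch below is one possible shape, not an existing script in either repo. It assumes the log is produced with `git log --name-only --format=commit:%H origin/main..HEAD`, where the `commit:` prefix is a convention chosen here so that header lines are easy to tell apart from path lines. The function names, the sample log text, and the sample paths are all hypothetical. Paths are matched exactly, so glob-style entries such as `docs/**` would need extra handling.

```python
from typing import Dict, List, Optional, Set


def parse_name_only_log(log_text: str) -> Dict[str, List[str]]:
    """Parse `git log --name-only --format=commit:%H` output into {sha: [paths]}."""
    commits: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separators between header and path lines
        if line.startswith("commit:"):
            current = line[len("commit:"):]
            commits[current] = []
        elif current is not None:
            commits[current].append(line)
    return commits


def locked_file_violations(
    commits: Dict[str, List[str]],
    locked_paths: Set[str],
    wireup_shas: Set[str],
) -> Dict[str, List[str]]:
    """Return {sha: [locked paths touched]} for every non-wire-up commit."""
    violations: Dict[str, List[str]] = {}
    for sha, paths in commits.items():
        if sha in wireup_shas:
            continue  # the wire-up commit is allowed to touch locked files
        touched = sorted(p for p in paths if p in locked_paths)
        if touched:
            violations[sha] = touched
    return violations


# Made-up sample: aaa111 is the wire-up commit, bbb222 is a task commit.
SAMPLE_LOG = """commit:aaa111

ZB.MOM.WW.OtOpcUa.slnx
CLAUDE.md

commit:bbb222

src/Driver.Galaxy/GalaxyDriver.cs
CLAUDE.md
"""

LOCKED = {"ZB.MOM.WW.OtOpcUa.slnx", "CLAUDE.md"}
violations = locked_file_violations(
    parse_name_only_log(SAMPLE_LOG), LOCKED, wireup_shas={"aaa111"}
)
```

An empty result means the checklist item passes. A non-empty result names the offending commit and the locked files it touched, which is the information needed to move that change into the wire-up PR.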