lmxopcua/lmx_mxgw_impl.md
Joseph Doherty 006af51768 docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:59:59 -04:00


Completed 2026-04-30 — historical record of the v2-mxgw implementation plan.

All 39 PRs across 7 phases (1.1–1.3 + 2.1–2.3 + 1+2.W + 3.1–3.W + 4.0–4.W + 5.1–5.W + 6.1–6.W + 7.1–7.3) shipped and merged to master at commit ae7106d. Per-phase status tracking below is preserved as the historical PR-execution log; phase descriptions are retrospective, not pending. Parity matrix verified green on the dev rig 2026-04-30 (14 passed / 1 skipped / 0 failed; see docs/v2/Galaxy.ParityMatrix.md).

Galaxy → MxGateway Migration — Detailed Implementation Plan

Companion to lmx_mxgw.md (design plan). This document breaks the plan into PR-sized tasks with concrete file paths, acceptance checks, test deltas, and explicit parallel-safety analysis for subagent execution.

Cross-repo scope:

  • lmxopcua (this repo) — drivers, server, install scripts, e2e, docs.
  • mxaccessgw (C:\Users\dohertj2\Desktop\mxaccessgw) — gRPC gateway, worker, .NET client.

How to use parallel subagents safely

The plan lists each task with a parallel-key. Two tasks share a key when they touch the same file(s); tasks with disjoint keys are safe to run in parallel. Tasks within the same phase that share a key MUST run sequentially.
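The key rule can be applied mechanically. The Python sketch below is illustrative only; the function name `plan_lanes` is hypothetical. It groups a batch into lanes: tasks that share a parallel-key form one sequential lane, and each distinct lane can get its own worktree. The example uses PR IDs and keys from the Phase 0 table.

```python
def plan_lanes(tasks):
    """Group (pr_id, parallel_key) pairs into lanes.

    Every task that shares a parallel-key lands in the same lane, in input order,
    and that lane must run sequentially. Distinct lanes may run in parallel.
    """
    lanes = {}  # dicts preserve insertion order (Python 3.7+)
    for pr_id, key in tasks:
        lanes.setdefault(key, []).append(pr_id)
    return list(lanes.values())


batch = [
    ("0.1", "gw-proto-galaxy"),
    ("0.2", "gw-proto-mxaccess"),
    ("0.3", "gw-proto-mxaccess"),
    ("0.5", "gw-docs"),
]
print(plan_lanes(batch))  # [['0.1'], ['0.2', '0.3'], ['0.5']]
```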

Subagent execution rules

  1. One git worktree per parallel subagent. Spawn each parallel agent with Agent({ isolation: "worktree", ... }) so they never collide on the working tree. Merge back to a shared integration branch after each parallel batch completes.
  2. Interface-defining tasks run first, then their consumers. Anywhere the plan says "PR X.0: define interface", that PR must merge to the integration branch before its consumers fan out in parallel.
  3. Shared-file edits serialize. Files touched by more than one PR in a batch — ZB.MOM.WW.OtOpcUa.slnx, Install-Services.ps1, appsettings.json, CLAUDE.md, MEMORY.md — get a single dedicated "wire-up" PR at the end of the batch that ingests every parallel branch's needed line. Don't let parallel agents edit them.
  4. Test fixtures own their fixture file. When two PRs both need a FakeMxGatewayClient, the first PR creates it and exposes the contract; subsequent PRs add cases to the same file or extend it via partial class in their own test files.
  5. Subagent prompt must include the parallel-key and disallowed paths. Any agent prompt must say "you may NOT edit <sln file>, <wire-up files>, or files outside <your scope>. If you discover a needed change there, surface it as a task for the wire-up PR; do not make it yourself." This prevents merge conflicts at integration time.
  6. Choose the right subagent type.
    • Explore — read-only research/locate. Cheap. Use before any PR that needs to learn the surrounding code.
    • Plan — produce a step-by-step PR plan from a brief; no code writes. Use when a task description below is too coarse for a fresh agent.
    • general-purpose — code-writing. Use for PRs that create/modify source.
    • code-simplifier — post-PR cleanup pass on the same files.
    • codex:rescue — a stuck PR; use sparingly.
  7. Foreground vs. background. Run one PR foreground if its result gates the rest of your work this turn. Run the rest in background and read results when they complete.
  8. Trust but verify. After every subagent claims completion, the parent runs the build (dotnet build ZB.MOM.WW.OtOpcUa.slnx) and the target tests. The agent's report is hearsay until the build is green.
  9. Worktree cleanup. When isolation: "worktree" returns no path, nothing was changed; if it returns a path, integrate by cherry-picking or fast-forwarding into the integration branch, then prune the worktree.

Locked files (never edit from a parallel batch)

These get a dedicated wire-up PR at the end of each phase's parallel fanout:

| File | Why locked |
|---|---|
| ZB.MOM.WW.OtOpcUa.slnx | New project additions stack and conflict |
| src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json | Config schema additions stack |
| src/ZB.MOM.WW.OtOpcUa.Server/Program.cs (or Startup.cs) | DI registrations stack |
| scripts/install/Install-Services.ps1 | Service registrations stack |
| scripts/e2e/e2e-config.sample.json | E2E config stacks |
| CLAUDE.md, docs/v2/dev-environment.md | Doc edits stack |
| MEMORY.md (auto-memory index) | One line per change; conflicts often |
| mxaccessgw/MxGateway.sln | Same reason as our slnx |
| mxaccessgw/clients/proto/*.proto files | Proto edits stack and reorder field numbers |
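Before a parallel-batch branch is integrated, the parent could check it against this table. The sketch below is hypothetical. It assumes changed paths are relative to a workspace root that contains both repos (hence the `mxaccessgw/` prefix), and it uses shell-style glob matching via Python's `fnmatch`. It returns the paths that belong in the wire-up PR instead.

```python
from fnmatch import fnmatch

# Patterns transcribed from the locked-files table. Glob matching is an assumption.
LOCKED_PATTERNS = [
    "ZB.MOM.WW.OtOpcUa.slnx",
    "src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json",
    "src/ZB.MOM.WW.OtOpcUa.Server/Program.cs",
    "src/ZB.MOM.WW.OtOpcUa.Server/Startup.cs",
    "scripts/install/Install-Services.ps1",
    "scripts/e2e/e2e-config.sample.json",
    "CLAUDE.md",
    "docs/v2/dev-environment.md",
    "MEMORY.md",
    "mxaccessgw/MxGateway.sln",
    "mxaccessgw/clients/proto/*.proto",
]


def locked_violations(changed_paths):
    """Return the changed paths that a parallel-batch PR must not touch."""
    return [
        path for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in LOCKED_PATTERNS)
    ]
```

A non-empty result means the agent overstepped its scope, and those edits should be surfaced as tasks for the wire-up PR.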

Phase 0 — mxaccessgw foundation work

Repo: C:\Users\dohertj2\Desktop\mxaccessgw. Branch off main per task.

| PR | Title | Parallel-key | Files |
|---|---|---|---|
| 0.1 | Galaxy attribute metadata parity | gw-proto-galaxy | clients/proto/galaxy_repository.proto, src/MxGateway.Server/Galaxy/AttributeMapper.cs, src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs, gr/-equivalent SQL in src/MxGateway.Server/Galaxy/Sql/, contract tests |
| 0.2 | Bulk subscribe with publishing-interval hint | gw-proto-mxaccess | clients/proto/mxaccess_gateway.proto (extend SubscribeBulkCommand with optional uint32 buffered_update_interval_ms), src/MxGateway.Worker/MxAccess/Commands/SubscribeBulkHandler.cs, src/MxGateway.Server/Sessions/Mappers.cs, worker tests |
| 0.3 | Subscription replay RPC | gw-proto-mxaccess | Same proto file as 0.2 (add ReplaySubscriptionsCommand), src/MxGateway.Worker/MxAccess/Commands/ReplaySubscriptionsHandler.cs, gateway forwarder, tests |
| 0.4 | Session health stream | gw-proto-mxaccess | Same proto (add StreamSessionHealth(SessionId) returns (stream SessionHealth)), src/MxGateway.Server/Sessions/SessionHealthService.cs, dashboard projection, tests |
| 0.5 | Document event-stream resume contract | gw-docs | docs/Sessions.md, docs/gateway-process-design.md: define the retention bound and the events_lost signal in the MxEvent envelope |
| 0.6 | .NET client MxValue adapter + SubscribeWithCallback | gw-dotnet-client | clients/dotnet/MxGateway.Client/MxValueAdapter.cs (new), clients/dotnet/MxGateway.Client/MxGatewaySession.cs (extend with SubscribeWithCallbackAsync), clients/dotnet/MxGateway.Client.Tests/ |
| 0.7 | API key scopes + mxgw-key minting CLI | gw-auth | src/MxGateway.Server/Auth/, src/MxGateway.Cli/, docs/Authentication.md |
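PR 0.5 combines a retention bound with an events_lost signal. The sketch below shows how the two can fit together, using hypothetical names throughout. It assumes events carry monotonically increasing sequence numbers and that a client resumes by passing the last sequence it saw. The actual contract is whatever docs/Sessions.md defines; this is one possible shape.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ResumeResult:
    events: list
    events_lost: bool  # True when the resume point already fell out of retention


class RetainedEventLog:
    """Hypothetical bounded, sequence-numbered event buffer for stream resume."""

    def __init__(self, retention):
        self._buf = deque(maxlen=retention)  # oldest entries are evicted automatically
        self._next_seq = 1

    def append(self, payload):
        self._buf.append((self._next_seq, payload))
        self._next_seq += 1

    def resume_after(self, last_seen_seq):
        oldest = self._buf[0][0] if self._buf else self._next_seq
        # A gap exists if the first event the client still needs was evicted.
        lost = last_seen_seq + 1 < oldest
        events = [payload for seq, payload in self._buf if seq > last_seen_seq]
        return ResumeResult(events, lost)
```

The client still receives everything that survived, and the flag tells it to re-read current values rather than trust its cache.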

Phase 0 parallel batches

  • Batch 0a (parallel): 0.1 (gw-proto-galaxy), 0.5 (gw-docs), 0.6 (gw-dotnet-client), 0.7 (gw-auth). Four worktrees, four general-purpose agents.
  • Batch 0b (sequential, but parallel with Batch 0a because the keys differ): 0.2 → 0.3 → 0.4 all share gw-proto-mxaccess. Land them in order on the same agent (or as three sequential calls). Coordinate field-number assignment through the wire-up PR.
  • Wire-up 0.W: integrate proto-generated descriptors, regenerate clients/proto/descriptors, run cross-language smoke matrix.

Phase 0 exit: mxaccessgw main carries all seven PRs. Tag the gw NuGet release. Bump MxGateway.Client consumed by lmxopcua.


Phase 1 — Server-level historian extension point (lmxopcua)

Goal: detach IHistorianDataSource from the Galaxy driver. Server's HistoryRead* operations call into a registered data source by namespace, not into IHistoryProvider on the driver.

Tasks

PR 1.1 — Lift IHistorianDataSource to Core.Abstractions

Parallel-key: core-abs-historian (locks files in Core.Abstractions/Historian/).

Files

  • Create:
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianSample.cs
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianAggregateSample.cs
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianEvent.cs
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs
  • Move-from (Galaxy.Host originals stay until phase 7; new copies live in Core.Abstractions and are pure POCO):
    • source bodies in src/.../Driver.Galaxy.Host/Backend/Historian/
  • Modify:
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj (no change if files auto-included)
  • Tests:
    • tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs — contract documentation tests (null arg behavior, time-range ordering).

Acceptance

  • dotnet build clean.
  • New tests run and pass.
  • Galaxy.Host still compiles (it keeps its own copies until phase 7).

Subagent prompt boilerplate (template — re-use this shape for each PR):

You are working in worktree <path>. Create the files listed in PR 1.1 of lmx_mxgw_impl.md. Do NOT edit any file under Driver.Galaxy.Host/, appsettings.json, the .slnx, or Program.cs. The DTOs are pure value records — do not import OPC UA types or COM types. Run dotnet build src/ZB.MOM.WW.OtOpcUa.Core.Abstractions before reporting.

PR 1.2 — IHistoryService plugin host on the server

Parallel-key: server-history.

Files

  • Create:
    • src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs — namespace → IHistorianDataSource.
    • src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs — registry impl.
    • src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryServiceAdapter.cs — bridges OPC UA HistoryRead/HistoryReadProcessed/HistoryReadAtTime/HistoryReadEvents to the router.
  • Modify:
    • src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs — register HistoryServiceAdapter. Locked file — defer to wire-up PR 1.W.
  • Tests:
    • tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs.

Acceptance

  • Router resolves data source by namespace prefix.
  • Unknown namespace returns BadHistoryOperationUnsupported (or current status used for that case — verify against existing server behavior in OpcUaServerService.cs before coding).

Depends on: 1.1 merged.
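The router's resolution rule can be sketched briefly. This Python illustration is hypothetical. It assumes registrations are keyed by a node-id string prefix and that the longest matching prefix wins, which is a design choice not stated in the plan. The status name follows PR 1.2's acceptance criterion, and the existing server behavior still has to be checked as noted above.

```python
BAD_HISTORY_OPERATION_UNSUPPORTED = "BadHistoryOperationUnsupported"


class HistoryRouter:
    """Hypothetical sketch: namespace prefix -> historian data source."""

    def __init__(self):
        self._sources = {}

    def register(self, namespace_prefix, source):
        self._sources[namespace_prefix] = source

    def resolve(self, node_id):
        """Return (source, None) on a match, else (None, status)."""
        matches = [prefix for prefix in self._sources if node_id.startswith(prefix)]
        if not matches:
            return None, BAD_HISTORY_OPERATION_UNSUPPORTED
        # Longest matching prefix wins so a narrower registration can override a broader one.
        return self._sources[max(matches, key=len)], None
```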

PR 1.3 — Driver capability shrink: drop IHistoryProvider requirement

Parallel-key: server-history.

Files

  • Modify:
    • src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs (or wherever IHistoryProvider is consumed; locate via Grep "IHistoryProvider"). Replace direct calls with IHistoryRouter.Resolve(...).
  • Tests:
    • Update any test that exercised IHistoryProvider paths to register a fake data source via the router.

Depends on: 1.2 merged.

PR 1.W — Phase 1 wire-up

Parallel-key: locked-files.

Files

  • src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs — DI registration of HistoryRouter + the legacy Galaxy.Host historian adapter.
  • ZB.MOM.WW.OtOpcUa.slnx — no change unless a new project was added; if PR 1.1 went into the existing Core.Abstractions project, no slnx edit.

Phase 1 parallel batches

  • Batch 1a (sequential): 1.1 → 1.2 → 1.3 → 1.W. Each blocks the next.
  • Total: one foreground sequence; no parallelism in Phase 1. Use one general-purpose agent across all four PRs, or one PR per agent in order.

Phase 2 — Server-level alarm condition subsystem (lmxopcua)

Goal: drop GalaxyAlarmTracker from the driver's responsibilities; the server runs the AlarmCondition state machine driven by IsAlarm=true attribute metadata.

Tasks

PR 2.1 — Address-space builder alarm-declaration API

Parallel-key: core-abs-alarms.

Files

  • Modify:
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs — add IAlarmConditionDeclaration MarkAsAlarmCondition(...) (the method already exists per GalaxyProxyDriver.cs:146; verify its shape and extend it with the five sub-attribute references listed for AlarmConditionInfo below).
    • src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Alarms/AlarmConditionInfo.cs — add InAlarmRef, PriorityRef, DescAttrNameRef, AckedRef, AckMsgWriteRef fields.
  • Tests:
    • tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs.

Acceptance

  • Existing call sites (GalaxyProxyDriver.DiscoverAsync) still compile — add the new fields with safe defaults.

PR 2.2 — AlarmConditionService (state machine)

Parallel-key: server-alarms.

Files

  • Create:
    • src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs
    • src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionState.cs
    • src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs
  • Reference impl to port (do not duplicate — read it for invariants):
    • src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs
  • Tests:
    • tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs — port the existing tracker tests (tests/.../Galaxy.Host.Tests/).

Subagent guidance

  • Two-step. First a Plan agent: read GalaxyAlarmTracker.cs and produce a state-transition table + a list of tests to port. Then a general-purpose agent: implement AlarmConditionService against that table.

Depends on: 2.1 merged.
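As a starting point for the Plan agent's transition table, here is a Python sketch of a generic active/acknowledged condition state driven by the InAlarm and Acked sub-attributes. It assumes OPC UA Part 9-style semantics: each new activation requires a fresh acknowledgement, and clearing does not auto-acknowledge. GalaxyAlarmTracker's actual invariants may differ and take precedence. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlarmConditionState:
    active: bool = False
    acked: bool = True  # a never-raised alarm has nothing left to acknowledge

    def on_in_alarm(self, in_alarm):
        """Apply an InAlarm sub-attribute update; return True if the state changed."""
        if in_alarm == self.active:
            return False
        self.active = in_alarm
        if in_alarm:
            self.acked = False  # every new activation needs its own ack
        return True

    def on_acked(self, acked):
        """Apply an Acked sub-attribute update; return True if the state changed."""
        if acked == self.acked:
            return False
        self.acked = acked
        return True

    def can_acknowledge(self):
        return not self.acked
```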

PR 2.3 — Wire alarm service into DriverNodeManager

Parallel-key: server-alarms.

Files

  • Modify:
    • src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs — on each driver's discovery, collect alarm declarations and hand to AlarmConditionService along with the driver's ISubscribable and IWritable for sub-attribute advise + ack writes.
  • Tests:
    • extend DriverNodeManagerTests with a fake driver that declares one alarm-bearing node.

Depends on: 2.2 merged.

PR 2.W — Phase 2 wire-up

DI registration of AlarmConditionService in OpcUaServerService.cs.

Phase 2 parallel batches

  • Batch 2a (sequential): 2.1 → 2.2 → 2.3 → 2.W.

Phases 1 + 2 cross-batch parallelism

PR 1.1 and PR 2.1 touch different files in Core.Abstractions/ (one under Historian/, one in IAddressSpaceBuilder.cs + Alarms/). They are parallel-safe.

PR 1.2/1.3 and PR 2.2/2.3 both modify OpcUaServerService.cs and DriverNodeManager.cs. They share two locked files — but only at the DI-registration level. If we split the OpcUaServerService.cs edits into a single combined wire-up PR (1+2.W), the body PRs 1.2/1.3 and 2.2/2.3 don't touch them. Then the body PRs can run in parallel batches across phase 1 and phase 2.

Recommended Phase 1+2 plan (parallel):

  1. Run PR 1.1 and PR 2.1 in parallel (two worktrees, two general-purpose agents). Both target Core.Abstractions only.
  2. Merge both to integration branch.
  3. Run PR 1.2/1.3 and PR 2.2/2.3 in parallel, each as a sequential 2-PR chain on its own worktree. Constraint: neither chain edits OpcUaServerService.cs or DriverNodeManager.cs — defer all DI/wiring to the combined wire-up.
  4. Merge both chains.
  5. Combined wire-up PR 1+2.W edits OpcUaServerService.cs and DriverNodeManager.cs once.

Phase 3 — Driver.Historian.Wonderware sidecar

Goal: house the existing HistorianDataSource code in its own .NET 4.8 x86 service, exposed over a named pipe, and ship a .NET 10 client implementing IHistorianDataSource.

Tasks

PR 3.1 — Create the sidecar shell project

Parallel-key: historian-sidecar-host.

Files

  • Create project: src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/
    • Driver.Historian.Wonderware.csproj (<TargetFramework>net48</TargetFramework>, <PlatformTarget>x86</PlatformTarget>).
    • Program.cs — Serilog + console host + named pipe server (mirror Driver.Galaxy.Host/Program.cs shape: env-driven pipe name, allowed SID, shared secret).
  • Create test project:
    • tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/
  • Locked: .slnx, Install-Services.ps1 (wire-up).

PR 3.2 — Lift HistorianDataSource & friends

Parallel-key: historian-sidecar-host.

Files

  • Move (preserve git history with git mv):
    • src/.../Driver.Galaxy.Host/Backend/Historian/HistorianDataSource.cs → src/.../Driver.Historian.Wonderware/Backend/HistorianDataSource.cs
    • HistorianClusterEndpointPicker.cs
    • HistorianClusterNodeState.cs
    • HistorianConfiguration.cs
    • HistorianEventDto.cs
    • HistorianHealthSnapshot.cs
    • HistorianQualityMapper.cs
    • HistorianSample.cs
    • IHistorianConnectionFactory.cs
  • Add a thin IHistorianDataSource shim in the sidecar that re-implements the interface from Core.Abstractions/Historian/ (after PR 1.1).
  • Galaxy.Host needs to keep building until phase 7. Either:
    • Add Driver.Historian.Wonderware ProjectReference from Driver.Galaxy.Host and re-use the moved code, OR
    • Leave a stub copy in Galaxy.Host that delegates to the sidecar via the new client. Pick option 1 (cleaner).
  • Tests:
    • git mv matching test files from tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/Backend/Historian/ to tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/.

Depends on: PR 1.1 merged (interface lives in Core.Abstractions).

PR 3.3 — Pipe contract + handler

Parallel-key: historian-sidecar-pipe.

Files

  • Create:
    • src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Ipc/Contracts.cs (MessagePack DTOs: ReadRawRequest/Reply, ReadProcessedRequest/Reply, ReadAtTimeRequest/Reply, ReadEventsRequest/Reply, WriteAlarmEventsRequest/Reply — alarm-event persistence write path; mirror today's GalaxyHistorianWriter.WriteBatchAsync payload so the SQLite store-and-forward sink in Core.AlarmHistorian can drain into the Wonderware historian event store after Galaxy.Proxy is deleted).
    • Ipc/PipeServer.cs — copy + adapt Driver.Galaxy.Host/Ipc/PipeServer.cs (same ACL/secret model).
    • Ipc/HistorianFrameHandler.cs — handles all five contract pairs above.
  • Tests:
    • tests/.../Driver.Historian.Wonderware.Tests/Ipc/PipeRoundTripTests.cs — round-trip every contract pair including WriteAlarmEvents.
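The round-trip test can be pictured with a minimal framing layer. The Python sketch below assumes a simple length-prefixed frame: a little-endian uint32 body length, a one-byte message kind, and an opaque body that would hold the MessagePack payload. This is not necessarily the framing Driver.Galaxy.Host/Ipc/PipeServer.cs uses, and the kind numbers are hypothetical.

```python
import struct

# Hypothetical kind tags for the five contract pairs.
KINDS = {"ReadRaw": 1, "ReadProcessed": 2, "ReadAtTime": 3,
         "ReadEvents": 4, "WriteAlarmEvents": 5}
_NAMES = {v: k for k, v in KINDS.items()}
_HEADER = struct.Struct("<IB")  # uint32 body length + uint8 kind, no padding


def encode_frame(kind, body: bytes) -> bytes:
    return _HEADER.pack(len(body), KINDS[kind]) + body


def decode_frame(buf: bytes):
    """Return (kind, body, remaining) or None if buf does not yet hold a whole frame."""
    if len(buf) < _HEADER.size:
        return None
    length, kind_id = _HEADER.unpack_from(buf)
    end = _HEADER.size + length
    if len(buf) < end:
        return None
    if kind_id not in _NAMES:
        raise ValueError(f"unknown message kind {kind_id}")
    return _NAMES[kind_id], buf[_HEADER.size:end], buf[end:]
```

Returning the leftover bytes lets a reader loop over a pipe buffer that holds several frames, or only part of one.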

PR 3.4 — .NET 10 client

Parallel-key: historian-sidecar-client.

Files

  • Create project: src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/ (.NET 10 x64). Implements:
    • IHistorianDataSource (read path: raw / processed / at-time / events) against the sidecar pipe.
    • IAlarmHistorianWriter (write path: alarm-event persistence) against the sidecar pipe WriteAlarmEvents contract from PR 3.3.
  • Tests:
    • tests/.../Driver.Historian.Wonderware.Client.Tests/ against an in-proc fake pipe server. Cover both the read interface and the alarm-event write interface; verify the SQLite store-and-forward sink (Core.AlarmHistorian.SqliteStoreAndForwardSink) drains successfully when the client is plugged in as its target.

Depends on: PR 3.3 merged (contracts published).
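The drain test above exercises the store-and-forward pattern. The Python `sqlite3` sketch below shows that pattern in general form. The table name, columns, batch size, and stop-on-failure policy are all assumptions, not the schema of Core.AlarmHistorian.SqliteStoreAndForwardSink. The key property is that rows are deleted only after the writer accepts them.

```python
import sqlite3


class StoreAndForwardSink:
    """Hypothetical sketch of a SQLite-backed queue drained into a writer."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS pending ("
                     "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)")

    def enqueue(self, payload):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO pending (payload) VALUES (?)", (payload,))

    def drain(self, write_batch, batch_size=100):
        """Send queued rows oldest-first; return how many the writer accepted."""
        sent = 0
        while True:
            rows = self._conn.execute(
                "SELECT id, payload FROM pending ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
            if not rows:
                return sent
            if not write_batch([payload for _, payload in rows]):
                return sent  # keep rows queued; retry on the next drain
            with self._conn:
                self._conn.executemany("DELETE FROM pending WHERE id = ?",
                                       [(row_id,) for row_id, _ in rows])
            sent += len(rows)
```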

PR 3.W — Phase 3 wire-up

Files

  • ZB.MOM.WW.OtOpcUa.slnx — register three new projects + two new test projects.
  • scripts/install/Install-Services.ps1 — register OtOpcUaWonderwareHistorian NSSM service.
  • src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs — register the client as both an IHistorianDataSource for the Galaxy namespace and the IAlarmHistorianWriter target for the SQLite store-and-forward sink, replacing today's GalaxyProxyDriver.WriteBatchAsync route.
  • src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json – Historian:Wonderware block.

Phase 3 parallel batches

  • Batch 3a (sequential): 3.1 (shell) → 3.2 (lift code).
  • Batch 3b (after 3.2): 3.3 (pipe) → 3.4 (client). The two PRs have different parallel-keys, but 3.4 depends on 3.3's contracts, so they run sequentially within Phase 3.
  • Batch 3c: 3.W.

Once PR 1.1 has merged, Phase 3 is independent of the rest of Phases 1 and 2: it can run in parallel with PRs 1.2/1.3 and all of Phase 2.

Recommended phasing: kick off Phase 3 in parallel with Phase 2, both gated only on Phase 1.1's merge.


Phase 4 — New Driver.Galaxy (Tier-A, .NET 10) against gw

This is the bulk of the work. Each PR adds one capability to the new driver. The driver builds and links from PR 4.0 onward; capabilities arrive as incremental green bars.

The driver lives at src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ (note: same short name as the old .Proxy, but new project. The .Host, .Proxy, .Shared projects continue to coexist until phase 7).

Tasks

PR 4.0 — Project skeleton, options, factory

Parallel-key: galaxy-shell.

Files

  • Create project: src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/
    • Driver.Galaxy.csproj (.NET 10 x64), references Core.Abstractions, Core, MxGateway.Client (NuGet from gw repo).
    • GalaxyDriver.cs – IDriver + IDisposable skeleton; Initialize creates MxGatewayClient and opens a session; Shutdown disposes.
    • Config/GalaxyDriverOptions.cs — POCO matching the JSON shape in lmx_mxgw.md.
    • GalaxyDriverFactoryExtensions.cs – AddGalaxyDriver(IServiceCollection).
  • Tests:
    • tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ (new project)
    • Tests/GalaxyDriverInitializationTests.cs — uses a fake IMxGatewayClientTransport to verify open-session behavior.
  • Locked: .slnx (wire-up PR 4.W).

Acceptance

  • Driver builds, Initialize opens a session against a fake transport, Shutdown closes it.
  • IDriver.RecycleAsync (if present in the interface today) returns the same stub shape as the legacy backend — {Accepted = true, GraceSeconds = 15} — and is documented in the file as intentionally a no-op until a future PR wires it through gw. Today's MxAccessGalaxyBackend.RecycleAsync is itself a stub, so this preserves behavior exactly.

PR 4.1 — ITagDiscovery via GalaxyRepositoryClient

Parallel-key: galaxy-discover.

Files

  • Create:
    • src/.../Driver.Galaxy/Browse/GalaxyDiscoverer.cs
    • src/.../Driver.Galaxy/Browse/DataTypeMap.cs – mx_data_type → DriverDataType. Port the table from GalaxyProxyDriver.MapDataType (lines 523–532) and verify against gr/data_type_mapping.md.
    • src/.../Driver.Galaxy/Browse/SecurityMap.cs – port GalaxyProxyDriver.MapSecurity (lines 534–544).
    • src/.../Driver.Galaxy/Browse/AlarmRefBuilder.cs — for any attribute where IsAlarm=true, compute the five sub-attribute references by Galaxy naming convention (<tag>.<attr>.InAlarm, <tag>.<attr>.Priority, <tag>.<attr>.DescAttrName, <tag>.<attr>.Acked, <tag>.<attr>.AckMsg) and populate AlarmConditionInfo.{InAlarmRef, PriorityRef, DescAttrNameRef, AckedRef, AckMsgWriteRef} before passing to MarkAsAlarmCondition. Mirrors today's behavior in MxAccessGalaxyBackend.SubscribeAlarmsAsync so the server-level AlarmConditionService (Phase 2) has every ref it needs.
  • Modify:
    • GalaxyDriver.cs — implement ITagDiscovery.DiscoverAsync calling discoverer.
  • Tests:
    • Tests/Browse/GalaxyDiscovererTests.cs — fake IGalaxyRepositoryClientTransport with canned GalaxyObject list.
    • Tests/Browse/AlarmRefBuilderTests.cs — for an alarm-bearing attribute, verify all five refs match the <tag>.<attr>.{...} shape and round-trip cleanly through MarkAsAlarmCondition.

Acceptance

  • Discovered nodes carry mx_data_type, IsArray, ArrayDim, SecurityClassification, IsHistorized, IsAlarm matching what the legacy backend produces (snapshot-compared in Phase 5).
  • Every IsAlarm=true attribute calls MarkAsAlarmCondition with all five sub-attribute refs populated. The AlarmConditionService from Phase 2 must be able to subscribe and ack without further help from the driver.
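The naming convention in the AlarmRefBuilder bullet is simple string composition. This Python sketch uses the five suffixes given in the plan. The snake_case field names mirror the AlarmConditionInfo ref fields, and the function and type names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRefs:
    in_alarm_ref: str
    priority_ref: str
    desc_attr_name_ref: str
    acked_ref: str
    ack_msg_write_ref: str


def build_alarm_refs(tag: str, attr: str) -> AlarmRefs:
    """Compose the five sub-attribute refs for an IsAlarm=true attribute."""
    base = f"{tag}.{attr}"
    return AlarmRefs(
        in_alarm_ref=f"{base}.InAlarm",
        priority_ref=f"{base}.Priority",
        desc_attr_name_ref=f"{base}.DescAttrName",
        acked_ref=f"{base}.Acked",
        ack_msg_write_ref=f"{base}.AckMsg",  # the ref written to acknowledge
    )
```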

Subagent guidance

  • Use an Explore agent first: "find every place in Driver.Galaxy.Proxy/GalaxyProxyDriver.cs that consumes DiscoverHierarchyResponse and list every wire field it reads, so we know what gw's proto must surface."

Depends on: PR 4.0 merged + PR 0.1 (gw attribute parity) NuGet bumped.

PR 4.2 — IReadable (one-shot read path)

Parallel-key: galaxy-read.

Files

  • Create:
    • src/.../Driver.Galaxy/Runtime/GalaxyMxSession.cs — owns MxGatewaySession, Register server handle, in-memory tag → itemHandle registry.
    • src/.../Driver.Galaxy/Runtime/MxValueDecoder.cs – MxValue → object (boolean/int32/float/double/string/datetime, plus array variants).
    • src/.../Driver.Galaxy/Runtime/StatusCodeMap.cs — explicit MxStatusProxy → uint OPC UA StatusCode mapping table. Today's coarse vtq.Quality >= 192 ? Good : Uncertain_Placeholder becomes a full mapping covering at minimum: Good (0x0), Uncertain (0x40000000), Uncertain_LastUsableValue (0x40900000), Bad (0x80000000), Bad_NotConnected (0x808A0000), Bad_NoCommunication (0x80310000), Bad_OutOfService (0x808D0000). Document any unmapped category as Bad_InternalError and log once with the raw MxStatusProxy so the matrix can be extended from field data.
  • Modify:
    • GalaxyDriver.cs — implement IReadable.ReadAsync: per tag, AddItem → short-lived Advise → first OnDataChange. (If Phase 0 added a synchronous ReadAsync RPC, use that; flag a follow-up if missing.)
  • Tests:
    • Tests/Runtime/GalaxyReadTests.cs — fake transport with scripted OnDataChange responses.
    • Tests/Runtime/StatusCodeMapTests.cs — exhaustive mapping cases plus "unknown category falls back to Bad_InternalError and emits a single diagnostic log" assertion.

Depends on: PR 4.0.

PR 4.3 — IWritable + secured-write routing

Parallel-key: galaxy-write.

Files

  • Create:
    • src/.../Driver.Galaxy/Runtime/MxValueEncoder.cs – object → MxValue (the inverse of 4.2's decoder; unify into one type if simpler).
  • Modify:
    • GalaxyDriver.cs — implement IWritable.WriteAsync. Route writes whose attribute carries SecurityClassification.SecuredWrite / VerifiedWrite through WriteSecuredAsync (mxaccessgw exposes this in MxGatewaySession).
  • Tests:
    • Tests/Runtime/GalaxyWriteTests.cs — verify the routing decision given each SecurityClassification value.

Depends on: PR 4.2 merged (shares GalaxyMxSession + value type code).
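The routing decision that GalaxyWriteTests checks is a table lookup. In this Python sketch, the enum member list is an assumption based on the Galaxy security classifications; only SecuredWrite and VerifiedWrite are named in the plan. The returned names are illustrative labels for the session call, and "WriteAsync" is hypothetical.

```python
from enum import Enum, auto


class SecurityClassification(Enum):
    # Assumed member list; numeric values are deliberately not modeled.
    FreeAccess = auto()
    Operate = auto()
    SecuredWrite = auto()
    VerifiedWrite = auto()
    Tune = auto()
    Configure = auto()
    ViewOnly = auto()


_SECURED = {SecurityClassification.SecuredWrite, SecurityClassification.VerifiedWrite}


def choose_write_path(classification):
    """Return which session call a write should use (labels are illustrative)."""
    return "WriteSecuredAsync" if classification in _SECURED else "WriteAsync"
```

Iterating over `SecurityClassification` in a test covers every value, matching the "each SecurityClassification value" requirement.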

PR 4.4 — ISubscribable + EventPump

Parallel-key: galaxy-subscribe.

Files

  • Create:
    • src/.../Driver.Galaxy/Runtime/SubscriptionRegistry.cs – (driverSubId → list<itemHandle>) and reverse map.
    • src/.../Driver.Galaxy/Runtime/EventPump.cs — single consumer of MxGatewaySession.StreamEventsAsync. Maps each OnDataChange to a DataChangeEventArgs per registered driver subscription.
    • src/.../Driver.Galaxy/Runtime/GalaxySubscriptionHandle.cs (port from Proxy).
  • Modify:
    • GalaxyDriver.cs — implement ISubscribable.SubscribeAsync using SubscribeBulkAsync with the buffered_update_interval_ms hint from PR 0.2.
  • Tests:
    • Tests/Runtime/EventPumpFanoutTests.cs — one item → multiple driver subscriptions → one event per driver subscription.
    • Tests/Runtime/SubscribeBulkTests.cs — partial failures.

Depends on: PR 4.3.
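The forward/reverse registry and the fan-out requirement ("one item → multiple driver subscriptions → one event per driver subscription") can be sketched together in Python. All names are hypothetical. The reverse map lets the EventPump find every subscriber of an item handle without scanning all subscriptions.

```python
from collections import defaultdict


class SubscriptionRegistry:
    """driverSubId -> item handles, plus the reverse map used for fan-out."""

    def __init__(self):
        self._items_by_sub = {}
        self._subs_by_item = defaultdict(set)

    def add(self, sub_id, item_handles):
        handles = list(item_handles)  # materialize once so iterators are not consumed twice
        self._items_by_sub[sub_id] = handles
        for handle in handles:
            self._subs_by_item[handle].add(sub_id)

    def remove(self, sub_id):
        for handle in self._items_by_sub.pop(sub_id, []):
            subs = self._subs_by_item[handle]
            subs.discard(sub_id)
            if not subs:
                del self._subs_by_item[handle]

    def subscribers_of(self, item_handle):
        return sorted(self._subs_by_item.get(item_handle, ()))


def fan_out(registry, item_handle, value):
    """One OnDataChange in, one (sub_id, item_handle, value) event per subscription out."""
    return [(sub_id, item_handle, value) for sub_id in registry.subscribers_of(item_handle)]
```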

PR 4.5 — ReconnectSupervisor

Parallel-key: galaxy-reconnect.

Files

  • Create:
    • src/.../Driver.Galaxy/Runtime/ReconnectSupervisor.cs — state machine (Healthy → TransportLost → ReopeningSession → ReplayingSubscriptions → Healthy). Surfaces DriverState.Degraded while not Healthy.
  • Modify:
    • GalaxyDriver.cs + GalaxyMxSession.cs — wire transport-error callbacks into the supervisor; replay subscriptions via ReplaySubscriptionsCommand (PR 0.3).
  • Tests:
    • Tests/Runtime/ReconnectSupervisorTests.cs with simulated drops.

Depends on: PR 4.4. Strongly recommended: PR 0.3 (replay RPC) merged first.
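The supervisor's state machine can be sketched as a transition table. The four states and the "Degraded while not Healthy" rule come from the plan. The event names, the edges back to TransportLost on a repeated failure, and the choice to ignore inapplicable events are assumptions of this Python sketch. Backoff and retry timing are not modeled.

```python
from enum import Enum, auto


class State(Enum):
    HEALTHY = auto()
    TRANSPORT_LOST = auto()
    REOPENING_SESSION = auto()
    REPLAYING_SUBSCRIPTIONS = auto()


# (current state, event) -> next state; event names are hypothetical.
_TRANSITIONS = {
    (State.HEALTHY, "transport_error"): State.TRANSPORT_LOST,
    (State.TRANSPORT_LOST, "reopen_started"): State.REOPENING_SESSION,
    (State.REOPENING_SESSION, "session_opened"): State.REPLAYING_SUBSCRIPTIONS,
    (State.REOPENING_SESSION, "transport_error"): State.TRANSPORT_LOST,
    (State.REPLAYING_SUBSCRIPTIONS, "replay_completed"): State.HEALTHY,
    (State.REPLAYING_SUBSCRIPTIONS, "transport_error"): State.TRANSPORT_LOST,
}


class ReconnectSupervisor:
    def __init__(self):
        self.state = State.HEALTHY

    def handle(self, event):
        # Events with no entry for the current state leave it unchanged.
        self.state = _TRANSITIONS.get((self.state, event), self.state)
        return self.state

    @property
    def is_degraded(self):
        return self.state is not State.HEALTHY
```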

PR 4.6 — IRediscoverable via WatchDeployEvents

Parallel-key: galaxy-deploy.

Files

  • Create:
    • src/.../Driver.Galaxy/Browse/DeployWatcher.cs — long-lived consumer of GalaxyRepositoryClient.WatchDeployEventsAsync.
  • Modify:
    • GalaxyDriver.cs — start watcher on Initialize; raise OnRediscoveryNeeded per event.
  • Tests:
    • Tests/Browse/DeployWatcherTests.cs.

Depends on: PR 4.0. Independent of PRs 4.2–4.5 — can run in parallel with all of them.

PR 4.7 — IHostConnectivityProbe (transport health + per-platform probes)

Parallel-key: galaxy-health.

The current driver reports two flavors of host connectivity:

  1. Top-level transport health — flips Running/Stopped on the synthetic host named after OTOPCUA_GALAXY_CLIENT_NAME whenever the MXAccess COM proxy connects/disconnects.
  2. Per-platform ScanState probes — for each discovered $WinPlatform and $AppEngine gobject, advise its ScanState attribute and translate value transitions into per-host Running/Stopped/Unknown. Lives in Driver.Galaxy.Host/Backend/Stability/GalaxyRuntimeProbeManager.cs.

This PR ports both.

Files

  • Create:
    • src/.../Driver.Galaxy/Health/HostConnectivityForwarder.cs — consumes PR 0.4 StreamSessionHealth and surfaces the synthetic top-level host entry (named after the configured MXAccess ClientName).
    • src/.../Driver.Galaxy/Health/PerPlatformProbeWatcher.cs — port of GalaxyRuntimeProbeManager. On Discover, takes the list of discovered $WinPlatform/$AppEngine tag names, subscribes their ScanState via the driver's own GalaxyMxSession.SubscribeBulkAsync (or directly through the gw session), runs the same state machine (OnProbeCallback interpretation logic — port verbatim with tests), and raises per-host HostStatusChangedEventArgs through the aggregator below.
    • src/.../Driver.Galaxy/Health/HostStatusAggregator.cs — single sink that merges the forwarder's transport entry with the watcher's per-platform entries into the IReadOnlyList<HostConnectivityStatus> surfaced by IHostConnectivityProbe.GetHostStatuses(). Owns the de-dup + diff logic that today lives in GalaxyProxyDriver.OnHostConnectivityUpdate.
  • Modify:
    • GalaxyDriver.cs — wire forwarder + watcher + aggregator into Initialize. On every ITagDiscovery.DiscoverAsync completion (incl. re-discovery from PR 4.6), feed the watcher the fresh platform list so probe subscriptions follow Galaxy redeploys.
  • Tests:
    • Tests/Health/HostConnectivityForwarderTests.cs.
    • Tests/Health/PerPlatformProbeWatcherTests.cs — port the existing GalaxyRuntimeProbeManagerTests (or whatever covers OnProbeCallback) verbatim. Cover: initial subscribe on Discover, re-subscribe after Rediscover, value-transition state machine, cleanup on Shutdown.
    • Tests/Health/HostStatusAggregatorTests.cs — transport entry plus multiple per-platform entries, transitions, aggregator emits OnHostStatusChanged only on actual state change.

Acceptance

  • Top-level transport up/down reflected within 1s of gw SessionHealth flip.
  • Each $WinPlatform / $AppEngine gobject in the discovered hierarchy produces exactly one entry in GetHostStatuses(), transitioning on ScanState changes.
  • After a redeploy that adds a new platform, the watcher subscribes its ScanState without restarting the driver.

Depends on: PR 4.0 + PR 4.1 (needs the discoverer's platform list). Independent of PRs 4.2–4.6 — parallel-safe with the runtime track.
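The aggregator's main job is de-duplication: it emits a change only when a host's state actually differs. In this Python sketch, both the transport entry and the per-platform entries feed the same `update` call. The ScanState interpretation (True → Running, False → Stopped, anything else → Unknown) is an assumption, not a port of OnProbeCallback, and all names are hypothetical.

```python
class HostStatusAggregator:
    """Merge transport and per-platform host entries; report only real changes."""

    def __init__(self, on_changed):
        self._statuses = {}
        self._on_changed = on_changed  # callback(host_name, old_state, new_state)

    def update(self, host_name, new_state):
        old_state = self._statuses.get(host_name)
        if old_state == new_state:
            return False  # repeated report of the same state: drop it
        self._statuses[host_name] = new_state
        self._on_changed(host_name, old_state, new_state)
        return True

    def remove(self, host_name):
        self._statuses.pop(host_name, None)  # e.g. a platform removed by a redeploy

    def get_host_statuses(self):
        return sorted(self._statuses.items())


def scan_state_to_host_state(scan_state):
    # Assumed interpretation of the ScanState attribute value.
    if scan_state is True:
        return "Running"
    if scan_state is False:
        return "Stopped"
    return "Unknown"
```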

PR 4.W — Backend-flag wiring

Parallel-key: locked-files.

Files

  • src/.../Server/Configuration/DriverFactoryRegistry.cs (or wherever drivers are wired) — add a Galaxy:Backend switch:
    • legacy-host → existing GalaxyProxyDriver registration (untouched).
    • mxgateway → new GalaxyDriver registration via PR 4.0's extension.
  • src/.../Server/appsettings.json — sample new config block.
  • ZB.MOM.WW.OtOpcUa.slnx — register Driver.Galaxy and its tests.
  • CLAUDE.md — note new driver, retain old driver pointers.

Acceptance

  • With Galaxy:Backend=legacy-host (default), unchanged behavior.
  • With Galaxy:Backend=mxgateway, server boots against the new driver and passes a smoke test against the dev gw.
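Here is a minimal sketch of the Galaxy:Backend switch, written as language-neutral Python rather than the C# DriverFactoryRegistry. The factory functions are hypothetical placeholders for the two registrations, and the config is modeled as flat colon-separated keys in the style of .NET configuration. When the key is absent it falls back to legacy-host, the default named in the acceptance criteria. An unrecognized value fails fast instead of silently picking a backend; that behavior is this sketch's own choice, not something the plan specifies.

```python
from typing import Callable, Dict, Mapping


def make_legacy_proxy_driver() -> str:
    return "GalaxyProxyDriver"  # placeholder for the existing registration


def make_mxgateway_driver() -> str:
    return "GalaxyDriver"  # placeholder for PR 4.0's registration


BACKENDS: Dict[str, Callable[[], str]] = {
    "legacy-host": make_legacy_proxy_driver,
    "mxgateway": make_mxgateway_driver,
}


def create_galaxy_driver(config: Mapping[str, str]) -> str:
    backend = config.get("Galaxy:Backend", "legacy-host")  # default: unchanged behavior
    factory = BACKENDS.get(backend)
    if factory is None:
        raise ValueError(
            f"Unknown Galaxy:Backend '{backend}'; expected one of {sorted(BACKENDS)}"
        )
    return factory()
```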

Phase 4 parallel batches

Dependency graph:

4.0 (shell) ──┬── 4.1 (discover) ──┬── 4.6 (deploy)
              │                    └── 4.7 (health: needs platform list)
              ├── 4.2 (read) ── 4.3 (write) ── 4.4 (subscribe) ── 4.5 (reconnect)
              │                                                          \
              │                                                           → 4.W (wire-up)
              └── (no longer parallel-with-4.1: 4.7 moved under 4.1)
  • After 4.0 merges, 4.1 and the 4.2-chain head can run in two parallel worktrees.
  • After 4.1 merges, 4.6 and 4.7 can run in two parallel worktrees.
  • 4.2 → 4.3 → 4.4 → 4.5 is one sequential chain on its own worktree (they all touch GalaxyDriver.cs and GalaxyMxSession.cs) and runs alongside the discover/deploy/health track.
  • 4.W gathers everything.
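The batching above follows from a topological sort of the dependency graph. The sketch below groups Phase 4 PRs into waves whose members have all dependencies merged. The edge list is transcribed from the graph, with 4.W depending on the three branch tips. A wave shows only what is *eligible* to run at once. The recommended plan still keeps 4.2 through 4.5 on a single worktree because those PRs share GalaxyDriver.cs and GalaxyMxSession.cs.

```python
from typing import Dict, List, Set

PHASE4_DEPS: Dict[str, Set[str]] = {
    "4.0": set(),
    "4.1": {"4.0"},
    "4.2": {"4.0"},
    "4.3": {"4.2"},
    "4.4": {"4.3"},
    "4.5": {"4.4"},
    "4.6": {"4.1"},
    "4.7": {"4.1"},
    "4.W": {"4.5", "4.6", "4.7"},
}


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Kahn-style layering: each wave holds PRs whose dependencies are all merged."""
    remaining = {pr: set(d) for pr, d in deps.items()}
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(pr for pr, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for pr in ready:
            del remaining[pr]
        for d in remaining.values():
            d.difference_update(ready)
    return waves
```

For Phase 4 this yields [4.0], [4.1, 4.2], [4.3, 4.6, 4.7], [4.4], [4.5], [4.W], which matches the stages listed above.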

Recommended Phase 4 plan:

  • Stage 1 (after 4.0): two worktrees — W1: 4.1; W2: 4.2 → 4.3 → 4.4 → 4.5.
  • Stage 2 (after 4.1 merges, W2 still running): three worktrees — W1: 4.6; W3: 4.7; W2: continues runtime chain.
  • Stage 3: 4.W wire-up.

Phase 5 — Parity test matrix

Tasks

PR 5.1 — Driver.Galaxy.ParityTests project

Parallel-key: parity-shell.

Files

  • Create: tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
    • ParityHarness.cs — boots the OtOpcUa server twice, once per backend, drives the same OPC UA scenarios against each run, and captures structured snapshots for comparison.
    • Theory data per scenario (browse, subscribe, alarm transition, write by classification, history read).
  • Reuses existing live-Galaxy fixtures from tests/.../Driver.Galaxy.E2E/.
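A parity check boils down to diffing the two structured snapshots key by key and sorting each difference into resolved or accepted. The Python below is a minimal sketch of that comparison. It assumes each snapshot is a flat mapping from a hypothetical key (node path plus attribute) to a captured value, with accepted deltas supplied as a set of keys. It is not the ParityHarness API.

```python
from typing import Any, Dict, FrozenSet, List, Mapping

_MISSING = object()  # sentinel: key captured by only one backend


def diff_snapshots(
    legacy: Mapping[str, Any],
    mxgateway: Mapping[str, Any],
    accepted: FrozenSet[str] = frozenset(),
) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {"match": [], "accepted": [], "delta": []}
    for key in sorted(set(legacy) | set(mxgateway)):
        if legacy.get(key, _MISSING) == mxgateway.get(key, _MISSING):
            result["match"].append(key)
        elif key in accepted:
            result["accepted"].append(key)  # documented, justified difference
        else:
            result["delta"].append(key)     # must be resolved before Phase 7
    return result
```

A scenario passes when its "delta" list is empty. Those three lists map directly onto the scenario × result rows that PR 5.W records.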

PR 5.2 — Browse + read parity scenarios

Parallel-key: parity-browse.

PR 5.3 — Subscribe + event-rate parity scenarios

Parallel-key: parity-subscribe.

PR 5.4 — Write-by-classification parity scenarios

Parallel-key: parity-write.

PR 5.5 — Alarm-transition parity scenarios

Parallel-key: parity-alarms.

Cover both:

  • Live transitions: Active / Acknowledged / Inactive sequences against .InAlarm / .Acked value flips on the dev Galaxy. Must match legacy-host event ordering and severity mapping.
  • Alarm-event persistence: trigger N alarm transitions, then verify the SQLite store-and-forward sink drains them into the Wonderware historian event store via the new sidecar's WriteAlarmEvents contract (PR 3.3). Compare the persisted rows to those produced by the legacy GalaxyHistorianWriter path.
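To compare live transitions, each backend's stream of .InAlarm / .Acked samples can be reduced to a sequence of state changes, and the two sequences must be identical. The Python sketch below does that reduction. The mapping is an assumption made for illustration: "Active" when in alarm and unacknowledged, "Acknowledged" when in alarm and acknowledged, "Inactive" otherwise. It deliberately collapses inactive-unacknowledged into "Inactive", which the real severity and state mapping may not do.

```python
from typing import Iterable, List, Optional, Tuple


def alarm_state(in_alarm: bool, acked: bool) -> str:
    # Assumed mapping from the two Galaxy attribute flips to a coarse state.
    if not in_alarm:
        return "Inactive"
    return "Acknowledged" if acked else "Active"


def transitions(samples: Iterable[Tuple[bool, bool]]) -> List[str]:
    """Collapse (InAlarm, Acked) samples into the ordered list of state changes."""
    events: List[str] = []
    last: Optional[str] = None
    for in_alarm, acked in samples:
        state = alarm_state(in_alarm, acked)
        if state != last:  # repeated samples are not new transitions
            events.append(state)
            last = state
    return events
```

A parity assertion would then be `transitions(mxgateway_samples) == transitions(legacy_samples)`, which checks ordering as well as content.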

PR 5.6 — History-read parity scenarios

Parallel-key: parity-history.

PR 5.7 — Reconnect/disruption scenarios

Parallel-key: parity-reconnect.

PR 5.8 — Per-platform ScanState probe parity

Parallel-key: parity-probes.

Verify the new PerPlatformProbeWatcher (PR 4.7) produces the same per-host HostConnectivityStatus stream as the legacy GalaxyRuntimeProbeManager:

  • Initial state on Discover for each $WinPlatform / $AppEngine.
  • Transition events when a runtime is stopped/started on the dev Galaxy.
  • Re-subscription after a redeploy that adds/removes a platform.
  • Cleanup of probe subscriptions on Shutdown (no leaked advises in gw).

PR 5.W — Parity matrix doc

Files

  • docs/v2/Galaxy.ParityMatrix.md — table of scenario × result for both backends. Resolved deltas marked, accepted deltas justified.

Phase 5 parallel batches

After 5.1 lands, scenarios 5.2 through 5.8 are fully parallel: each adds its own test class file. That gives seven worktrees and seven general-purpose agents.

5.W runs after all scenarios merge and pass.


Phase 6 — Performance + hardening

Tasks

PR 6.1 — OpenTelemetry traces

Parallel-key: perf-otel.

PR 6.2 — Bounded channel + drop-newest metrics

Parallel-key: perf-eventpump.
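The sketch below shows a bounded event queue with a drop counter. It is language-neutral Python, not the C# EventPump, and it assumes "drop-newest" carries the meaning of .NET's BoundedChannelFullMode.DropNewest: when the queue is full, the most recently *queued* item is evicted so the incoming one can be written. .NET also offers DropOldest, DropWrite (discard the incoming item) and Wait, and PR 6.2 may settle on one of those instead. The `dropped` counter stands in for the metric.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class DropNewestQueue(Generic[T]):
    """Hypothetical sketch of a bounded event buffer with a drop metric."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._items: Deque[T] = deque()
        self._capacity = capacity
        self.dropped = 0  # metric: events discarded because the consumer fell behind

    def write(self, item: T) -> None:
        if len(self._items) >= self._capacity:
            self._items.pop()  # evict the most recently queued item
            self.dropped += 1
        self._items.append(item)

    def read(self) -> Optional[T]:
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

Memory stays capped at `capacity` items however far the consumer falls behind. That property is what the risk register leans on for the Phase 6 soak.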

PR 6.3 — Buffered update interval landing

Parallel-key: perf-buffered. Wire MxAccess:PublishingIntervalMs → SetBufferedUpdateInterval once gw exposes it.

PR 6.4 — Soak test scenario

Parallel-key: perf-soak. 50k tags, 24h, automated metric collection.

PR 6.5 — Tune MxGatewayClientOptions defaults

Parallel-key: perf-tuning. Based on soak data.

PR 6.W — Performance doc

docs/v2/Galaxy.Performance.md.

Phase 6 parallel batches

6.1, 6.2, 6.3 all touch Driver.Galaxy/Runtime/. Serialize them, OR split files explicitly:

  • 6.1 owns a new Runtime/Tracing.cs injected via decorator. Parallel-safe.
  • 6.2 owns Runtime/EventPump.cs. It conflicts with PR 4.4 only if the two are reordered, and it cannot run in parallel with 6.1 if 6.1 also wraps EventPump. Decide upfront: PR 6.1 wraps at the gateway-client boundary and PR 6.2 owns EventPump internals. With that split, parallel-safe.
  • 6.3 modifies GalaxyDriver.SubscribeAsync only. Parallel-safe.

So 6.1, 6.2, 6.3 parallel, then 6.4 (depends on all three). 6.5 sequential after 6.4 (uses its data). 6.W last.


Phase 7 — Retire legacy

Tasks

PR 7.1 — Default flip

Parallel-key: retire-defaults.

Files

  • src/.../Server/appsettings.json → Galaxy:Backend = mxgateway.
  • scripts/e2e/e2e-config.sample.json → drop OTOPCUA_GALAXY_* pipe vars, add gw endpoint.
  • scripts/install/Install-Services.ps1 → remove OtOpcUaGalaxyHost registration; keep OtOpcUaWonderwareHistorian from PR 3.W.

PR 7.2 — Delete legacy projects

Parallel-key: retire-delete.

Files

  • Delete:
    • src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/
    • src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/
    • src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/
    • tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/
    • tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/
    • tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests/
  • Modify:
    • ZB.MOM.WW.OtOpcUa.slnx — remove the six entries.
    • Server/Configuration/DriverFactoryRegistry.cs — remove the legacy-host switch arm.

Depends on: parity matrix in docs/v2/Galaxy.ParityMatrix.md is fully green or carries documented accepted-deltas (verified 2026-04-30 on the dev rig: 14 passed / 1 skipped / 0 failed).

PR 7.3 — Doc + memory housekeeping

Parallel-key: retire-docs.

Files

  • CLAUDE.md — rewrite Galaxy section.
  • docs/v2/dev-environment.md — drop OtOpcUaGalaxyHost references.
  • docs/ServiceHosting.md, docs/Redundancy.md, docs/security.md — scrub Galaxy.Host/Galaxy.Proxy mentions.
  • ~/.claude/projects/.../memory/MEMORY.md — retire entries:
    • project_galaxy_host_service.md
    • project_galaxy_host_installed.md
    • project_aveva_platform_installed.md (revise — server box no longer needs AVEVA; gw box does)
  • Delete:
    • mxaccess_documentation.md (no longer consumed by this repo).
  • Add memory entry: project_galaxy_via_mxgateway.md.

Phase 7 parallel batches

  • Batch 7a (sequential, gated by phase 6 production soak): 7.1.
  • Batch 7b (parallel after 7.1): 7.2 (retire-delete) and 7.3 (retire-docs) — disjoint files.

Cross-phase dependency graph

Phase 0 (gw repo) ────────────────────────────────────┐
                                                      │
Phase 1.1 (Core.Abs/Historian) ──┐                    │
                                  ├── Phase 1.2/1.3   │
                                  │   (server History)│
Phase 2.1 (Core.Abs/Alarms) ──────┤                   │
                                  ├── Phase 2.2/2.3   │
                                  │   (server Alarms) │
                                  │                   │
                                  └── Phase 3 (sidecar host + client)
                                            │         │
                                            └─────────┴── Phase 4 (Driver.Galaxy)
                                                                │
                                                                Phase 5 (parity)
                                                                │
                                                                Phase 6 (perf)
                                                                │
                                                                Phase 7 (retire)

Maximum-parallelism rollout (one possible execution)

  • Days 0 to N (mxaccessgw): Phase 0 batches 0a + 0b + 0.W in parallel worktrees, in a separate repo from this one. This track runs in parallel with everything below until consumers need the gw bump.
  • Days 0 to N (this repo): Phases 1.1 and 2.1 in parallel (two worktrees). Merge.
  • Day N+: Phases 1.2/1.3, 2.2/2.3, 3.1+3.2+3.3+3.4 in parallel (three worktrees, each a sequential chain).
  • Day M: combined wire-up PR 1+2.W, then PR 3.W. Server passes existing e2e against legacy backend.
  • Day M+: Phase 4.0 lands. Phase 4 fan-out (up to three concurrent worktrees, per the recommended Phase 4 plan) starts.
  • Day P: Phase 4 wire-up. Phase 5 fan-out (seven worktrees, one per scenario 5.2 through 5.8) starts.
  • Day Q: Phase 5 wire-up. Phase 6 fan-out (three worktrees + sequential).
  • Day R: Phase 7. Done.

Subagent prompt template

Re-use this shell when launching any of the parallel coding tasks. Replace <bracketed> parts.

You are implementing PR <id> from lmx_mxgw_impl.md ("<title>").
Repo: <C:\Users\dohertj2\Desktop\lmxopcua | C:\Users\dohertj2\Desktop\mxaccessgw>.
Worktree: <path>.

Scope (you may create/edit only these files):
<list>

DO NOT edit:
- Any file outside the scope above
- ZB.MOM.WW.OtOpcUa.slnx / mxaccessgw/MxGateway.sln
- src/.../Server/Program.cs, OpcUaServerService.cs, appsettings.json
- scripts/install/Install-Services.ps1
- scripts/e2e/e2e-config.sample.json
- CLAUDE.md, docs/**, MEMORY.md, mxaccess_documentation.md

Acceptance:
<list>

Tests:
<list>

If you find a needed change outside scope, STOP and surface it as a
finding rather than editing — it will be picked up by the wire-up PR.

Before reporting completion:
1. Run `dotnet build <smallest project tree that covers your scope>`.
2. Run the new/changed tests.
3. Report: files changed, test command + result, any out-of-scope
   findings.

Risk register (operational)

| Risk | When it bites | Mitigation |
|---|---|---|
| Phase 0 gw bump breaks existing mxaccessgw consumers | Phase 0 wire-up | Cross-language smoke matrix in mxaccessgw must run before merge |
| Two parallel agents both edit OpcUaServerService.cs despite the rule | Phases 1+2 parallel | Wire-up convention + grep-based pre-merge check (git diff --stat origin/main of locked files in the integration branch must be empty until the wire-up PR) |
| Subagent silently adds a stray using to a locked file | Anytime | The build-and-test step in the prompt fails only if the locked-file change breaks the build; a git diff --name-only whitelist check at integration-branch merge time catches the rest |
| Galaxy.Host can't build during phase 3.2 because lifted files vanished | Phase 3 mid-flight | PR 3.2 adds a ProjectReference from Galaxy.Host to Driver.Historian.Wonderware so the moved files remain reachable; tests cover both call sites |
| Phase 4 chain stalls because gw exposes no synchronous read | PR 4.2 | Surface as a Phase 0 finding immediately: add a ReadCommand to gw, or accept short-lived advise as the read mechanism (document as a perf accepted delta in 5.W) |
| Phase 5 parity matrix exposes a delta no one wants to fix | Phase 5 | Phase 7 gating: Galaxy:Backend=mxgateway does not become default until every parity delta is either resolved or has a written acceptance from the user |
| Soak test in 6.4 finds a memory leak in EventPump | Phase 6 | EventPump bounded-channel design (PR 6.2) ships before the soak, so the leak is bounded by design |
| Stale memory file references retired code after phase 7 | Phase 7 | PR 7.3 explicitly retires project_galaxy_host_* entries; add a memory-audit step to the phase-close checklist |

Phase-close checklist (apply at the end of each phase)

Before declaring a phase done:

  1. dotnet build ZB.MOM.WW.OtOpcUa.slnx clean on integration branch.
  2. dotnet test ZB.MOM.WW.OtOpcUa.slnx clean (or all-but-known-skipped).
  3. Live-Galaxy smoke (when applicable) green on dev box.
  4. No locked files modified outside their wire-up PR (git log --name-only origin/main..HEAD -- <locked-paths> shows only the wire-up commit).
  5. MEMORY.md updated for any persistent context this phase introduced.
  6. Doc updates limited to the phase's scope (no doc edits sprinkled across non-doc PRs).
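Check 4 and the whitelist mitigation in the risk register can be automated by matching a PR's changed paths against the locked-file list. The Python sketch below assumes the changed paths arrive as a list, for example one path per line from `git diff --name-only origin/main...HEAD`. The glob patterns are an illustrative transcription of the locked files from the subagent prompt template, with the elided src/... prefixes matched loosely by a leading `*`; adjust them to the real paths. In a non-wire-up PR the result must be empty.

```python
import fnmatch
from typing import Iterable, List, Sequence

# Illustrative patterns; fnmatch's "*" also matches "/", so "docs/*" covers docs/**.
LOCKED_PATTERNS: Sequence[str] = (
    "ZB.MOM.WW.OtOpcUa.slnx",
    "*Server/Program.cs",
    "*OpcUaServerService.cs",
    "*Server/appsettings.json",
    "scripts/install/Install-Services.ps1",
    "scripts/e2e/e2e-config.sample.json",
    "CLAUDE.md",
    "docs/*",
)


def locked_violations(
    changed: Iterable[str], patterns: Sequence[str] = LOCKED_PATTERNS
) -> List[str]:
    """Return the changed paths that touch a locked file (sorted, de-duplicated)."""
    return sorted(
        path
        for path in set(changed)
        if any(fnmatch.fnmatchcase(path, pat) for pat in patterns)
    )
```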