lmxopcua/lmx_mxgw.md
Joseph Doherty 006af51768 docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:59:59 -04:00

> **✅ Completed 2026-04-30 — historical record of the v2-mxgw migration design.**
>
> This document is the design doc that drove the migration from the
> legacy out-of-process Galaxy.Host topology to the in-process
> GalaxyDriver + mxaccessgw architecture. Option 1 (the in-process
> driver path) was selected and implemented across 39 PRs spanning
> phases 0–7, merged to master at commit `ae7106d`. For current
> architecture see `CLAUDE.md`, `docs/drivers/Galaxy.md`, and
> `docs/v2/Galaxy.Performance.md`.
# Galaxy → MxAccessGateway Migration Plan
Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host`
+ `Galaxy.Proxy` IPC pair with an **in-process Tier-A** `Driver.Galaxy` running
in the .NET 10 OtOpcUa server, talking to a separately-deployed
`MxGateway.Server` (mxaccessgw repo) over gRPC for live MXAccess work and
Galaxy Repository browse.
## Outcome
After this work:
- `OtOpcUa.Server` is fully .NET 10 x64 — no x86 build artifacts in this repo.
- `Driver.Galaxy.Host` (Windows service, NSSM-wrapped, .NET 4.8 x86) is
retired. `Driver.Galaxy.Proxy` and `Driver.Galaxy.Shared` are deleted.
AVEVA platform is no longer required on the OtOpcUa box.
- A new in-process `Driver.Galaxy` lives next to `Driver.Modbus`,
`Driver.OpcUaClient`, etc. It implements the same `IDriver` capability set
the proxy implements today, but its body calls `MxGateway.Client`
(`MxGatewayClient`, `MxGatewaySession`, `GalaxyRepositoryClient`).
- Wonderware Historian SDK access moves out of the Galaxy driver into a
driver-agnostic historian data source (`Driver.Historian.Wonderware`,
separate sidecar, .NET 4.8 x86). The OPC UA HA service plugs into it the
same way it would plug into any future historian.
- Alarm condition tracking moves out of the driver into the OPC UA server's
generic A&E subsystem. The driver only flags `IsAlarm=true` on attribute
metadata and forwards live `.InAlarm`/`.Acked`/etc value changes; the
server runs the AlarmCondition state machine.
- Per-platform `ScanState` probes degrade to plain attribute subscriptions —
no special probe manager.
---
## Pre-flight: improvements to land in mxaccessgw first
These are **integration-quality changes** in the mxaccessgw repo that make
the OtOpcUa side dramatically simpler / faster / more robust. They aren't
strictly required to start, but ship enough of them before phase 3 that we're
not designing around gaps.
### gw-1. Galaxy attribute metadata parity
**What's there:** `galaxy_repository.v1.DiscoverHierarchy` returns
`GalaxyObject` with name, parent, category, and dynamic attributes.
**What's missing for OtOpcUa:** OtOpcUa needs every field that today's
`MxAccessGalaxyBackend` copies into `GalaxyAttributeInfo`. Confirm that gw's
`Attribute` proto carries:
- `mx_data_type` (int)
- `is_array` (bool)
- `array_dimension` (uint, optional)
- `security_classification` (int)
- `is_historized` (bool, from `HistorizedExtension` primitive)
- `is_alarm` (bool, from `AlarmExtension` primitive)
If any are missing, add them to the proto and the server-side query mapper.
Without `IsAlarm` and `IsHistorized` the OPC UA server can't decide which
nodes get HasHistoricalConfiguration / which become AlarmConditions.
### gw-2. Stable, documented event-stream resume semantics
**What's needed:** the OtOpcUa driver must survive a transient gw transport
drop without losing subscription state or duplicating change events. gw's
`StreamEventsAsync(afterWorkerSequence)` already exposes resumption.
Document the per-session retention window (how long does the worker buffer
events the gateway hasn't acked?) and the "events were dropped, you must
re-subscribe" signal. If retention is bounded by count rather than time,
expose the bound in `OpenSessionReply` so the client can size its own buffer.
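
To make the resume contract concrete, here is a minimal client-side sketch. It is written in Python purely as a compact, runnable illustration (the driver itself is .NET). It assumes each event carries a contiguous, monotonically increasing worker sequence number; `ResumeTracker`, `accept`, and the three outcome strings are hypothetical names, not part of the gw API.

```python
from dataclasses import dataclass


@dataclass
class ResumeTracker:
    """Tracks the last worker sequence seen so a stream can be resumed."""
    last_sequence: int = 0  # 0 means "nothing received yet"

    def accept(self, sequence: int) -> str:
        """Classify an incoming event by its worker sequence number.

        'deliver'   -> the next expected event; forward it.
        'duplicate' -> already seen (e.g. replayed after a resume); drop it.
        'gap'       -> events were skipped; the client must re-subscribe.
        """
        if sequence <= self.last_sequence:
            return "duplicate"
        if sequence > self.last_sequence + 1:
            return "gap"
        self.last_sequence = sequence
        return "deliver"


tracker = ResumeTracker()
outcomes = [tracker.accept(s) for s in (1, 2, 2, 3, 7)]
print(outcomes, tracker.last_sequence)
```

After a transport drop, the value in `last_sequence` is what the client would pass as `afterWorkerSequence`. A `gap` outcome corresponds to the "events were dropped, you must re-subscribe" signal described above.
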
### gw-3. Reconnectable sessions
Listed under "post-v1 revisit" in `gateway.md`. Without it, every gw or
OtOpcUa restart re-`Register`s, re-`AddItem`s, re-`Advise`s the entire
address space — for a 50k-tag Galaxy that's a non-trivial cold-start. With
reconnectable sessions, the driver presents its `SessionId` after a restart
and the worker keeps its handles.
If full reconnection is too large, ship a **bulk replay** instead: a single
RPC that takes the full subscription set, with the worker performing the
register/add/advise sequence inside one round trip. We can drive it from a
client-side cache rather than gw state. See gw-5 below.
### gw-4. Driver-shaped subscribe primitive
`MxGatewaySession` already has `SubscribeBulkAsync` (one RPC: `Register`
implicit + `AddItem` + `Advise` for a list of tag addresses, returning
per-tag `SubscribeResult`). That's exactly what `ISubscribable.SubscribeAsync`
wants. Confirm it returns enough per-tag detail to surface a partial-failure
list to OPC UA monitored items (good handle, status code, error text).
If not already, expose **`SubscribeBulk` with optional update-rate hint**
forwarded to `SetBufferedUpdateInterval` so the OPC UA publishing interval
becomes a single field on the subscribe call rather than a follow-up RPC.
### gw-5. Subscription replay snapshot
Provide an RPC `ReplaySubscriptionsAsync(SessionId, IEnumerable<TagAddress>)`
that re-establishes a list of subscriptions after a session reset and returns
per-tag results. The client stores its tag list locally (the driver already
has it from `Discover`), and the gw worker turns it into one
register/add/advise sequence. This is the minimum surface we need; full
"reattach to a previous session by id" (gw-3) is a richer version of the
same thing.
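
The client-side half of replay can be sketched as follows, in Python for brevity. This is an illustration under assumptions, not the gw contract: `replay_in_chunks` and `fake_subscribe_bulk` are hypothetical names, the per-tag result is reduced to a boolean, and the chunk size is an arbitrary throttle.

```python
from typing import Callable, Dict, Iterable, List


def replay_in_chunks(
    tags: Iterable[str],
    subscribe_bulk: Callable[[List[str]], Dict[str, bool]],
    chunk_size: int = 1000,
) -> Dict[str, bool]:
    """Re-subscribe the cached tag list in fixed-size chunks.

    Returns the merged per-tag results so the caller can surface
    partial failures instead of failing the whole replay.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pending = list(tags)
    results: Dict[str, bool] = {}
    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        results.update(subscribe_bulk(chunk))
    return results


# Stand-in for the gateway call: tags ending in ".Bad" fail, the rest succeed.
calls: List[int] = []


def fake_subscribe_bulk(chunk: List[str]) -> Dict[str, bool]:
    calls.append(len(chunk))
    return {tag: not tag.endswith(".Bad") for tag in chunk}


tags = [f"Tank{i}.Level" for i in range(5)] + ["Pump1.Bad"]
results = replay_in_chunks(tags, fake_subscribe_bulk, chunk_size=4)
failed = [tag for tag, ok in results.items() if not ok]
print(calls, failed)
```

Keeping the tag list on the client means the replay works the same way whether the session was lost to a gw restart or to a worker restart.
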
### gw-6. Transport-health stream
The gw already exposes worker / session health on its dashboard. Add a small
streaming RPC `StreamSessionHealth(SessionId) → stream SessionHealth` so the
OtOpcUa driver can surface "MXAccess transport up/down" to its
`IHostConnectivityProbe` without faking it via probe-tag subscriptions.
Today `MxAccessClient.ConnectionStateChanged` does this in-process; we want
the same signal at the gw boundary.
### gw-7. Optional .NET 10 client polish
- Async-disposable session pattern is already there.
- Add a **typed `MxValue` ⇄ `object` adapter** for the seven Galaxy types
OtOpcUa cares about (Boolean, Int32, Float, Double, String, DateTime,
arrays of the same). Today every consumer writes its own `MxValue.From<T>`
helpers; this shaves boilerplate from the driver.
- Add a **`SubscribeWithCallback`** convenience wrapper that combines
`OpenSession` + `SubscribeBulk` + `StreamEvents` and routes events through
a delegate per tag. Keeps the OPC UA driver from re-implementing the
fan-out / sequencer pattern.
### gw-8. Auth minimums
Document API-key scoping as it applies to OtOpcUa: the server identity needs
`session`, `invoke`, `event`, and `metadata:read` scopes. Provide a CLI to
mint a key bound to those scopes for an OtOpcUa instance.
### gw-9. Performance: bulk paths and value coalescing
- Confirm `SubscribeBulkAsync` is implemented as a single MXAccess
`AddItem`+`Advise` loop on the worker, not N pipe round trips. If not, fix
before we drive 50k-tag Galaxies through it.
- Expose `SetBufferedUpdateInterval` per session so OtOpcUa can request
buffered updates at the OPC UA publishing interval and get one batched
`OnBufferedDataChange` per tick rather than N `OnDataChange` events.
These can all ship in mxaccessgw independently and improve every consumer.
---
## OtOpcUa-side improvements to land in parallel
Some are forced by removing `Galaxy.Host`; others are quality-of-life.
### ot-1. Promote `IHistorianDataSource` to a server-level extension point
Today `IHistorianDataSource` is a Galaxy-internal abstraction in
`Driver.Galaxy.Host`. Lift it to `OtOpcUa.Core.Abstractions` (or a similar
home next to `IDriver`) and let the OPC UA HA service consume **any number
of registered data sources** keyed by node namespace. Drivers don't own
historian access; the server mounts data sources alongside drivers. This is
the prerequisite that lets us move Wonderware Historian out of the Galaxy
driver without losing the feature.
### ot-2. Generic alarm condition state machine in the server
Move the `.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` quartet handling
out of `GalaxyAlarmTracker` into a server-level alarm subsystem keyed off the
`IsAlarm=true` flag drivers set during discovery. The server subscribes to
the four sub-attributes itself and runs the AlarmCondition state machine.
Driver only:
- declares `IsAlarm=true` in `DriverAttributeInfo`,
- forwards plain attribute value changes (already done by `ISubscribable`).
This is also a precondition for future drivers (Modbus DL205 alarm bits,
S7 alarm DBs) to emit alarms without each writing their own tracker.
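
A minimal sketch of the server-side state machine, in Python for brevity. It is deliberately reduced to the `.InAlarm` and `.Acked` inputs and the Active / Acked / Inactive transitions; `AlarmCondition` and `update` are hypothetical names, and priority, description, and OPC UA event emission are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlarmCondition:
    """Alarm state derived only from .InAlarm and .Acked value changes."""
    in_alarm: bool = False
    acked: bool = True

    def update(self, in_alarm: bool, acked: bool) -> List[str]:
        """Apply new sub-attribute values and return the transitions they cause."""
        events: List[str] = []
        if in_alarm and not self.in_alarm:
            events.append("Active")
        if not in_alarm and self.in_alarm:
            events.append("Inactive")
        if acked and not self.acked:
            events.append("Acked")
        self.in_alarm, self.acked = in_alarm, acked
        return events


condition = AlarmCondition()
history = [
    condition.update(in_alarm=True, acked=False),  # alarm raised
    condition.update(in_alarm=True, acked=True),   # operator acknowledges
    condition.update(in_alarm=False, acked=True),  # process returns to normal
]
print(history)
```

Because the state machine only consumes plain value changes, any driver that can subscribe to the relevant attributes could feed it.
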
### ot-3. Driver capabilities trim
After ot-1 and ot-2, `Driver.Galaxy` no longer needs to implement:
- `IHistoryProvider` (server's HA service handles it via Wonderware
historian data source)
- `IAlarmHistorianWriter` (server's A&E historian, or kept generic — Galaxy
shouldn't own the SQLite path)
- `IAlarmSource` ack route (server-level alarm subsystem writes back via the
driver's `IWritable.WriteAsync`, which the gw already supports)
Keep:
- `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`,
`IRediscoverable`, `IHostConnectivityProbe`.
### ot-4. Treat `time_of_last_deploy` as `IRediscoverable`'s pump
Replace the Host-side change-detection poll with a managed
`GalaxyRepositoryClient.WatchDeployEventsAsync` consumer in the driver.
Each event raises `OnRediscoveryNeeded` with the new deploy time as the
`scopeHint`. No polling code in this repo.
### ot-5. Connection pool at the server, not the driver
If the redundancy pair runs two OtOpcUa instances against one gw, each
instance should reuse a single `GrpcChannel` per process (already gRPC
default) but open **different sessions**: one MXAccess client identity per
OtOpcUa instance, not one shared session that fights over Wonderware client
state. Encode the per-instance MXAccess client name in driver config. It is
already partly there (`OTOPCUA_GALAXY_CLIENT_NAME`); make it explicit in the
new driver's `appsettings.json` shape.
---
## Phased implementation
Each phase is a working, mergeable slice. Keep `Galaxy.Host` running
alongside the new driver until phase 7 — gated by a config switch
`Galaxy:Backend = legacy-host | mxgateway`.
### Phase 0 — pre-flight (mxaccessgw repo)
Ship gw-1, gw-2, gw-4, gw-9 (the parity, performance, and contract bits the
plan immediately depends on). gw-3, gw-5, gw-6, gw-7 can come during or
after phase 5.
**Exit:** local OtOpcUa dev box can `MxGatewayClient.Create` a client, open a
session, `SubscribeBulkAsync` 100 tags, and observe `OnDataChange` events at
the configured update rate.
### Phase 1 — server-level historian extension point (ot-1)
1. Extract `IHistorianDataSource` (and its DTOs `HistorianSample`,
`HistorianAggregateSample`, `HistoricalEvent`) from
`Driver.Galaxy.Host/Backend/Historian/` into
`src/ZB.MOM.WW.OtOpcUa.Core/Abstractions/Historian/`.
2. Extend the OPC UA HA service to look up a registered
`IHistorianDataSource` per namespace and call into it for `HistoryRead`,
`HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`. Drivers
stop implementing `IHistoryProvider` directly; the server proxies.
3. Add a no-op default registration so drivers without history keep working.
**Exit:** all current Galaxy history reads route through an
`IHistorianDataSource` registered by `Driver.Galaxy.Host` (still legacy)
without behavior change. Other drivers untouched.
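
The per-namespace lookup with a no-op default could look roughly like this. The sketch is in Python for brevity and every name in it (`HistorianRegistry`, `NullHistorian`, `read_raw`, `FakeGalaxyHistorian`) is hypothetical; the real `IHistorianDataSource` surface is richer.

```python
from typing import Dict, List, Protocol


class HistorianDataSource(Protocol):
    def read_raw(self, node_id: str, start: float, end: float) -> List[float]: ...


class NullHistorian:
    """No-op default so namespaces without history keep working."""

    def read_raw(self, node_id: str, start: float, end: float) -> List[float]:
        return []


class HistorianRegistry:
    """Maps a node namespace to the data source that serves its history."""

    def __init__(self) -> None:
        self._sources: Dict[str, HistorianDataSource] = {}
        self._default: HistorianDataSource = NullHistorian()

    def register(self, namespace: str, source: HistorianDataSource) -> None:
        self._sources[namespace] = source

    def resolve(self, namespace: str) -> HistorianDataSource:
        # Unregistered namespaces fall back to the no-op source.
        return self._sources.get(namespace, self._default)


class FakeGalaxyHistorian:
    """Stand-in data source returning canned samples."""

    def read_raw(self, node_id: str, start: float, end: float) -> List[float]:
        return [1.0, 2.0]


registry = HistorianRegistry()
registry.register("Galaxy", FakeGalaxyHistorian())
galaxy_samples = registry.resolve("Galaxy").read_raw("Tank1.Level", 0.0, 60.0)
modbus_samples = registry.resolve("Modbus").read_raw("HR100", 0.0, 60.0)
print(galaxy_samples, modbus_samples)
```
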
### Phase 2 — server-level alarm subsystem (ot-2)
1. Add an `IAlarmConditionDeclaration` API on the address-space builder so
discovery can flag a node as alarm-bearing and supply the four
sub-attribute references.
2. Add a hosted `AlarmConditionService` in the server that, on driver
`Discover`, subscribes to the four sub-attributes via the driver's own
`ISubscribable`, runs the state machine, and emits
`IAlarmSource.OnAlarmEvent` itself. Acks route back through the driver's
`IWritable.WriteAsync` to the `.AckMsg` attribute.
3. Add Galaxy-specific defaults (sub-attribute naming) as a small adapter
so the same service can serve future drivers with different conventions.
**Exit:** Galaxy alarms still work end-to-end; the tracker code that runs
inside `Galaxy.Host` is dead but kept for the legacy-host backend path.
### Phase 3 — Wonderware Historian sidecar (`Driver.Historian.Wonderware`)
1. New solution project: `Driver.Historian.Wonderware`, .NET 4.8 x86,
console app + NSSM (mirrors today's Galaxy.Host packaging exactly,
minus Galaxy responsibilities).
2. Hosts the existing `HistorianDataSource`, `HistorianClusterEndpointPicker`,
`HistorianHealthSnapshot` code lifted from `Galaxy.Host/Backend/Historian/`
and exposes them over a small named-pipe protocol (or local gRPC if
.NET 4.8 cost is acceptable; named pipe is simpler).
3. Add `Driver.Historian.Wonderware.Client` — .NET 10 — implementing
`IHistorianDataSource` against the sidecar.
4. Server registers it as a data source for the `Galaxy` namespace.
**Exit:** OPC UA history reads work via the sidecar with the legacy-host
backend still in place. We've decoupled history from MXAccess.
### Phase 4 — new `Driver.Galaxy` against gw
This is the meat. New project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`, .NET 10,
in-process. Capabilities (post ot-3): `IDriver`, `ITagDiscovery`, `IReadable`,
`IWritable`, `ISubscribable`, `IRediscoverable`, `IHostConnectivityProbe`.
Shape:
```
Driver.Galaxy/
  GalaxyDriver.cs                      # IDriver root
  Browse/
    GalaxyDiscoverer.cs                # consumes GalaxyRepositoryClient.DiscoverHierarchyAsync
    DataTypeMap.cs                     # mx_data_type → DriverDataType
    SecurityMap.cs                     # security_classification → SecurityClassification
  Runtime/
    GalaxyMxSession.cs                 # owns one MxGatewaySession; Register + map per-driver client name
    SubscriptionRegistry.cs            # tag → server/item handles; persists to memory only
    EventPump.cs                       # consumes session.StreamEventsAsync, fans out to OnDataChange
    ReconnectSupervisor.cs             # gw transport drop / session-lost recovery
    DeployWatcher.cs                   # GalaxyRepositoryClient.WatchDeployEventsAsync → OnRediscoveryNeeded
  Health/
    HostConnectivityForwarder.cs       # gw-6 SessionHealth → IHostConnectivityProbe
  Config/
    GalaxyDriverOptions.cs             # endpoint, ApiKey, ClientName, TLS, retry, intervals
    GalaxyDriverFactoryExtensions.cs   # AddGalaxyDriver(IServiceCollection)
```
Key behaviors:
- **Discovery** calls `GalaxyRepositoryClient.DiscoverHierarchyAsync()`
once at init and on every `WatchDeployEvents` event, then drives the
address space builder. Same node naming as today (parent contained-name
hierarchy + leaf attributes named `tag_name.AttributeName`).
- **Read**: a one-off `AddItem` + `Advise` + read-after-first-callback
sequence is overkill. Instead, use **`Register` + per-call `AddItem`/`Read`**
if gw exposes a synchronous read; otherwise fall back to a short-lived advise.
*Action item:* confirm gw's read story; if absent, request a synchronous
`ReadAsync` RPC on top of MXAccess `Read` (which exists in the COM API).
- **Write** maps `WriteRequest.Value` to `MxValue` via gw-7 helpers and
calls `WriteAsync(serverHandle, itemHandle, value, userId=0)`. Routes
`WriteSecured` (where `SecurityClassification == SecuredWrite/Verified`)
to `WriteSecuredAsync` once exposed on `MxGatewaySession`.
- **Subscribe** calls `SubscribeBulkAsync` once per `ISubscribable.Subscribe`
call. Stores `(tag → itemHandle, sid)` in `SubscriptionRegistry`. The
single `EventPump` consumes one `StreamEventsAsync` per session and fans
out per `sid`.
- **Unsubscribe** calls `UnsubscribeBulkAsync` and drops registry entries.
- **Reconnect** — when the gRPC channel drops or `StreamEvents` returns,
`ReconnectSupervisor` reopens the session and replays subscriptions via
gw-5 `ReplaySubscriptionsAsync`. The driver flags `DriverState.Degraded`
during recovery; the server keeps publishing last-good values with
`Uncertain` quality.
- **Host connectivity** — single synthesized host entry named after
`OTOPCUA_GALAXY_CLIENT_NAME` driven by gw-6 `SessionHealth` updates
(or, until gw-6 lands, by transport drops).
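
The subscribe / fan-out / unsubscribe behaviors above can be sketched as follows. This is a Python illustration only; the class and function shapes are assumptions (the real `SubscriptionRegistry` and `EventPump` are .NET types), and events are reduced to `(item handle, value)` pairs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Event = Tuple[int, object]  # (item handle, new value)


class SubscriptionRegistry:
    """In-memory map from item handle to the subscription ids watching it."""

    def __init__(self) -> None:
        self._by_handle: Dict[int, Set[str]] = defaultdict(set)

    def add(self, sid: str, item_handle: int) -> None:
        self._by_handle[item_handle].add(sid)

    def remove(self, sid: str) -> None:
        for sids in self._by_handle.values():
            sids.discard(sid)

    def sids_for(self, item_handle: int) -> List[str]:
        return sorted(self._by_handle.get(item_handle, ()))


def pump(
    events: Iterable[Event],
    registry: SubscriptionRegistry,
    on_data_change: Callable[[str, int, object], None],
) -> None:
    """Consume one event stream and fan each event out per subscription id."""
    for item_handle, value in events:
        for sid in registry.sids_for(item_handle):
            on_data_change(sid, item_handle, value)


registry = SubscriptionRegistry()
registry.add("sub-1", 101)
registry.add("sub-2", 101)
registry.add("sub-2", 102)

delivered: List[Tuple[str, int, object]] = []
record = lambda sid, handle, value: delivered.append((sid, handle, value))

pump([(101, 5.0), (102, True), (999, "orphan")], registry, record)
registry.remove("sub-2")  # unsubscribe drops the registry entries
pump([(101, 6.0)], registry, record)
print(delivered)
```

Events for handles nobody watches (handle 999 here) are simply dropped, which is the behavior one would want for late events arriving after an unsubscribe.
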
Wire into the server next to other Tier-A drivers in the
`AddDrivers(...)` call site.
**Exit:** flipping `Galaxy:Backend` to `mxgateway` runs the OPC UA server
end-to-end with no `Galaxy.Host` involvement. Live read, live write, live
subscribe pass against the dev Galaxy. Historian + alarms still work via
phases 1–3.
### Phase 5 — parity test matrix
Reuse the existing live-Galaxy integration tests; run each scenario twice:
once with `Galaxy:Backend=legacy-host`, once with `mxgateway`. Compare:
- discovered hierarchy node count + names + datatypes,
- subscribed publish rates (allow ±10% tolerance vs. legacy),
- write success / status codes for each `SecurityClassification`,
- alarm condition transitions (Active / Acked / Inactive) — already
routed through phase 2's server-level subsystem,
- history reads — phase 3 sidecar, identical results both backends,
- reconnect behavior under gw kill, worker kill, network drop, ZB drop.
Document the matrix; resolve every discrepancy or explicitly accept it.
**Exit:** parity matrix has zero unexplained deltas. Performance budget
agreed: e.g. ≤ 2× per-call latency vs. named-pipe baseline at the 95th
percentile, equal or better throughput in `SubscribeBulk` setup time.
### Phase 6 — perf + hardening
- Land gw-9 buffered-update intervals.
- Add OpenTelemetry traces from the driver around every gw call,
correlated via `client_correlation_id`.
- Write soak test: 50k tags subscribed, 24h, count missed events, gw
restarts, OtOpcUa restarts.
- Tune `MxGatewayClientOptions.MaxGrpcMessageBytes`, retry pipeline,
call timeouts based on soak results.
**Exit:** production-acceptable perf numbers documented in
`docs/drivers/Galaxy.md`.
### Phase 7 — retirement
1. Default `Galaxy:Backend = mxgateway` everywhere (sample configs,
install scripts, e2e configs).
2. Delete `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`,
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy`,
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared`, and matching tests.
3. Remove `OtOpcUaGalaxyHost` NSSM registration from
`scripts/install/Install-Services.ps1`. Add a registration block for the
Wonderware historian sidecar from phase 3.
4. Remove every x86 .NET 4.8 reference, build target, and CI step from this
repo; remove `mxaccess_documentation.md`-driven dependencies that no
longer apply.
5. Update CLAUDE.md, `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
`docs/Redundancy.md` to reflect the new topology.
6. Memory housekeeping: retire `project_galaxy_host_service.md` and
`project_galaxy_host_installed.md`; add a short note about the gw
dependency.
**Exit:** `git grep -i 'Galaxy\.Host'` returns nothing in source.
---
## Configuration shape (new driver)
```jsonc
"Drivers": {
  "Galaxy": {
    "Type": "Galaxy",
    "InstanceId": "galaxy-prod-1",
    "Gateway": {
      "Endpoint": "https://mxgw.aveva.local:5001",
      "ApiKeySecretRef": "galaxy:apiKey", // resolved via existing secret store
      "UseTls": true,
      "CaCertificatePath": "C:\\publish\\mxgw\\ca.crt",
      "ConnectTimeoutSeconds": 10,
      "DefaultCallTimeoutSeconds": 5,
      "StreamTimeoutSeconds": 0 // unbounded
    },
    "MxAccess": {
      "ClientName": "OtOpcUa-A", // unique per OtOpcUa instance
      "PublishingIntervalMs": 1000, // hint for SetBufferedUpdateInterval
      "WriteUserId": 0
    },
    "Repository": {
      "DiscoverPageSize": 5000,
      "WatchDeployEvents": true
    },
    "Reconnect": {
      "InitialBackoffMs": 500,
      "MaxBackoffMs": 30000,
      "ReplayOnSessionLost": true
    }
  }
}
```
The OtOpcUa secret store already handles DPAPI-protected values for LDAP
binds; reuse it for the gw API key. Never put the key in plaintext in the
sample config.
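
A config lint along these lines could enforce both the no-plaintext-key rule and the TLS rule from the risk table. The Python sketch below is an assumption about how such a check might look: `lint_gateway` is a hypothetical helper, and treating a literal `ApiKey` property as the plaintext case is an illustrative guess, not a documented config key.

```python
import ipaddress
from typing import List
from urllib.parse import urlparse


def is_loopback(host: str) -> bool:
    """True for 'localhost' and loopback IP literals; other names count as remote."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost is treated as remote


def lint_gateway(gateway: dict) -> List[str]:
    """Return human-readable problems found in a Gateway config section."""
    problems: List[str] = []
    host = urlparse(gateway.get("Endpoint", "")).hostname or ""
    if not is_loopback(host) and not gateway.get("UseTls", False):
        problems.append("UseTls must be true for a non-loopback Endpoint")
    if "ApiKey" in gateway:  # hypothetical plaintext property
        problems.append("ApiKey must not appear in plaintext; use ApiKeySecretRef")
    return problems


good = lint_gateway({
    "Endpoint": "https://mxgw.aveva.local:5001",
    "ApiKeySecretRef": "galaxy:apiKey",
    "UseTls": True,
})
no_tls = lint_gateway({"Endpoint": "http://mxgw.aveva.local:5001", "UseTls": False})
plain_key = lint_gateway({"Endpoint": "http://127.0.0.1:5001", "UseTls": False, "ApiKey": "x"})
print(good, no_tls, plain_key)
```
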
---
## Risks and mitigations
| Risk | Mitigation |
|---|---|
| gw protocol regression breaks production | Pin gw NuGet to a contract version range; CI runs parity matrix on every gw bump; staged rollout via `Galaxy:Backend` flag. |
| Per-call latency regresses for chatty workloads | Land gw-9 (buffered updates) before phase 5; soak-test the 95th-percentile latency in phase 6. |
| Reconnect storm after gw restart re-registers 50k tags | Land gw-3 or gw-5 before phase 6; client-side bulk replay throttled by `SubscribeBulkAsync` chunk size. |
| Alarm parity gap from moving tracker server-side | Phase 2 ships before phase 4; parity matrix gates phase 7. |
| Historian sidecar adds a second .NET 4.8 x86 service | Acceptable: it's a *driver-agnostic* component, and it ships only where Wonderware historian access is actually needed. |
| Two OtOpcUa instances both registering as same MXAccess client | `ClientName` is per-instance config (ot-5); install scripts lint that the redundancy pair has distinct names. |
| Cross-machine MXAccess writes traverse plaintext gRPC | Phase 0 enforces `UseTls=true` for any non-loopback `Endpoint`; CI lints the sample configs. |
| gw API key leaked in logs | gw and `MxGatewayClient` already redact `authorization` metadata; phase 6 audit. |
| Memory leak in `EventPump` under high event rate | Bounded channel between `StreamEventsAsync` and per-sub fan-out, drop-newest with a metric counter; soak test catches. |
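
The bounded, drop-newest buffer named in the last row can be sketched as below. This is a Python illustration of the policy only; `BoundedEventBuffer`, `offer`, and `take` are hypothetical names, and a real implementation would sit between `StreamEventsAsync` and the per-subscription fan-out with proper concurrency handling.

```python
from collections import deque


class BoundedEventBuffer:
    """Fixed-capacity FIFO that rejects new events when full (drop-newest)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque()
        self._capacity = capacity
        self.dropped = 0  # metric counter for rejected events

    def offer(self, event) -> bool:
        """Enqueue an event, or count it as dropped if the buffer is full."""
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(event)
        return True

    def take(self):
        """Dequeue the oldest event, or return None when empty."""
        return self._items.popleft() if self._items else None


buffer = BoundedEventBuffer(capacity=3)
accepted = [buffer.offer(n) for n in range(5)]
oldest = buffer.take()
print(accepted, buffer.dropped, oldest)
```

Memory stays bounded by `capacity` regardless of event rate, and the `dropped` counter gives the soak test something concrete to assert on.
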
---
## Cross-cutting deliverables
- **Docs:** `docs/drivers/Galaxy.md` (new), updates to
`docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
`docs/Redundancy.md`, `CLAUDE.md`.
- **Install scripts:** `scripts/install/Install-Services.ps1` removes
`OtOpcUaGalaxyHost`, adds `OtOpcUaWonderwareHistorian`, no Galaxy
service registration on the OtOpcUa node.
- **e2e:** `scripts/e2e/e2e-config.sample.json` — drop `OTOPCUA_GALAXY_*`
pipe vars, add `Drivers:Galaxy:Gateway:Endpoint` etc.
- **Memory:** retire stale Galaxy.Host entries; add gw dependency entry,
redundancy + client-name guidance.
---
## Order-of-work summary
```
Phase 0 (gw repo):  gw-1, gw-2, gw-4, gw-9
Phase 1 (this):     ot-1: historian extension point
Phase 2 (this):     ot-2: alarm subsystem
Phase 3 (this):     Driver.Historian.Wonderware sidecar
Phase 4 (this):     Driver.Galaxy (new) behind backend flag
                    (depends on Phase 0, 1, 2)
Phase 5 (this+gw):  parity matrix
                    (drives gw-3 / gw-5 / gw-6 / gw-7 if gaps surface)
Phase 6 (this):     perf + hardening
Phase 7 (this):     retire Galaxy.Host / Proxy / Shared
Phases 1–3 are independent of each other and can run in parallel. Phase 4
needs all three plus Phase 0. Phase 5 requires Phase 4. Phases 6 and 7 are
sequential after Phase 5.