Audit (three parallel agent passes) found 43 markdown files carrying stale references to the deleted Galaxy.Host/Proxy/Shared projects after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install text; leads with the multi-driver .NET 10 server identity and points at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the Tier-C out-of-process spec with a Tier-A in-process description matching the current GalaxyDriver code, with the four-section GalaxyDriverOptions JSON shape pulled verbatim from Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md, docs/v2/Galaxy.ParityMatrix.md, docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a "✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md also fixes two dead links (`docs/Galaxy.Driver.md` and `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure: AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess, Subscriptions (top-level); drivers/Galaxy-Repository, drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs, reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the v2 two-process deploy shape (Server + Admin + optional OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now describes only the post-PR-7.2 architecture. v1 docs are preserved as a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

487 lines
22 KiB
Markdown
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw migration design.**
>
> This document is the design doc that drove the migration from the
> legacy out-of-process Galaxy.Host topology to the in-process
> GalaxyDriver + mxaccessgw architecture. Option 1 (the in-process
> driver path) was selected and implemented across 39 PRs spanning
> phases 0–7, merged to master at commit `ae7106d`. For current
> architecture see `CLAUDE.md`, `docs/drivers/Galaxy.md`, and
> `docs/v2/Galaxy.Performance.md`.

# Galaxy → MxAccessGateway Migration Plan

Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host` +
`Galaxy.Proxy` IPC pair with an **in-process Tier-A** `Driver.Galaxy` running
in the .NET 10 OtOpcUa server, talking to a separately-deployed
`MxGateway.Server` (mxaccessgw repo) over gRPC for live MXAccess work and
Galaxy Repository browse.

## Outcome

After this work:

- `OtOpcUa.Server` is fully .NET 10 x64 — no x86 build artifacts in this repo.
- `Driver.Galaxy.Host` (Windows service, NSSM-wrapped, .NET 4.8 x86) is
  retired. `Driver.Galaxy.Proxy` and `Driver.Galaxy.Shared` are deleted.
  AVEVA platform is no longer required on the OtOpcUa box.
- A new in-process `Driver.Galaxy` lives next to `Driver.Modbus`,
  `Driver.OpcUaClient`, etc. It implements the same `IDriver` capability set
  the proxy implements today, but its body calls `MxGateway.Client`
  (`MxGatewayClient`, `MxGatewaySession`, `GalaxyRepositoryClient`).
- Wonderware Historian SDK access moves out of the Galaxy driver into a
  driver-agnostic historian data source (`Driver.Historian.Wonderware`,
  separate sidecar, .NET 4.8 x86). The OPC UA HA service plugs into it the
  same way it would plug into any future historian.
- Alarm condition tracking moves out of the driver into the OPC UA server's
  generic A&E subsystem. The driver only flags `IsAlarm=true` on attribute
  metadata and forwards live `.InAlarm`/`.Acked`/etc value changes; the
  server runs the AlarmCondition state machine.
- Per-platform `ScanState` probes degrade to plain attribute subscriptions —
  no special probe manager.

---

## Pre-flight: improvements to land in mxaccessgw first

These are **integration-quality changes** in the mxaccessgw repo that make
the OtOpcUa side dramatically simpler / faster / more robust. They aren't
strictly required to start, but ship enough of them before phase 3 that we're
not designing around gaps.

### gw-1. Galaxy attribute metadata parity

**What's there:** `galaxy_repository.v1.DiscoverHierarchy` returns
`GalaxyObject` with name, parent, category, and dynamic attributes.

**What's missing for OtOpcUa:** every field today's `MxAccessGalaxyBackend`
copies into `GalaxyAttributeInfo` — confirm gw's `Attribute` proto carries:
- `mx_data_type` (int)
- `is_array` (bool)
- `array_dimension` (uint, optional)
- `security_classification` (int)
- `is_historized` (bool, from `HistorizedExtension` primitive)
- `is_alarm` (bool, from `AlarmExtension` primitive)

If any are missing, add them to the proto and the server-side query mapper.
Without `IsAlarm` and `IsHistorized` the OPC UA server can't decide which
nodes get HasHistoricalConfiguration / which become AlarmConditions.

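
To illustrate why the two flags matter, here is a minimal, language-neutral sketch in Python (the real driver is .NET). `AttributeMeta` and `node_roles` are hypothetical names that mirror the six fields listed above; they are not part of the gw proto or the OtOpcUa code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeMeta:
    """Hypothetical mirror of the six attribute fields listed above."""
    mx_data_type: int
    is_array: bool
    array_dimension: Optional[int]
    security_classification: int
    is_historized: bool
    is_alarm: bool


def node_roles(meta: AttributeMeta) -> set:
    """Return the server-side roles a discovered attribute should receive."""
    roles = {"variable"}
    if meta.is_historized:
        roles.add("historical-configuration")  # node gets HasHistoricalConfiguration
    if meta.is_alarm:
        roles.add("alarm-condition")           # node becomes an AlarmCondition
    return roles
```

If either flag were absent from the metadata, the corresponding branch could never fire, which is the gap this item asks the gateway to close.
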
### gw-2. Stable, documented event-stream resume semantics

**What's needed:** the OtOpcUa driver must survive a transient gw transport
drop without losing subscription state or duplicating change events. gw's
`StreamEventsAsync(afterWorkerSequence)` already exposes resumption.
Document the per-session retention window (how long does the worker buffer
events the gateway hasn't acked?) and the "events were dropped, you must
re-subscribe" signal. If retention is bounded by count rather than time,
expose the bound in `OpenSessionReply` so the client can size its own buffer.

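
A minimal sketch of the client-side bookkeeping this implies, in Python for brevity. It assumes worker sequence numbers are contiguous integers, which this document does not confirm; `ResumeCursor` is a hypothetical name.

```python
class ResumeCursor:
    """Tracks the last worker sequence seen so a reconnect can resume after it."""

    def __init__(self) -> None:
        self.last_sequence = 0
        self.seen_any = False

    def accept(self, sequence: int) -> str:
        """Classify an incoming event as 'ok', 'duplicate', or 'gap'."""
        if self.seen_any and sequence <= self.last_sequence:
            return "duplicate"  # already delivered; the caller should drop it
        gap = self.seen_any and sequence != self.last_sequence + 1
        self.last_sequence = sequence
        self.seen_any = True
        # Assumption: a gap means events were lost and a re-subscribe is needed.
        return "gap" if gap else "ok"
```

After a transport drop, the driver would pass `last_sequence` as the `afterWorkerSequence` argument when it reopens the stream.
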
### gw-3. Reconnectable sessions

Listed under "post-v1 revisit" in `gateway.md`. Without it, every gw or
OtOpcUa restart re-`Register`s, re-`AddItem`s, re-`Advise`s the entire
address space — for a 50k-tag Galaxy that's a non-trivial cold-start. With
reconnectable sessions, the driver presents its `SessionId` after a restart
and the worker keeps its handles.

If full reconnection is too large, ship a **bulk replay** instead: a single
RPC that takes the full subscription set and the worker performs the
register/add/advise inside one round trip. We can drive it from a
client-side cache rather than gw state. See gw-5 below.

### gw-4. Driver-shaped subscribe primitive

`MxGatewaySession` already has `SubscribeBulkAsync` (one RPC: `Register`
implicit + `AddItem` + `Advise` for a list of tag addresses, returning
per-tag `SubscribeResult`). That's exactly what `ISubscribable.SubscribeAsync`
wants. Confirm it returns enough per-tag detail to surface a partial-failure
list to OPC UA monitored items (good handle, status code, error text).

If not already, expose **`SubscribeBulk` with optional update-rate hint**
forwarded to `SetBufferedUpdateInterval` so the OPC UA publishing interval
becomes a single field on the subscribe call rather than a follow-up RPC.

### gw-5. Subscription replay snapshot

Provide an RPC `ReplaySubscriptionsAsync(SessionId, IEnumerable<TagAddress>)`
that re-establishes a list of subscriptions after a session reset and returns
per-tag results. The client stores its tag list locally (the driver already
has it from `Discover`), and the gw worker turns it into one
register/add/advise sequence. This is the minimum surface we need; full
"reattach to a previous session by id" (gw-3) is a richer version of the
same thing.

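
The client side of such a replay could look roughly like the Python sketch below. Chunking is an assumption (the risks table later mentions throttling by chunk size); `replay_chunk` stands in for whatever RPC the gateway ends up exposing and is hypothetical, as is the boolean per-tag result.

```python
from typing import Callable, Dict, Iterable, List


def _failures(batch: List[str], results: Dict[str, bool]) -> List[str]:
    # A tag missing from the reply is treated as failed, not silently ignored.
    return [tag for tag in batch if not results.get(tag, False)]


def replay_in_chunks(
    tags: Iterable[str],
    replay_chunk: Callable[[List[str]], Dict[str, bool]],
    chunk_size: int = 1000,
) -> List[str]:
    """Replay the locally cached tag list chunk by chunk; return failed tags."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    failed: List[str] = []
    batch: List[str] = []
    for tag in tags:
        batch.append(tag)
        if len(batch) == chunk_size:
            failed.extend(_failures(batch, replay_chunk(batch)))
            batch = []
    if batch:  # final partial chunk
        failed.extend(_failures(batch, replay_chunk(batch)))
    return failed
```

The returned list is what the driver would surface as per-tag partial failures after a session reset.
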
### gw-6. Transport-health stream

The gw already exposes worker / session health on its dashboard. Add a small
streaming RPC `StreamSessionHealth(SessionId) → stream SessionHealth` so the
OtOpcUa driver can surface "MXAccess transport up/down" to its
`IHostConnectivityProbe` without faking it via probe-tag subscriptions.
Today `MxAccessClient.ConnectionStateChanged` does this in-process; we want
the same signal at the gw boundary.

### gw-7. Optional .NET 10 client polish

- Async-disposable session pattern is already there.
- Add a **typed `MxValue` ⇄ `object` adapter** for the seven Galaxy types
  OtOpcUa cares about (Boolean, Int32, Float, Double, String, DateTime,
  arrays of the same). Today every consumer writes its own `MxValue.From<T>`
  helpers; this shaves boilerplate from the driver.
- Add a **`SubscribeWithCallback`** convenience wrapper that combines
  `OpenSession` + `SubscribeBulk` + `StreamEvents` and routes events through
  a delegate per tag. Keeps the OPC UA driver from re-implementing the
  fan-out / sequencer pattern.

### gw-8. Auth minimums

Document API-key scoping as it applies to OtOpcUa: the server identity needs
`session`, `invoke`, `event`, and `metadata:read` scopes. Provide a CLI to
mint a key bound to those scopes for an OtOpcUa instance.

### gw-9. Performance: bulk paths and value coalescing

- Confirm `SubscribeBulkAsync` is implemented as a single MXAccess
  `AddItem`+`Advise` loop on the worker, not N pipe round trips. If not, fix
  before we drive 50k-tag Galaxies through it.
- Expose `SetBufferedUpdateInterval` per session so OtOpcUa can request
  buffered updates at the OPC UA publishing interval and get one batched
  `OnBufferedDataChange` per tick rather than N `OnDataChange` events.

These can all ship in mxaccessgw independently and improve every consumer.

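
As an illustration of the coalescing idea in gw-9, the Python sketch below collapses one tick's events to the newest value per tag. Whether the worker keeps only the newest value or delivers every buffered value is not specified here; last-value-wins is an assumption, and `coalesce` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def coalesce(events: Iterable[Tuple[str, object]]) -> Dict[str, object]:
    """Collapse one tick's (tag, value) events to the latest value per tag."""
    latest: Dict[str, object] = {}
    for tag, value in events:
        latest[tag] = value  # later events overwrite earlier ones
    return latest
```

With this shape, a tag that changes many times inside one publishing interval costs one entry in the batch instead of one event per change.
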
---

## OtOpcUa-side improvements to land in parallel

Some are forced by removing `Galaxy.Host`; others are quality-of-life.

### ot-1. Promote `IHistorianDataSource` to a server-level extension point

Today `IHistorianDataSource` is a Galaxy-internal abstraction in
`Driver.Galaxy.Host`. Lift it to `OtOpcUa.Core.Abstractions` (or a similar
home next to `IDriver`) and let the OPC UA HA service consume **any number
of registered data sources** keyed by node namespace. Drivers don't own
historian access; the server mounts data sources alongside drivers. This is
the prerequisite that lets us move Wonderware Historian out of the Galaxy
driver without losing the feature.

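
A rough Python sketch of "data sources keyed by node namespace" follows. All names (`HistorianDataSource`, `NoHistory`, `HistorianRegistry`, `read_raw`) are hypothetical stand-ins, not the real `IHistorianDataSource` surface; the no-op default mirrors the one Phase 1 calls for.

```python
from typing import Dict, List, Protocol


class HistorianDataSource(Protocol):
    def read_raw(self, node_id: str) -> List[float]: ...


class NoHistory:
    """Default registration: namespaces without a historian return no samples."""

    def read_raw(self, node_id: str) -> List[float]:
        return []


class HistorianRegistry:
    """Server-level lookup from node namespace to its historian data source."""

    def __init__(self) -> None:
        self._sources: Dict[str, HistorianDataSource] = {}
        self._default: HistorianDataSource = NoHistory()

    def register(self, namespace: str, source: HistorianDataSource) -> None:
        self._sources[namespace] = source

    def resolve(self, namespace: str) -> HistorianDataSource:
        return self._sources.get(namespace, self._default)
```

The point of the design is that the history service only ever talks to the registry, so a driver never needs to know which historian, if any, backs its namespace.
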
### ot-2. Generic alarm condition state machine in the server

Move the `.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` quartet handling
out of `GalaxyAlarmTracker` into a server-level alarm subsystem keyed off the
`IsAlarm=true` flag drivers set during discovery. The server subscribes to
the four sub-attributes itself and runs the AlarmCondition state machine.
Driver only:
- declares `IsAlarm=true` in `DriverAttributeInfo`,
- forwards plain attribute value changes (already done by `ISubscribable`).

This is also a precondition for future drivers (Modbus DL205 alarm bits,
S7 alarm DBs) to emit alarms without each writing their own tracker.

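
A deliberately reduced Python sketch of such a state machine is shown below. It uses only the `.InAlarm` and `.Acked` values and the three states named in the parity list (Active / Acked / Inactive); a real OPC UA AlarmCondition carries more state than this, and the class and function names are hypothetical.

```python
from enum import Enum


class AlarmState(Enum):
    INACTIVE = "Inactive"
    ACTIVE = "Active"  # in alarm, not yet acknowledged
    ACKED = "Acked"    # in alarm, acknowledged


def next_state(in_alarm: bool, acked: bool) -> AlarmState:
    """Derive the condition state from the latest sub-attribute values."""
    if not in_alarm:
        return AlarmState.INACTIVE
    return AlarmState.ACKED if acked else AlarmState.ACTIVE


class AlarmCondition:
    def __init__(self) -> None:
        self.state = AlarmState.INACTIVE

    def update(self, in_alarm: bool, acked: bool) -> bool:
        """Apply new values; return True when the state changed (emit an event)."""
        new_state = next_state(in_alarm, acked)
        changed = new_state is not self.state
        self.state = new_state
        return changed
```

Because the inputs are plain attribute values, any driver that forwards them through `ISubscribable` could feed the same machine.
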
### ot-3. Driver capabilities trim

After ot-1 and ot-2, `Driver.Galaxy` no longer needs to implement:
- `IHistoryProvider` (server's HA service handles it via Wonderware
  historian data source)
- `IAlarmHistorianWriter` (server's A&E historian, or kept generic — Galaxy
  shouldn't own the SQLite path)
- `IAlarmSource` ack route (server-level alarm subsystem writes back via the
  driver's `IWritable.WriteAsync`, which the gw already supports)

Keep:
- `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`,
  `IRediscoverable`, `IHostConnectivityProbe`.

### ot-4. Treat `time_of_last_deploy` as `IRediscoverable`'s pump

Replace the Host-side change-detection poll with a managed
`GalaxyRepositoryClient.WatchDeployEventsAsync` consumer in the driver.
Each event raises `OnRediscoveryNeeded` with the new deploy time as the
`scopeHint`. No polling code in this repo.

### ot-5. Connection pool at the server, not the driver

If the redundancy pair runs two OtOpcUa instances against one gw, each
should use a single `GrpcChannel` per process (already the gRPC default), and
the two must hold **different sessions** (one MXAccess client identity per
OtOpcUa instance, not one shared session that fights over Wonderware client
state). Encode the per-instance MXAccess client name in driver config. It is
already partly there (`OTOPCUA_GALAXY_CLIENT_NAME`); make it explicit in the
new driver's `appsettings.json` shape.

---

## Phased implementation

Each phase is a working, mergeable slice. Keep `Galaxy.Host` running
alongside the new driver until phase 7 — gated by a config switch
`Galaxy:Backend = legacy-host | mxgateway`.

### Phase 0 — pre-flight (mxaccessgw repo)

Ship gw-1, gw-2, gw-4, gw-9 (the parity, performance, and contract bits the
plan immediately depends on). gw-3, gw-5, gw-6, gw-7 can come during or
after phase 5.

**Exit:** local OtOpcUa dev box can `MxGatewayClient.Create` a client, open a
session, `SubscribeBulkAsync` 100 tags, and observe `OnDataChange` events at
the configured update rate.

### Phase 1 — server-level historian extension point (ot-1)

1. Extract `IHistorianDataSource` (and its DTOs `HistorianSample`,
   `HistorianAggregateSample`, `HistoricalEvent`) from
   `Driver.Galaxy.Host/Backend/Historian/` into
   `src/ZB.MOM.WW.OtOpcUa.Core/Abstractions/Historian/`.
2. Extend the OPC UA HA service to look up a registered
   `IHistorianDataSource` per namespace and call into it for `HistoryRead`,
   `HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`. Drivers
   stop implementing `IHistoryProvider` directly; the server proxies.
3. Add a no-op default registration so drivers without history keep working.

**Exit:** all current Galaxy history reads route through an
`IHistorianDataSource` registered by `Driver.Galaxy.Host` (still legacy)
without behavior change. Other drivers untouched.

### Phase 2 — server-level alarm subsystem (ot-2)

1. Add an `IAlarmConditionDeclaration` API on the address-space builder so
   discovery can flag a node as alarm-bearing and supply the four
   sub-attribute references.
2. Add a hosted `AlarmConditionService` in the server that, on driver
   `Discover`, subscribes to the four sub-attributes via the driver's own
   `ISubscribable`, runs the state machine, and emits
   `IAlarmSource.OnAlarmEvent` itself. Acks route back through the driver's
   `IWritable.WriteAsync` to the `.AckMsg` attribute.
3. Add Galaxy-specific defaults (sub-attribute naming) as a small adapter
   so the same service can serve future drivers with different conventions.

**Exit:** Galaxy alarms still work end-to-end; the tracker code that runs
inside `Galaxy.Host` is dead but kept for the legacy-host backend path.

### Phase 3 — Wonderware Historian sidecar (`Driver.Historian.Wonderware`)

1. New solution project: `Driver.Historian.Wonderware`, .NET 4.8 x86,
   console app + NSSM (mirrors today's Galaxy.Host packaging exactly,
   minus Galaxy responsibilities).
2. Hosts the existing `HistorianDataSource`, `HistorianClusterEndpointPicker`,
   `HistorianHealthSnapshot` code lifted from `Galaxy.Host/Backend/Historian/`
   and exposes them over a small named-pipe protocol (or local gRPC if
   .NET 4.8 cost is acceptable; named pipe is simpler).
3. Add `Driver.Historian.Wonderware.Client` — .NET 10 — implementing
   `IHistorianDataSource` against the sidecar.
4. Server registers it as a data source for the `Galaxy` namespace.

**Exit:** OPC UA history reads work via the sidecar with the legacy-host
backend still in place. We've decoupled history from MXAccess.

### Phase 4 — new `Driver.Galaxy` against gw

This is the meat. New project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`, .NET 10,
in-process. Capabilities (post ot-3): `IDriver`, `ITagDiscovery`, `IReadable`,
`IWritable`, `ISubscribable`, `IRediscoverable`, `IHostConnectivityProbe`.

Shape:

```
Driver.Galaxy/
  GalaxyDriver.cs                     # IDriver root
  Browse/
    GalaxyDiscoverer.cs               # consumes GalaxyRepositoryClient.DiscoverHierarchyAsync
    DataTypeMap.cs                    # mx_data_type → DriverDataType
    SecurityMap.cs                    # security_classification → SecurityClassification
  Runtime/
    GalaxyMxSession.cs                # owns one MxGatewaySession; Register + map per-driver client name
    SubscriptionRegistry.cs           # tag → server/item handles; persists to memory only
    EventPump.cs                      # consumes session.StreamEventsAsync, fans out to OnDataChange
    ReconnectSupervisor.cs            # gw transport drop / session-lost recovery
    DeployWatcher.cs                  # GalaxyRepositoryClient.WatchDeployEventsAsync → OnRediscoveryNeeded
  Health/
    HostConnectivityForwarder.cs      # gw-6 SessionHealth → IHostConnectivityProbe
  Config/
    GalaxyDriverOptions.cs            # endpoint, ApiKey, ClientName, TLS, retry, intervals
  GalaxyDriverFactoryExtensions.cs    # AddGalaxyDriver(IServiceCollection)
```

Key behaviors:

- **Discovery** calls `GalaxyRepositoryClient.DiscoverHierarchyAsync()`
  once at init and on every `WatchDeployEvents` event, then drives the
  address space builder. Same node naming as today (parent contained-name
  hierarchy + leaf attributes named `tag_name.AttributeName`).
- **Read**: a one-off `AddItem` + `Advise` + read-after-first-callback
  is overkill; instead, use **`Register` + per-call `AddItem`/`Read`** if gw
  exposes a synchronous read, otherwise a short-lived advise. *Action item:*
  confirm gw's read story; if absent, request a synchronous `ReadAsync` RPC
  on top of MXAccess `Read` (which exists in the COM API).
- **Write** maps `WriteRequest.Value` to `MxValue` via gw-7 helpers and
  calls `WriteAsync(serverHandle, itemHandle, value, userId=0)`. Routes
  `WriteSecured` (where `SecurityClassification == SecuredWrite/Verified`)
  to `WriteSecuredAsync` once exposed on `MxGatewaySession`.
- **Subscribe** calls `SubscribeBulkAsync` once per `ISubscribable.Subscribe`
  call. Stores `(tag → itemHandle, sid)` in `SubscriptionRegistry`. The
  single `EventPump` consumes one `StreamEventsAsync` per session and fans
  out per `sid`.
- **Unsubscribe** calls `UnsubscribeBulkAsync` and drops registry entries.
- **Reconnect** — when the gRPC channel drops or `StreamEvents` returns,
  `ReconnectSupervisor` reopens the session and replays subscriptions via
  gw-5 `ReplaySubscriptionsAsync`. The driver flags `DriverState.Degraded`
  during recovery; the server keeps publishing last-good values with
  `Uncertain` quality.
- **Host connectivity** — single synthesized host entry named after
  `OTOPCUA_GALAXY_CLIENT_NAME` driven by gw-6 `SessionHealth` updates
  (or, until gw-6 lands, by transport drops).

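
The Subscribe, Unsubscribe, and fan-out behaviors above can be sketched as follows (Python for brevity; the real components are .NET). The class name is borrowed from the layout above, but the method names and the callback signature are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict

Callback = Callable[[str, object], None]


class SubscriptionRegistry:
    """Maps item handles to tags and tags to per-subscription callbacks."""

    def __init__(self) -> None:
        self._tag_by_handle: Dict[int, str] = {}
        self._callbacks: Dict[str, Dict[int, Callback]] = defaultdict(dict)

    def add(self, sid: int, tag: str, item_handle: int, callback: Callback) -> None:
        self._tag_by_handle[item_handle] = tag
        self._callbacks[tag][sid] = callback

    def remove(self, sid: int) -> None:
        """Drop every registry entry that belongs to one subscription id."""
        for subscribers in self._callbacks.values():
            subscribers.pop(sid, None)

    def dispatch(self, item_handle: int, value: object) -> int:
        """Fan one event out to every subscription on the tag; return the count."""
        tag = self._tag_by_handle.get(item_handle)
        if tag is None:
            return 0  # event for a handle we no longer track
        subscribers = list(self._callbacks[tag].values())
        for callback in subscribers:
            callback(tag, value)
        return len(subscribers)
```

A single pump loop would call `dispatch` for each event read from the stream, so several OPC UA subscriptions on one tag share one upstream item.
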
Wire into the server next to other Tier-A drivers in the
`AddDrivers(...)` call site.

**Exit:** flipping `Galaxy:Backend` to `mxgateway` runs the OPC UA server
end-to-end with no `Galaxy.Host` involvement. Live read, live write, live
subscribe pass against the dev Galaxy. Historian + alarms still work via
phases 1–3.

### Phase 5 — parity test matrix

Reuse the existing live-Galaxy integration tests; run each scenario twice:
once with `Galaxy:Backend=legacy-host`, once with `mxgateway`. Compare:

- discovered hierarchy node count + names + datatypes,
- subscribed publish rates (allow ±10% tolerance vs. legacy),
- write success / status codes for each `SecurityClassification`,
- alarm condition transitions (Active / Acked / Inactive) — already
  routed through phase 2's server-level subsystem,
- history reads — phase 3 sidecar, identical results both backends,
- reconnect behavior under gw kill, worker kill, network drop, ZB drop.

Document the matrix; resolve every discrepancy or explicitly accept it.

**Exit:** parity matrix has zero unexplained deltas. Performance budget
agreed: e.g. ≤ 2× per-call latency vs. named-pipe baseline at the 95th
percentile, equal or better throughput in `SubscribeBulk` setup time.

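
For the publish-rate row, the ±10% check might be expressed like the Python sketch below. `rate_deltas` and its inputs (per-tag rates as plain floats) are hypothetical; the existing integration tests may measure and compare rates differently.

```python
from typing import Dict, List


def rate_deltas(
    legacy: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return tags whose publish rate differs from legacy by more than tolerance."""
    problems: List[str] = []
    for tag, legacy_rate in sorted(legacy.items()):
        new_rate = candidate.get(tag)
        if new_rate is None:
            problems.append(f"{tag}: missing on candidate backend")
        elif abs(new_rate - legacy_rate) > tolerance * legacy_rate:
            problems.append(f"{tag}: {legacy_rate} -> {new_rate}")
    return problems
```

An empty result for every scenario is one concrete reading of "zero unexplained deltas" for this row of the matrix.
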
### Phase 6 — perf + hardening

- Land gw-9 buffered-update intervals.
- Add OpenTelemetry traces from the driver around every gw call,
  correlated via `client_correlation_id`.
- Write soak test: 50k tags subscribed, 24h, count missed events, gw
  restarts, OtOpcUa restarts.
- Tune `MxGatewayClientOptions.MaxGrpcMessageBytes`, retry pipeline,
  call timeouts based on soak results.

**Exit:** production-acceptable perf numbers documented in
`docs/drivers/Galaxy.md`.

### Phase 7 — retirement

1. Default `Galaxy:Backend = mxgateway` everywhere (sample configs,
   install scripts, e2e configs).
2. Delete `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`,
   `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy`,
   `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared`, and matching tests.
3. Remove `OtOpcUaGalaxyHost` NSSM registration from
   `scripts/install/Install-Services.ps1`. Add a registration block for the
   Wonderware historian sidecar from phase 3.
4. Remove every x86 .NET 4.8 reference, build target, and CI step from this
   repo; remove `mxaccess_documentation.md`-driven dependencies that no
   longer apply.
5. Update CLAUDE.md, `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
   `docs/Redundancy.md` to reflect the new topology.
6. Memory housekeeping: retire `project_galaxy_host_service.md` and
   `project_galaxy_host_installed.md`; add a short note about the gw
   dependency.

**Exit:** `git grep -i 'Galaxy\.Host'` returns nothing in source.

---

## Configuration shape (new driver)

```jsonc
"Drivers": {
  "Galaxy": {
    "Type": "Galaxy",
    "InstanceId": "galaxy-prod-1",
    "Gateway": {
      "Endpoint": "https://mxgw.aveva.local:5001",
      "ApiKeySecretRef": "galaxy:apiKey",      // resolved via existing secret store
      "UseTls": true,
      "CaCertificatePath": "C:\\publish\\mxgw\\ca.crt",
      "ConnectTimeoutSeconds": 10,
      "DefaultCallTimeoutSeconds": 5,
      "StreamTimeoutSeconds": 0                // unbounded
    },
    "MxAccess": {
      "ClientName": "OtOpcUa-A",               // unique per OtOpcUa instance
      "PublishingIntervalMs": 1000,            // hint for SetBufferedUpdateInterval
      "WriteUserId": 0
    },
    "Repository": {
      "DiscoverPageSize": 5000,
      "WatchDeployEvents": true
    },
    "Reconnect": {
      "InitialBackoffMs": 500,
      "MaxBackoffMs": 30000,
      "ReplayOnSessionLost": true
    }
  }
}
```

The OtOpcUa secret store already handles DPAPI-protected values for LDAP
binds; reuse it for the gw API key. Never put the key in plaintext in the
sample config.

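
One way to read the `Reconnect` block is as an exponential backoff schedule. The Python sketch below assumes the delay doubles on each attempt and is capped at `MaxBackoffMs`; the doubling factor and the absence of jitter are assumptions, since the config names only the initial and maximum values.

```python
from typing import Iterator


def backoff_delays_ms(initial_ms: int = 500, max_ms: int = 30000) -> Iterator[int]:
    """Yield reconnect delays that double each attempt and cap at max_ms."""
    delay = initial_ms
    while True:
        yield min(delay, max_ms)
        delay = min(delay * 2, max_ms)
```

With the sample values, the first delays are 500, 1000, 2000, 4000, 8000, and 16000 ms, after which every retry waits the 30000 ms cap.
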
---

## Risks and mitigations

| Risk | Mitigation |
|---|---|
| gw protocol regression breaks production | Pin gw NuGet to a contract version range; CI runs parity matrix on every gw bump; staged rollout via `Galaxy:Backend` flag. |
| Per-call latency regresses for chatty workloads | Land gw-9 (buffered updates) before phase 5; soak the 95p in phase 6. |
| Reconnect storm after gw restart re-registers 50k tags | Land gw-3 or gw-5 before phase 6; client-side bulk replay throttled by `SubscribeBulkAsync` chunk size. |
| Alarm parity gap from moving tracker server-side | Phase 2 ships before phase 4; parity matrix gates phase 7. |
| Historian sidecar adds a second .NET 4.8 x86 service | Acceptable: it's a *driver-agnostic* component, and it ships only where Wonderware historian access is actually needed. |
| Two OtOpcUa instances both registering as same MXAccess client | `ClientName` is per-instance config (ot-5); install scripts lint that the redundancy pair has distinct names. |
| Cross-machine MXAccess writes traverse plaintext gRPC | Phase 0 enforces `UseTls=true` for any non-loopback `Endpoint`; CI lints the sample configs. |
| gw API key leaked in logs | gw and `MxGatewayClient` already redact `authorization` metadata; phase 6 audit. |
| Memory leak in `EventPump` under high event rate | Bounded channel between `StreamEventsAsync` and per-sub fan-out, drop-newest with a metric counter; soak test catches. |

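
The last mitigation (bounded channel, drop-newest, metric counter) can be pictured with the small Python sketch below. `DropNewestBuffer` is a hypothetical, single-threaded stand-in; a real implementation would sit on a thread-safe bounded channel and export `dropped` as a metric.

```python
from collections import deque
from typing import Deque, Optional


class DropNewestBuffer:
    """Bounded FIFO that rejects new items when full and counts the rejections."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items: Deque[object] = deque()
        self._capacity = capacity
        self.dropped = 0  # would feed the metric counter

    def offer(self, item: object) -> bool:
        """Enqueue the item, or drop it and count the drop when the buffer is full."""
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def take(self) -> Optional[object]:
        """Dequeue the oldest item, or return None when the buffer is empty."""
        return self._items.popleft() if self._items else None
```

Memory stays bounded by `capacity` no matter how fast events arrive, and a rising `dropped` count is the signal the soak test would watch for.
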
---

## Cross-cutting deliverables

- **Docs:** `docs/drivers/Galaxy.md` (new), updates to
  `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
  `docs/Redundancy.md`, `CLAUDE.md`.
- **Install scripts:** `scripts/install/Install-Services.ps1` removes
  `OtOpcUaGalaxyHost`, adds `OtOpcUaWonderwareHistorian`, no Galaxy
  service registration on the OtOpcUa node.
- **e2e:** `scripts/e2e/e2e-config.sample.json` — drop `OTOPCUA_GALAXY_*`
  pipe vars, add `Drivers:Galaxy:Gateway:Endpoint` etc.
- **Memory:** retire stale Galaxy.Host entries; add gw dependency entry,
  redundancy + client-name guidance.

---

## Order-of-work summary

```
Phase 0 (gw repo):    gw-1, gw-2, gw-4, gw-9
Phase 1 (this):       ot-1 — historian extension point
Phase 2 (this):       ot-2 — alarm subsystem
Phase 3 (this):       Driver.Historian.Wonderware sidecar
Phase 4 (this):       Driver.Galaxy (new) behind backend flag
                      — depends on Phase 0, 1, 2
Phase 5 (this+gw):    parity matrix
                      — drives gw-3 / gw-5 / gw-6 / gw-7 if gaps surface
Phase 6 (this):       perf + hardening
Phase 7 (this):       retire Galaxy.Host / Proxy / Shared
```

Phases 1–3 are independent of each other and can run in parallel. Phase 4
needs all three plus Phase 0. Phase 5 requires Phase 4. Phases 6 and 7 are
sequential after Phase 5.