docs: post-PR-7.2 cleanup — audit + three-track scrub

Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-04-30 08:59:59 -04:00
parent ae7106dfce
commit 006af51768
21 changed files with 322 additions and 671 deletions

238
README.md
View File

@@ -1,200 +1,90 @@
# LmxOpcUa
# OtOpcUa
OPC UA server and cross-platform client tools for AVEVA System Platform (Wonderware) Galaxy. The server exposes Galaxy tags via MXAccess as an OPC UA address space. The client stack provides a shared library, CLI tool, and Avalonia desktop application for browsing, reading/writing, subscriptions, alarms, and historical data.
OPC UA server (.NET 10 AnyCPU) that exposes a fleet of industrial drivers as a single OPC UA address space. Drivers ship in-process for AVEVA System Platform Galaxy (via the sibling `mxaccessgw` repo), Modbus TCP, Siemens S7, Allen-Bradley CIP (ControlLogix / CompactLogix), Allen-Bradley Legacy (SLC 500 / MicroLogix), Beckhoff TwinCAT (ADS), FANUC FOCAS, and OPC UA Client (gateway).
A cross-platform client stack (.NET 10) — shared library, CLI, and Avalonia desktop app — connects to any OPC UA server.
## Architecture
```
OPC UA Clients
(CLI, Desktop UI, 3rd-party)
|
v
+-----------------+ +------------------+ +-----------------+
| Galaxy Repo DB |---->| OPC UA Server |<--->| MXAccess Client |
| (SQL Server) | | (address space) | | (STA + COM) |
+-----------------+ +------------------+ +-----------------+
| |
+-------+--------+ +---------+---------+
| Status Dashboard| | Historian Runtime |
| (HTTP/JSON) | | (SQL Server) |
+----------------+ +-------------------+
OPC UA Clients (CLI, Desktop UI, 3rd-party)
|
v
+-------------------------------------+
| OtOpcUa.Server (.NET 10 AnyCPU) |
| address space + capability fan-out|
+-------------------------------------+
| | | | | | | |
Galaxy Modbus S7 AbCip AbLeg TwinCAT FOCAS OpcUaClient
|
v
mxaccessgw (sibling repo, gRPC)
|
v
MXAccess COM (x86 worker, on AVEVA box)
```
## Contained Name vs Tag Name
Galaxy is the only driver with an external runtime: it speaks gRPC to a separately installed `mxaccessgw` server (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`) which owns the MXAccess COM apartment and the x86/STA bitness constraint server-side. Everything in this repo is platform-agnostic .NET 10.
| Browse Path (contained names) | Runtime Reference (tag name) |
|-------------------------------|------------------------------|
| `TestMachine_001/DelmiaReceiver/DownloadPath` | `DelmiaReceiver_001.DownloadPath` |
| `TestMachine_001/MESReceiver/MoveInBatchID` | `MESReceiver_001.MoveInBatchID` |
## Prerequisites
---
- .NET 10 SDK (server, drivers, clients all target .NET 10)
- SQL Server reachable for the central config DB
- For Galaxy specifically: a running `mxaccessgw` deployment — see [docs/v2/Galaxy.ParityRig.md](docs/v2/Galaxy.ParityRig.md)
- For Wonderware Historian read-back: optional `OtOpcUaWonderwareHistorian` sidecar — see [docs/ServiceHosting.md](docs/ServiceHosting.md)
## Server
The OPC UA server runs on .NET Framework 4.8 (x86) and bridges the Galaxy runtime to OPC UA clients.
### Server Prerequisites
- .NET Framework 4.8 SDK
- AVEVA System Platform with ArchestrA Framework installed
- Galaxy repository database (SQL Server, Windows Auth)
- MXAccess COM registered (`LMXProxy.LMXProxyServer`)
- Wonderware Historian (optional, for historical data access)
- Windows (required for COM interop and MXAccess)
### Build and Run Server
## Quick Start
```bash
dotnet restore ZB.MOM.WW.LmxOpcUa.slnx
dotnet build src/ZB.MOM.WW.LmxOpcUa.Host
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Host
dotnet restore ZB.MOM.WW.OtOpcUa.slnx
dotnet build ZB.MOM.WW.OtOpcUa.slnx
dotnet test ZB.MOM.WW.OtOpcUa.slnx
# Run the server in dev (foreground)
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server
```
The server starts on `opc.tcp://localhost:4840/LmxOpcUa` with the `None` security profile by default. Configure `Security.Profiles` in `appsettings.json` to enable `Basic256Sha256-Sign` or `Basic256Sha256-SignAndEncrypt` for transport security. See [Security Guide](docs/security.md).
The server starts on `opc.tcp://localhost:4840` with the `None` security profile. Configure `Security.Profiles` in `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` to enable `Basic256Sha256-Sign` or `Basic256Sha256-SignAndEncrypt`. See [docs/security.md](docs/security.md).
### Install as Windows Service
## Install as Windows Services
Production deployment is driven by `scripts/install/Install-Services.ps1`, which registers the `OtOpcUa` server service (and optionally the `OtOpcUaWonderwareHistorian` sidecar) under a chosen service account. Galaxy support requires a separately installed `mxaccessgw` — neither this repo nor the install script provisions it.
```powershell
.\scripts\install\Install-Services.ps1 `
-InstallRoot 'C:\Program Files\OtOpcUa' `
-ServiceAccount 'DOMAIN\svc-otopcua'
```
Add `-InstallWonderwareHistorian` for the historian sidecar. See the script header and [docs/ServiceHosting.md](docs/ServiceHosting.md) for full options.
## Client CLI
```bash
cd src/ZB.MOM.WW.LmxOpcUa.Host/bin/Debug/net48
ZB.MOM.WW.LmxOpcUa.Host.exe install
ZB.MOM.WW.LmxOpcUa.Host.exe start
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840 -r -d 3
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode"
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- write -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -v 42
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -i 500
```
**Service logon requirement:** The service must run under a Windows account that has access to the AVEVA Galaxy and Historian. The default `LocalSystem` account can connect to MXAccess and SQL Server but **cannot authenticate with the Historian SDK** (HCAP). Configure the service to "Log on as" a domain or local user that is a recognized ArchestrA platform user. This can be set in `services.msc` or during install with `ZB.MOM.WW.LmxOpcUa.Host.exe install -username DOMAIN\user -password ***`.
### Run Server Tests
```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.IntegrationTests
```
---
## Client Stack
The client stack is cross-platform (.NET 10) and consists of three projects sharing a common `IOpcUaClientService` abstraction. No AVEVA software or COM is required — the clients connect to any OPC UA server.
### Client Prerequisites
- .NET 10 SDK
- No platform-specific dependencies (runs on Windows, macOS, Linux)
### Build All Clients
```bash
dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.Shared
dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.CLI
dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.UI
```
### Run Client Tests
```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.Shared.Tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.CLI.Tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.UI.Tests
```
### Client CLI
```bash
# Connect
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840/LmxOpcUa
# Browse Galaxy hierarchy
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=ZB" -r -d 5
# Read a tag
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001.MachineID"
# Write a tag
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- write -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestChildObject.TestString" -v "Hello"
# Subscribe to changes
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestChildObject.TestInt" -i 500
# Read historical data
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- historyread -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001.TestHistoryValue" --start "2026-03-25" --end "2026-03-30"
# Subscribe to alarm events
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- alarms -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001" --refresh
# Query redundancy state
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- redundancy -u opc.tcp://localhost:4840/LmxOpcUa
```
### Client UI
```bash
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.UI
```
The desktop application provides browse tree, subscriptions, alarm monitoring, history reads, and write dialogs. See [Client UI Documentation](docs/Client.UI.md) for details.
---
## Project Structure
```
src/
ZB.MOM.WW.LmxOpcUa.Host/ OPC UA server (.NET Framework 4.8, x86)
Configuration/ Config binding and validation
Domain/ Interfaces, DTOs, enums, mappers
Historian/ Wonderware Historian data source
Metrics/ Performance tracking (rolling P95)
MxAccess/ STA thread, COM interop, subscriptions
GalaxyRepository/ SQL queries, change detection
OpcUa/ Server, node manager, address space, alarms, diff
Status/ HTTP dashboard, health checks
ZB.MOM.WW.LmxOpcUa.Client.Shared/ Shared OPC UA client library (.NET 10)
ZB.MOM.WW.LmxOpcUa.Client.CLI/ Command-line client (.NET 10)
ZB.MOM.WW.LmxOpcUa.Client.UI/ Avalonia desktop client (.NET 10)
tests/
ZB.MOM.WW.LmxOpcUa.Tests/ Server unit + integration tests
ZB.MOM.WW.LmxOpcUa.IntegrationTests/ Server integration tests (live DB)
ZB.MOM.WW.LmxOpcUa.Client.Shared.Tests/ Shared library tests
ZB.MOM.WW.LmxOpcUa.Client.CLI.Tests/ CLI command tests
ZB.MOM.WW.LmxOpcUa.Client.UI.Tests/ UI ViewModel + headless tests
gr/ Galaxy repository docs, SQL queries, schema
```
See [docs/Client.CLI.md](docs/Client.CLI.md) and [docs/Client.UI.md](docs/Client.UI.md).
## Documentation
### Server
| Component | Description |
| Topic | Doc |
|---|---|
| [OPC UA Server](docs/OpcUaServer.md) | Endpoint, sessions, security policy, server lifecycle |
| [Address Space](docs/AddressSpace.md) | Hierarchy nodes, variable nodes, primitive grouping, NodeId scheme |
| [Galaxy Repository](docs/GalaxyRepository.md) | SQL queries, deployed package chain, change detection |
| [MXAccess Bridge](docs/MxAccessBridge.md) | STA thread, COM interop, subscriptions, reconnection |
| [Data Type Mapping](docs/DataTypeMapping.md) | Galaxy to OPC UA types, arrays, security classification |
| [Read/Write Operations](docs/ReadWriteOperations.md) | Value reads, writes, access level enforcement, array element writes |
| [Subscriptions](docs/Subscriptions.md) | Ref-counted MXAccess subscriptions, data change dispatch |
| [Alarm Tracking](docs/AlarmTracking.md) | AlarmConditionState nodes, InAlarm monitoring, event reporting |
| [Historical Data Access](docs/HistoricalDataAccess.md) | Historian data source, HistoryReadRaw, HistoryReadProcessed |
| [Incremental Sync](docs/IncrementalSync.md) | Diff computation, subtree teardown/rebuild, subscription preservation |
| [Configuration](docs/Configuration.md) | appsettings.json binding, feature flags, validation |
| [Status Dashboard](docs/StatusDashboard.md) | HTTP server, health checks, metrics reporting |
| [Service Hosting](docs/ServiceHosting.md) | TopShelf, startup/shutdown sequence, error handling |
| [Security](docs/security.md) | Transport security profiles, certificate trust, production hardening |
| [Redundancy](docs/Redundancy.md) | Non-transparent warm/hot redundancy, ServiceLevel, paired deployment |
### Client
| Component | Description |
|---|---|
| [Client CLI](docs/Client.CLI.md) | Connect, browse, read, write, subscribe, historyread, alarms, redundancy commands |
| [Client UI](docs/Client.UI.md) | Avalonia desktop client: browse, subscribe, alarms, history, write values |
### Reference
- [Galaxy Repository Queries](gr/CLAUDE.md) — SQL queries for hierarchy, attributes, and change detection
- [Data Type Mapping](gr/data_type_mapping.md) — Galaxy to OPC UA type mapping with security classification
| Driver specs (per-driver capability surface, config, addressing) | [docs/v2/driver-specs.md](docs/v2/driver-specs.md) |
| Galaxy driver | [docs/drivers/Galaxy.md](docs/drivers/Galaxy.md) |
| Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient | [docs/drivers/](docs/drivers/) |
| Galaxy parity rig (mxaccessgw setup) | [docs/v2/Galaxy.ParityRig.md](docs/v2/Galaxy.ParityRig.md) |
| Galaxy performance + tracing | [docs/v2/Galaxy.Performance.md](docs/v2/Galaxy.Performance.md) |
| Service hosting | [docs/ServiceHosting.md](docs/ServiceHosting.md) |
| Security (transport, LDAP, certificates) | [docs/security.md](docs/security.md) |
| Redundancy | [docs/Redundancy.md](docs/Redundancy.md) |
| Address space | [docs/AddressSpace.md](docs/AddressSpace.md) |
| Configuration | [docs/Configuration.md](docs/Configuration.md) |
| Status dashboard | [docs/StatusDashboard.md](docs/StatusDashboard.md) |
## License

View File

@@ -11,9 +11,8 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess
- **Core** owns the OPC UA stack, address space, session/security/subscription machinery.
- **Drivers** plug in via capability interfaces in `ZB.MOM.WW.OtOpcUa.Core.Abstractions`: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`, `IPerCallHostResolver`. Each driver opts into whichever it supports.
- **Server** is the OPC UA endpoint process (net10, x64). Hosts every driver except Galaxy in-process; talks to Galaxy via a named pipe because MXAccess COM is 32-bit-only.
- **Server** is the OPC UA endpoint process (net10, AnyCPU). Hosts every driver in-process. The Galaxy driver reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo); it is no longer hosted from this repo.
- **Admin** is the Blazor Server operator UI (net10, x64). Owns the Config DB draft/publish flow, ACL + role-grant authoring, fleet status + `/metrics` scrape endpoint.
- **Galaxy.Host** is a .NET Framework 4.8 x86 Windows service that wraps MXAccess COM on an STA thread for the Galaxy driver.
## Where to find what
@@ -24,11 +23,11 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess
| [OpcUaServer.md](OpcUaServer.md) | Top-level server architecture — Core, driver dispatch, Config DB, generations |
| [AddressSpace.md](AddressSpace.md) | `GenericDriverNodeManager` + `ITagDiscovery` + `IAddressSpaceBuilder` |
| [ReadWriteOperations.md](ReadWriteOperations.md) | OPC UA Read/Write → `CapabilityInvoker``IReadable`/`IWritable` |
| [Subscriptions.md](Subscriptions.md) | Monitored items → `ISubscribable` + per-driver subscription refcount |
| [AlarmTracking.md](AlarmTracking.md) | `IAlarmSource` + `AlarmSurfaceInvoker` + OPC UA alarm conditions |
| [DataTypeMapping.md](DataTypeMapping.md) | Per-driver `DriverAttributeInfo` → OPC UA variable types |
| [Subscriptions.md](v1/Subscriptions.md) | Monitored items → `ISubscribable` + per-driver subscription refcount (v1 archive) |
| [AlarmTracking.md](v1/AlarmTracking.md) | `IAlarmSource` + `AlarmSurfaceInvoker` + OPC UA alarm conditions (v1 archive) |
| [DataTypeMapping.md](v1/DataTypeMapping.md) | Per-driver `DriverAttributeInfo` → OPC UA variable types (v1 archive — live mapping is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) |
| [IncrementalSync.md](IncrementalSync.md) | Address-space rebuild on redeploy + `sp_ComputeGenerationDiff` |
| [HistoricalDataAccess.md](HistoricalDataAccess.md) | `IHistoryProvider` as a per-driver optional capability |
| [HistoricalDataAccess.md](v1/HistoricalDataAccess.md) | `IHistoryProvider` as a per-driver optional capability (v1 archive) |
| [VirtualTags.md](VirtualTags.md) | `Core.Scripting` + `Core.VirtualTags` — Roslyn script sandbox, engine, dispatch alongside driver tags |
| [ScriptedAlarms.md](ScriptedAlarms.md) | `Core.ScriptedAlarms` — script-predicate `IAlarmSource` + Part 9 state machine |
@@ -36,7 +35,7 @@ Two Core subsystems are shipped without a dedicated top-level doc; see the secti
| Project | See |
|---------|-----|
| `Core.AlarmHistorian` | [AlarmTracking.md](AlarmTracking.md) § Alarm historian sink |
| `Core.AlarmHistorian` | [AlarmTracking.md](v1/AlarmTracking.md) § Alarm historian sink (v1 archive) |
| `Analyzers` (Roslyn OTOPCUA0001) | [security.md](security.md) § OTOPCUA0001 Analyzer |
### Drivers
@@ -44,8 +43,8 @@ Two Core subsystems are shipped without a dedicated top-level doc; see the secti
| Doc | Covers |
|-----|--------|
| [drivers/README.md](drivers/README.md) | Index of the eight shipped drivers + capability matrix |
| [drivers/Galaxy.md](drivers/Galaxy.md) | Galaxy driver — MXAccess bridge, Host/Proxy split, named-pipe IPC |
| [drivers/Galaxy-Repository.md](drivers/Galaxy-Repository.md) | Galaxy-specific discovery via the ZB SQL database |
| [drivers/Galaxy.md](drivers/Galaxy.md) | Galaxy driver — in-process gRPC client to the mxaccessgw sidecar |
| [v1/drivers/Galaxy-Repository.md](v1/drivers/Galaxy-Repository.md) | Galaxy-specific discovery via the ZB SQL database (v1 archive — the gateway owns this path now) |
For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics, see [v2/driver-specs.md](v2/driver-specs.md).
@@ -53,10 +52,10 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics
| Doc | Covers |
|-----|--------|
| [Configuration.md](Configuration.md) | appsettings bootstrap + Config DB + Admin UI draft/publish |
| [Configuration.md](v1/Configuration.md) | appsettings bootstrap + Config DB + Admin UI draft/publish (v1 archive — `OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) |
| [security.md](security.md) | Transport security profiles, LDAP auth, ACL trie, role grants, OTOPCUA0001 analyzer |
| [Redundancy.md](Redundancy.md) | `RedundancyCoordinator`, `ServiceLevelCalculator`, apply-lease, Prometheus metrics |
| [ServiceHosting.md](ServiceHosting.md) | Three-process deploy (Server + Admin + Galaxy.Host) install/uninstall |
| [ServiceHosting.md](ServiceHosting.md) | Two-process deploy (Server + Admin) install/uninstall, plus the optional `OtOpcUaWonderwareHistorian` sidecar |
| [StatusDashboard.md](StatusDashboard.md) | Pointer — superseded by [v2/admin-ui.md](v2/admin-ui.md) |
### Client tooling
@@ -79,10 +78,10 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics
|-----|--------|
| [reqs/HighLevelReqs.md](reqs/HighLevelReqs.md) | HLRs — numbered system-level requirements |
| [reqs/OpcUaServerReqs.md](reqs/OpcUaServerReqs.md) | OPC UA server-layer reqs |
| [reqs/ServiceHostReqs.md](reqs/ServiceHostReqs.md) | Per-process hosting reqs |
| [v1/reqs/ServiceHostReqs.md](v1/reqs/ServiceHostReqs.md) | Per-process hosting reqs (v1 archive — only `OtOpcUa` server hosting remains in scope post-PR-7.2) |
| [reqs/ClientRequirements.md](reqs/ClientRequirements.md) | Client CLI + UI reqs |
| [reqs/GalaxyRepositoryReqs.md](reqs/GalaxyRepositoryReqs.md) | Galaxy-scoped repository reqs |
| [reqs/MxAccessClientReqs.md](reqs/MxAccessClientReqs.md) | Galaxy-scoped MXAccess reqs |
| [v1/reqs/GalaxyRepositoryReqs.md](v1/reqs/GalaxyRepositoryReqs.md) | Galaxy-scoped repository reqs (v1 archive — owned by mxaccessgw today) |
| [v1/reqs/MxAccessClientReqs.md](v1/reqs/MxAccessClientReqs.md) | Galaxy-scoped MXAccess reqs (v1 archive — owned by mxaccessgw today) |
| [reqs/StatusDashboardReqs.md](reqs/StatusDashboardReqs.md) | Pointer — superseded by Admin UI |
## Implementation history (`docs/v2/`)
@@ -97,3 +96,7 @@ Design decisions + phase plans + execution notes. Load-bearing cross-references
- [v2/dev-environment.md](v2/dev-environment.md) — dev-box bootstrap
- [v2/test-data-sources.md](v2/test-data-sources.md) — integration-test simulator matrix (includes the pinned libplctag `ab_server` version for AB CIP tests)
- [v2/implementation/phase-*-*.md](v2/implementation/) — per-phase execution plans with exit-gate evidence
## v1 archive
The v1 in-process MXAccess architecture (Galaxy.Host + Galaxy.Proxy + Galaxy.Shared, .NET 4.8 x86 COM, the `OtOpcUaGalaxyHost` Windows service) was retired in PR 7.2 (2026-04-30, commit `ae7106d`). Docs that described that shape are kept under [v1/](v1/) as historical record — see [v1/README.md](v1/README.md) for the index.

View File

@@ -1,211 +1,92 @@
# Galaxy Driver
The Galaxy driver bridges OtOpcUa to AVEVA System Platform (Wonderware) Galaxies through the `ArchestrA.MxAccess` COM API plus the Galaxy Repository SQL database. It is one driver of seven in the OtOpcUa platform (see [drivers/README.md](README.md) for the full list); all other drivers run in-process in the main Server (.NET 10 x64). Galaxy is the exception — it runs as its own Windows service and talks to the Server over a local named pipe.
The Galaxy driver bridges OtOpcUa to AVEVA System Platform (Wonderware) Galaxies. It is a **Tier-A in-process driver** that runs in the OtOpcUa server's .NET 10 AnyCPU process and speaks gRPC to a separately installed `mxaccessgw` server (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`). The gateway owns the MXAccess COM apartment, the STA + Win32 message pump, the Galaxy Repository SQL reader, and the Historian SDK — all the bits that need x86 / .NET Framework 4.8 / COM interop. The driver itself is platform-agnostic and contains no COM, no STA thread, and no x86 bitness constraint.
For the decision record on why Galaxy is out-of-process and how the refactor was staged, see [docs/v2/plan.md §4 Galaxy/MXAccess as Out-of-Process Driver](../v2/plan.md). For the full driver spec (addressing, data-type map, config shape), see [docs/v2/driver-specs.md §1](../v2/driver-specs.md).
For the driver spec (capability surface, config shape, addressing), see [docs/v2/driver-specs.md §1](../v2/driver-specs.md). For the gateway setup recipe, see [docs/v2/Galaxy.ParityRig.md](../v2/Galaxy.ParityRig.md). For tracing, metrics, and soak profile, see [docs/v2/Galaxy.Performance.md](../v2/Galaxy.Performance.md).
## Project Split
> **Note**: the related drivers `Galaxy-Repository.md` and `Galaxy-Test-Fixture.md` describe the previous v1 / out-of-process topology and are being moved to `docs/v1/` by a parallel cleanup track. Use `Galaxy.ParityRig.md` and the `mxaccessgw` repo for current testing.
Galaxy ships as three projects:
## Architecture
| Project | Target | Role |
|---------|--------|------|
| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/` | .NET Standard 2.0 | IPC contracts (MessagePack records + `MessageKind` enum) referenced by both sides |
| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/` | .NET Framework 4.8 **x86** | Separate Windows service hosting the MXAccess COM objects, STA thread + Win32 message pump, Galaxy Repository reader, Historian SDK, runtime-probe manager |
| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/` | .NET 10 (matches Server) | `GalaxyProxyDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe` — loaded in-process by the Server; every call forwards over the pipe to the Host |
The Shared assembly is the **only** contract between the two runtimes. It carries no COM or SDK references so Proxy (net10) can reference it without dragging x86 code into the Server process.
## Why Out-of-Process
Two reasons drive the split, per `docs/v2/plan.md`:
1. **Bitness constraint.** MXAccess is 32-bit COM only — `ArchestrA.MxAccess.dll` in `Program Files (x86)\ArchestrA\Framework\bin` has no 64-bit variant. The main OtOpcUa Server is .NET 10 x64 (the OPC Foundation stack, SqlClient, and every other non-Galaxy driver target 64-bit). In-process hosting would force the whole Server to x86, which every other driver project would then inherit.
2. **Tier-C stability isolation.** Galaxy is classified Tier C in [docs/v2/driver-stability.md](../v2/driver-stability.md) — the COM runtime, STA thread, Aveva Historian SDK, and SQL queries all have crash/hang modes that can take down the hosting process. Isolating the driver in its own Windows service means a COM deadlock, AccessViolation in an unmanaged Historian DLL, or a runaway SQL query never takes the Server endpoint down. The Proxy-side supervisor restarts the Host with crash-loop circuit-breaker.
The same Tier-C isolation story applies to FOCAS (decision record in `docs/v2/plan.md` §7), which is the second out-of-process driver.
## IPC Transport
`GalaxyProxyDriver``GalaxyIpcClient` → named pipe → `Galaxy.Host` pipe server.
- Pipe name: `otopcua-galaxy-{DriverInstanceId}` (localhost-only, no TCP surface)
- Wire format: MessagePack-CSharp, length-prefixed frames
- ACL: pipe is created with a DACL that grants `ReadWrite | Synchronize` only to the configured Server service-principal SID + denies `LocalSystem`. The per-connection SID check in `PipeServer.VerifyCaller` is the real authorization boundary — any caller whose impersonated token SID doesn't match the allowed SID is dropped before the first frame is read.
- Handshake: Proxy presents a shared secret at `OpenSessionRequest`; Host rejects anything else with `MessageKind.OpenSessionResponse{Success=false}`
- Heartbeat: Proxy sends a periodic ping; missed heartbeats trigger the Proxy-side crash-loop supervisor to restart the Host
Every capability call on `GalaxyProxyDriver` (Read, Write, Subscribe, HistoryRead*, etc.) serializes a `*Request`, awaits the matching `*Response` via a `CallAsync<TReq, TResp>` helper, and rehydrates the result into the `Core.Abstractions` shape the Server expects.
## STA Thread Requirement (Host-side)
MXAccess COM objects — `LMXProxyServer` instantiation, `Register`, `AddItem`, `AdviseSupervisory`, `Write`, and cleanup calls — must all execute on the same Single-Threaded Apartment. Calling a COM object from the wrong thread causes marshalling failures or silent data corruption.
`StaComThread` in the Host provides that thread with the apartment state set before the thread starts:
```csharp
_thread = new Thread(ThreadEntry) { Name = "MxAccess-STA", IsBackground = true };
_thread.SetApartmentState(ApartmentState.STA);
```
+---------------------------------------+
| OtOpcUa.Server (.NET 10 AnyCPU) |
| GalaxyDriver (in-process) |
| ITagDiscovery / IReadable / |
| IWritable / ISubscribable / |
| IRediscoverable / |
| IHostConnectivityProbe |
+-------------------+-------------------+
|
gRPC (default http://localhost:5120)
|
v
+---------------------------------------+
| mxaccessgw (sibling repo) |
| +-------------------------------+ |
| | MxGateway.Worker (x86 net48) | |
| | STA + WM_APP pump | |
| | ArchestrA.MxAccess COM | |
| | Galaxy Repository SQL | |
| | Wonderware Historian SDK | |
| +-------------------------------+ |
+---------------------------------------+
```
Work items queue via `RunAsync(Action)` or `RunAsync<T>(Func<T>)` into a `ConcurrentQueue<Action>` and post `WM_APP` to wake the pump. Each work item is wrapped in a `TaskCompletionSource` so callers can `await` the result from any thread — including the IPC handler thread that receives the inbound pipe request.
History reads + alarm-condition tracking moved server-side in PR 7.2 (`IHistoryRouter`, `AlarmConditionService`). Galaxy no longer implements `IHistoryProvider` or `IAlarmSource` of its own.
## Win32 Message Pump (Host-side)
## Project Layout
COM callbacks (`OnDataChange`, `OnWriteComplete`) are delivered through the Windows message loop. `StaComThread` runs a standard Win32 message pump via P/Invoke:
The driver ships as a single project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (.NET 10, AnyCPU). Sub-folders:
1. `PeekMessage` primes the message queue (required before `PostThreadMessage` works)
2. `GetMessage` blocks until a message arrives
3. `WM_APP` drains the work queue
4. `WM_APP + 1` drains the queue and posts `WM_QUIT` to exit the loop
5. All other messages go through `TranslateMessage` / `DispatchMessage` for COM callback delivery
| Folder | Role |
|--------|------|
| `Browse/` | Static-side discovery: `GalaxyDiscoverer` walks the gateway's hierarchy + attribute-set RPCs, `DataTypeMap` and `SecurityMap` translate Galaxy types and security classifications into OPC UA equivalents, `AlarmRefBuilder` extracts alarm-bearing attribute references for the server-layer alarm engine. `IGalaxyHierarchySource` + `GatewayGalaxyHierarchySource` + `TracedGalaxyHierarchySource` decorate the gateway browse RPC; `IGalaxyDeployWatchSource` + `GatewayGalaxyDeployWatchSource` + `DeployWatcher` drive `IRediscoverable`. |
| `Runtime/` | Live data path: `EventPump` runs the gateway's `StreamEvents` RPC and fans out to subscribers via a bounded channel; `GalaxyMxSession` is the read-side handle; `GatewayGalaxySubscriber` + `GatewayGalaxyDataWriter` (each with a `Traced*` decorator) implement `ISubscribable` / `IWritable`; `SubscriptionRegistry` tracks subscription state for replay; `ReconnectSupervisor` owns the backoff loop and triggers `ReplaySubscriptions` on session loss; `StatusCodeMap` translates gateway StatusCodes to OPC UA; `MxValueDecoder` / `MxValueEncoder` handle scalar + array marshalling; `GalaxyTelemetry` + `GalaxySubscriptionHandle` round out the surface. |
| `Health/` | `HostStatusAggregator` rolls per-platform probe state into the driver's `IHostConnectivityProbe` view; `PerPlatformProbeWatcher` listens on the gateway's per-host status stream; `HostConnectivityForwarder` pushes transitions out to the server's connectivity bus. |
| `Config/` | `GalaxyDriverOptions` and the four nested option records (`GalaxyGatewayOptions`, `GalaxyMxAccessOptions`, `GalaxyRepositoryOptions`, `GalaxyReconnectOptions`). |
Without this pump MXAccess callbacks never fire and the driver delivers no live data.
Project root files:
## LMXProxyServer COM Object
- `GalaxyDriver.cs``IDriver` + capability-interface implementation; composes the Browse / Runtime / Health collaborators.
- `GalaxyDriverFactoryExtensions.cs` — DI registration helper used by the server's driver bootstrap.
`MxProxyAdapter` wraps the real `ArchestrA.MxAccess.LMXProxyServer` COM object behind the `IMxProxy` interface so Host unit tests can substitute a fake proxy without requiring the ArchestrA runtime. Lifecycle:
## Capability Surface
1. **`Register(clientName)`** — Creates a new `LMXProxyServer` instance, wires up `OnDataChange` and `OnWriteComplete` event handlers, calls `Register` to obtain a connection handle
2. **`Unregister(handle)`** — Unwires event handlers, calls `Unregister`, releases the COM object via `Marshal.ReleaseComObject`
`GalaxyDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IDisposable`.
## Register / AddItem / AdviseSupervisory Pattern
| Capability | Implementation entry point |
|------------|---------------------------|
| `ITagDiscovery` | `Browse/GalaxyDiscoverer.cs` |
| `IRediscoverable` | `Browse/DeployWatcher.cs` |
| `IReadable` | `Runtime/GalaxyMxSession.cs` |
| `IWritable` | `Runtime/GatewayGalaxyDataWriter.cs` |
| `ISubscribable` | `Runtime/GatewayGalaxySubscriber.cs` (driven by `EventPump`) |
| `IHostConnectivityProbe` | `Health/HostStatusAggregator.cs` |
Every MXAccess data operation follows a three-step pattern, all executed on the STA thread:
## Configuration
1. **`AddItem(handle, address)`** — Resolves a Galaxy tag reference (e.g., `TestMachine_001.MachineID`) to an integer item handle
2. **`AdviseSupervisory(handle, itemHandle)`** — Subscribes the item for supervisory data-change callbacks
3. The runtime begins delivering `OnDataChange` events
`DriverConfig` JSON binds to `Config/GalaxyDriverOptions.cs`. The four sections are:
For writes, after `AddItem` + `AdviseSupervisory`, `Write(handle, itemHandle, value, securityClassification)` sends the value; `OnWriteComplete` confirms or rejects. Cleanup reverses: `UnAdviseSupervisory` then `RemoveItem`.
- **`Gateway`** — endpoint, API key secret ref, TLS knobs, connect/call/stream timeouts. `StreamTimeoutSeconds = 0` keeps the long-lived `StreamEvents` RPC open for the driver's lifetime.
- **`MxAccess`** — `ClientName` (must be unique per OtOpcUa instance — redundancy pairs enforce uniqueness at install time), `PublishingIntervalMs` (forwarded as `buffered_update_interval_ms` on subscribe), `WriteUserId` for ArchestrA secured-write, `EventPumpChannelCapacity` (default 50_000 — one second of headroom at 50k tags / 1Hz; tune via the `galaxy.events.dropped` metric).
- **`Repository`** — `DiscoverPageSize`, `WatchDeployEvents`.
- **`Reconnect`** — `InitialBackoffMs`, `MaxBackoffMs`, `ReplayOnSessionLost` (calls the gateway's `ReplaySubscriptions` RPC after reconnect rather than re-issuing subscribe-bulk for every tag).
## OnDataChange and OnWriteComplete Callbacks
Full per-field descriptions live in `Config/GalaxyDriverOptions.cs`. The full JSON skeleton is reproduced in [docs/v2/driver-specs.md §1](../v2/driver-specs.md).
### OnDataChange
## Reconnect + Replay
Fired by the COM runtime on the STA thread when a subscribed tag changes. The handler in `MxAccessClient.EventHandlers.cs`:
`ReconnectSupervisor` owns an exponential-backoff loop bounded by `Reconnect.InitialBackoffMs` / `MaxBackoffMs`. On session loss it tears down the gRPC channel, redials, and — when `ReplayOnSessionLost = true` — calls the gateway's `ReplaySubscriptions` RPC with the cached subscription set from `SubscriptionRegistry` instead of re-subscribing tag-by-tag. The gateway's worker then re-issues `AdviseSupervisory` server-side under the apartment lock.
1. Maps the integer `phItemHandle` back to a tag address via `_handleToAddress`
2. Maps the MXAccess quality code to the internal `Quality` enum
3. Checks `MXSTATUS_PROXY` for error details and adjusts quality
4. Converts the timestamp to UTC
5. Constructs a `Vtq` (Value/Timestamp/Quality) and delivers it to:
- The stored per-tag subscription callback
- Any pending one-shot read completions
- The global `OnTagValueChanged` event (consumed by the Host's subscription dispatcher, which packages changes into `DataChangeEventArgs` and forwards them over the pipe to `GalaxyProxyDriver.OnDataChange`)
## Testing
### OnWriteComplete
- **Unit tests**: `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/` — fakes the gateway gRPC surface; covers Browse, Runtime, Health, and Config in isolation.
- **Parity rig + dev-rig walkthrough**: see [docs/v2/Galaxy.ParityRig.md](../v2/Galaxy.ParityRig.md). The rig stands up a real `mxaccessgw` against a live Galaxy and exercises the full read / write / subscribe / rediscover path.
- **Performance + soak**: see [docs/v2/Galaxy.Performance.md](../v2/Galaxy.Performance.md).
Fired when the runtime acknowledges or rejects a write. The handler resolves the pending `TaskCompletionSource<bool>` for the item handle. If `MXSTATUS_PROXY.success == 0` the write is considered failed and the error detail is logged.
## Operational Notes
## Reconnection Logic
`MxAccessClient` implements automatic reconnection through two mechanisms.
### Monitor loop
`StartMonitor` launches a background task that polls at `MonitorIntervalSeconds`. On each cycle:
- If the state is `Disconnected` or `Error` and `AutoReconnect` is enabled, it calls `ReconnectAsync`
- If connected and a probe tag is configured, it checks the probe staleness threshold
### Reconnect sequence
`ReconnectAsync` performs a full disconnect-then-connect cycle:
1. Increment the reconnect counter
2. `DisconnectAsync` — tear down all active subscriptions (`UnAdviseSupervisory` + `RemoveItem` for each), detach COM event handlers, call `Unregister`, clear all handle mappings
3. `ConnectAsync` — create a fresh `LMXProxyServer`, register, replay all stored subscriptions, re-subscribe the probe tag
Stored subscriptions (`_storedSubscriptions`) persist across reconnects. `ReplayStoredSubscriptionsAsync` iterates the stored entries and calls `AddItem` + `AdviseSupervisory` for each.
## Probe Tag Health Monitoring
A configurable probe tag (e.g., a frequently updating Galaxy attribute) serves as a connection health indicator. After connecting, the client subscribes to the probe tag and records `_lastProbeValueTime` on every `OnDataChange`. The monitor loop compares `DateTime.UtcNow - _lastProbeValueTime` against `ProbeStaleThresholdSeconds`; if the probe has not updated within the window, the connection is assumed stale and a reconnect is forced. This catches scenarios where the COM connection is technically alive but the runtime has stopped delivering data.
## Per-Host Runtime Status Probes (`<Host>.ScanState`)
Separate from the connection-level probe, the driver advises `<HostName>.ScanState` on every deployed `$WinPlatform` and `$AppEngine` in the Galaxy. These probes track per-host runtime state so the Admin UI dashboard can report "this specific Platform / AppEngine is off scan" and the driver can proactively invalidate every OPC UA variable hosted by the stopped object — preventing MXAccess from serving stale Good-quality cached values to clients who read those tags while the host is down.
Enabled by default via `MxAccess.RuntimeStatusProbesEnabled`; see [Configuration](../Configuration.md#mxaccess) for the two config fields.
### How it works
`GalaxyRuntimeProbeManager` lives in `Driver.Galaxy.Host` alongside the rest of the MXAccess code. It is owned by the Host's subscription dispatcher and runs a three-state machine per host (Unknown / Running / Stopped):
1. **Discovery** — After the Host completes `BuildAddressSpace`, the manager filters the hierarchy to rows where `CategoryId == 1` (`$WinPlatform`) or `CategoryId == 3` (`$AppEngine`) and issues `AdviseSupervisory` for `<TagName>.ScanState` on each one. Probes are driver-owned, not ref-counted against client subscriptions, and persist across address-space rebuilds via a `Sync` diff.
2. **Transition predicate** — A probe callback is interpreted as `isRunning = vtq.Quality.IsGood() && vtq.Value is bool b && b`. Everything else (explicit `ScanState = false`, bad quality, communication errors) means **Stopped**.
3. **On-change-only delivery**`ScanState` is delivered only when the value actually changes. A stably Running host may go hours without a callback. `Tick()` does NOT run a starvation check on Running entries — the only time-based transition is **Unknown → Stopped** when the initial callback hasn't arrived within `RuntimeStatusUnknownTimeoutSeconds` (default 15s). This protects against a probe that fails to resolve at all without incorrectly flipping healthy long-running hosts.
4. **Transport gating** — When `IMxAccessClient.State != Connected`, `GetSnapshot()` forces every entry to `Unknown`. The dashboard shows the Connection panel as the primary signal in that case rather than misleading operators with "every host stopped".
5. **Subscribe failure rollback** — If `SubscribeAsync` throws for a new probe (SDK failure, broker rejection, transport error), the manager rolls back both `_byProbe` and `_probeByGobjectId` so the probe never appears in `GetSnapshot()`. Stability review 2026-04-13 Finding 1.
### Subtree quality invalidation on transition
When a host transitions **Running → Stopped**, the probe manager invokes a callback that walks `_hostedVariables[gobjectId]` — the set of every OPC UA variable transitively hosted by that Galaxy object — and sets each variable's `StatusCode` to `BadOutOfService`. **Stopped → Running** calls `ClearHostVariablesBadQuality` to reset each to `Good` so the next on-change MXAccess update repopulates the value.
The hosted-variables map is built once per `BuildAddressSpace` by walking each object's `HostedByGobjectId` chain up to the nearest Platform or Engine ancestor. A variable hosted by an Engine inside a Platform lands in both the Engine's list and the Platform's list, so stopping the Platform transitively invalidates every descendant Engine's variables.
### Read-path short-circuit (`IsTagUnderStoppedHost`)
The Host's Read handler checks `IsTagUnderStoppedHost(tagRef)` (a reverse-index lookup `_hostIdsByTagRef[tagRef]``GalaxyRuntimeProbeManager.IsHostStopped(hostId)`) before the MXAccess round-trip. When the owning host is Stopped, the handler returns a synthesized `DataValue { Value = cachedVar.Value, StatusCode = BadOutOfService }` directly without touching MXAccess. This guarantees clients see a uniform `BadOutOfService` on every descendant tag of a stopped host, regardless of whether they're reading or subscribing.
### Deferred dispatch — the STA deadlock
**Critical**: probe transition callbacks must **not** run synchronously on the STA thread that delivered the `OnDataChange`. `MarkHostVariablesBadQuality` takes the subscription dispatcher lock, which may be held by a worker thread currently inside `Read` waiting on an `_mxAccessClient.ReadAsync()` round-trip that is itself waiting for the STA thread. Classic circular wait — the first real deploy of this feature hung inside 30 seconds from exactly this pattern.
The fix is a deferred-dispatch queue: probe callbacks enqueue the transition onto `ConcurrentQueue<(int GobjectId, bool Stopped)>` and set the existing dispatch signal. The dispatch thread drains the queue inside its existing 100ms `WaitOne` loop — outside any locks held by the STA path — and then calls `MarkHostVariablesBadQuality` / `ClearHostVariablesBadQuality` under its own natural lock acquisition. No circular wait, no STA involvement.
### Dashboard and health surface
- Admin UI **Galaxy Runtime** panel shows per-host state with Name / Kind / State / Since / Last Error columns. Panel color is green (all Running), yellow (any Unknown, none Stopped), red (any Stopped), gray (MXAccess transport disconnected)
- `HealthCheckService.CheckHealth` rolls overall driver health to `Degraded` when any host is Stopped
See [Status Dashboard](../StatusDashboard.md#galaxy-runtime) for the field table and [Configuration](../Configuration.md#mxaccess) for the config fields.
## Request Timeout Safety Backstop
Every sync-over-async site on the OPC UA stack thread that calls into Galaxy (`Read`, `Write`, address-space rebuild probe sync) is wrapped in a bounded `SyncOverAsync.WaitSync(...)` helper with timeout `MxAccess.RequestTimeoutSeconds` (default 30s). Inner `ReadTimeoutSeconds` / `WriteTimeoutSeconds` bounds on the async path are the first line of defense; the outer wrapper is a backstop so a scheduler stall, slow reconnect, or any other non-returning async path cannot park the stack thread indefinitely.
On timeout, the underlying task is **not** cancelled — it runs to completion on the thread pool and is abandoned. This is acceptable because Galaxy IPC clients are shared singletons and the abandoned continuation does not capture request-scoped state. The OPC UA stack receives `StatusCodes.BadTimeout` on the affected operation.
`ConfigurationValidator` enforces `RequestTimeoutSeconds >= 1` and warns when it is set below the inner Read/Write timeouts (operator misconfiguration). Stability review 2026-04-13 Finding 3.
All capability calls at the Server dispatch layer are additionally wrapped by `CapabilityInvoker` (Core/Resilience/) which runs them through a Polly pipeline keyed on `(DriverInstanceId, HostName, DriverCapability)`. `OTOPCUA0001` analyzer enforces the wrap at build time.
## Why Marshal.ReleaseComObject Is Needed
The .NET Framework runtime's garbage collector releases COM references non-deterministically. For MXAccess, delayed release can leave stale COM connections open, preventing clean re-registration. `MxProxyAdapter.Unregister` calls `Marshal.ReleaseComObject(_lmxProxy)` in a `finally` block to immediately drive the COM reference count to zero. This ensures the underlying COM server is freed before a reconnect attempt creates a new instance.
## Tag Discovery and Historical Data
Tag discovery (the Galaxy Repository SQL reader + `LocalPlatform` scope filter) is covered in [Galaxy-Repository.md](Galaxy-Repository.md). The Galaxy driver is `ITagDiscovery` for the Server's bootstrap path and `IRediscoverable` for the on-change-redeploy path.
Historical data access (raw, processed, at-time, events) runs against the Aveva Historian via the `aahClientManaged` SDK and is exposed through the Galaxy driver's `IHistoryProvider` implementation. See [HistoricalDataAccess.md](../HistoricalDataAccess.md).
## Key source files
Host-side (`.NET 4.8 x86`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/`):
- `Backend/MxAccess/StaComThread.cs` — STA thread and Win32 message pump
- `Backend/MxAccess/MxAccessClient.cs` — Core client (partial)
- `Backend/MxAccess/MxAccessClient.Connection.cs` — Connect / disconnect / reconnect
- `Backend/MxAccess/MxAccessClient.Subscription.cs` — Subscribe / unsubscribe / replay
- `Backend/MxAccess/MxAccessClient.ReadWrite.cs` — Read and write operations
- `Backend/MxAccess/MxAccessClient.EventHandlers.cs``OnDataChange` / `OnWriteComplete` handlers
- `Backend/MxAccess/MxAccessClient.Monitor.cs` — Background health monitor
- `Backend/MxAccess/MxProxyAdapter.cs` — COM object wrapper
- `Backend/MxAccess/GalaxyRuntimeProbeManager.cs` — Per-host `ScanState` probes, state machine, `IsHostStopped` lookup
- `Backend/Historian/HistorianDataSource.cs``aahClientManaged` SDK wrapper (see [HistoricalDataAccess.md](../HistoricalDataAccess.md))
- `Ipc/GalaxyIpcServer.cs` — Named-pipe server, message dispatch
- `Domain/IMxAccessClient.cs` — Client interface
Shared (`.NET Standard 2.0`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/`):
- `Contracts/MessageKind.cs` — IPC message kinds (`ReadRequest`, `HistoryReadRequest`, `OpenSessionResponse`, …)
- `Contracts/*.cs` — MessagePack DTOs for every request/response pair
Proxy-side (`.NET 10`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/`):
- `GalaxyProxyDriver.cs``IDriver`/`ITagDiscovery`/`IReadable`/`IWritable`/`ISubscribable`/`IAlarmSource`/`IHistoryProvider`/`IRediscoverable`/`IHostConnectivityProbe` implementation; every method forwards via `GalaxyIpcClient`
- `Ipc/GalaxyIpcClient.cs` — Named-pipe client, `CallAsync<TReq, TResp>`, reconnect on broken pipe
- `GalaxyProxySupervisor.cs` — Host-process monitor, crash-loop circuit-breaker, Host relaunch
- **MXAccess `ClientName` collisions**: two OtOpcUa instances sharing a `ClientName` cause the older Wonderware session to lose subscription state. Redundancy pairs (decision #149) enforce uniqueness via install scripts.
- **Channel saturation**: `galaxy.events.dropped > 0` indicates `EventPump` is back-pressured. Raise `EventPumpChannelCapacity` or investigate downstream slowness in the server-side fan-out.
- **Connectivity surface**: per-platform probe state is exposed through `IHostConnectivityProbe` and aggregated by the server's connectivity bus — there is no driver-private dashboard surface anymore. The Admin UI's Host Status panel is the consumer.

30
docs/v1/README.md Normal file
View File

@@ -0,0 +1,30 @@
# v1 documentation archive
This folder contains documentation that described the original v1
in-process MXAccess architecture (`Galaxy.Host` + `Galaxy.Proxy` +
`Galaxy.Shared` three-project split, .NET 4.8 x86 + COM apartment, the
`OtOpcUaGalaxyHost` Windows service). That architecture was retired in
PR 7.2 (merged 2026-04-30 at commit `ae7106d`). These docs are kept as
the historical record of how the system worked before the v2-mxgw
migration; treat their content as accurate at the time of writing, NOT
as current state.
For current architecture see:
- `CLAUDE.md` — agent-facing v2 overview
- `docs/drivers/Galaxy.md` — current Galaxy driver doc
- `docs/v2/Galaxy.ParityRig.md` — current testing setup
- `docs/v2/Galaxy.Performance.md` — observability + perf
- `lmx_mxgw.md` (in repo root) — design rationale for the migration
| File | What it covered |
|---|---|
| `AlarmTracking.md` | v1 alarm-tracking flow through the in-process MXAccess client |
| `Configuration.md` | v1 server configuration (`OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) |
| `DataTypeMapping.md` | Galaxy `mx_data_type` → OPC UA type mapping (still accurate as a reference; the live mapping logic is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) |
| `HistoricalDataAccess.md` | v1 IHistoryProvider on the Host side; current path is the server-level HistoryRouter + Wonderware sidecar |
| `Subscriptions.md` | v1 MXAccess subscription mechanics; current path uses gateway StreamEvents |
| `drivers/Galaxy-Repository.md` | v1 Host-side ZB SQL repository client; the gateway owns this path now |
| `drivers/Galaxy-Test-Fixture.md` | v1 test-fixture setup (parity tests + Galaxy.Host EXE spawn) |
| `reqs/GalaxyRepositoryReqs.md`, `reqs/MxAccessClientReqs.md` | Original Phase 0 requirements; satisfied in mxaccessgw repo today |
| `reqs/ServiceHostReqs.md` | Service-hosting requirements including `OtOpcUaGalaxyHost` (GHX-* section); only `OtOpcUa` server hosting remains in scope post-7.2 |

View File

@@ -1,3 +1,15 @@
> **✅ Completed 2026-04-30 — historical record of the parity-rig validation gate for PR 7.2.**
>
> The matrix below was the go/no-go gate for retiring the legacy
> Galaxy.Host backend (PR 7.2). Final run on the dev rig 2026-04-30
> returned 14 passed / 1 skipped / 0 failed; PR 7.2 (commit `fe91d42`)
> deleted the legacy projects + service the next day. The "Running
> the matrix" section is preserved for historical reproducibility but
> the test projects it references (`Driver.Galaxy.ParityTests`) were
> deleted alongside the legacy backend; this matrix is no longer
> runnable. Current Galaxy testing flows through the gateway's own
> test suite (sibling mxaccessgw repo).
# Galaxy backend parity matrix
This document tracks the scenario × result matrix that the

View File

@@ -1,15 +1,30 @@
# Galaxy parity rig — runbook
> ✅ **Completed 2026-04-30 — historical record.** This runbook is the
> recipe that produced the green parity matrix that gated PR 7.2
> (retire legacy Galaxy projects, merged at commit `ae7106d`). The
> matrix it produced is captured in
> [`Galaxy.ParityMatrix.md`](Galaxy.ParityMatrix.md), also marked
> historical. The test project this doc drove
> (`Driver.Galaxy.ParityTests`) was deleted in PR 7.2, along with
> `Driver.Galaxy.{Host,Proxy,Shared}` and the `OtOpcUaGalaxyHost`
> Windows service. **You cannot re-run this rig today.** Current
> Galaxy testing flows through the gateway's own test suite in the
> sibling `mxaccessgw` repo.
>
> The text below is preserved as-written so the migration trail (what
> was tested, against what shape, with what env vars) stays auditable.
Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak
scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs`
can run for real. Closing the parity matrix is the gate for PR 7.2
can run for real. Closing the parity matrix was the gate for PR 7.2
(retire legacy Galaxy projects).
## Conceptual layout
```
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86)
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) [DELETED in PR 7.2]
│ └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
│ └── named pipe "OtOpcUaGalaxy"
│ ▲
@@ -29,17 +44,19 @@ Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86)
Both halves talk to the **same Galaxy** through **two distinct MxAccess
sessions** (different ClientNames so they don't evict each other).
## What's already on this dev box
## What was on the dev box at the time
Per `~/.claude/projects/.../memory/`:
Per `~/.claude/projects/.../memory/` *as of the rig run*:
- **AVEVA System Platform + Galaxy + MXAccess runtime** — `project_aveva_platform_installed.md`.
- **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped,
binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`,
shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433`
`project_galaxy_host_installed.md`.
- **Parity test project** (`Driver.Galaxy.ParityTests`) committed and
skip-clean — runs as soon as the mxgw half resolves.
`project_galaxy_host_installed.md`. **(Service uninstalled and binary
retired as part of PR 7.2; the host source project no longer exists in
this repo.)**
- **Parity test project** (`Driver.Galaxy.ParityTests`) — committed and
skip-clean at the time of the rig run. **Deleted in PR 7.2.**
## Setup steps (one-time)
@@ -282,7 +299,7 @@ sees the change:
```powershell
graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
--confirm --confirm-target OtOpcUaParityTest_001
sc.exe restart OtOpcUaGalaxyHost
sc.exe restart OtOpcUaGalaxyHost # service no longer exists post-PR-7.2; in the modern shape, restart mxaccessgw instead
```
Then re-run the parity matrix. The previously-skipped scenarios should
@@ -343,11 +360,14 @@ Galaxy with a script that imports 50k attributes onto a generated UDO
- **`LegacySkipReason` says "Galaxy ZB SQL not reachable on
localhost:1433"** — SQL Server isn't running, or its TCP listener is
off. Check `services.msc` for the SQL Server (default) instance.
- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — the parity
harness looks under `src/.../bin/Debug/net48/`. Build it once:
`dotnet build src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`. Note the
separately-published copy at `C:\publish\OtOpcUaGalaxyHost\` is for
the Windows service; the parity harness spawns its own subprocess.
- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — at rig time
the parity harness looked under
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` for the
EXE it spawned as a subprocess, separate from the published copy at
`C:\publish\OtOpcUaGalaxyHost\` used by the Windows service. **Both
the source project and the published binary were removed in PR 7.2,
so this troubleshooting branch no longer applies — the legacy half
cannot be brought up at all.**
- **Both halves resolve but parity scenarios assert deltas** — that's
the expected outcome the rig exists to surface. Review each delta
against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section

View File

@@ -10,289 +10,65 @@
### Summary
Out-of-process **Tier C** driver bridging AVEVA System Platform (Wonderware) Galaxies. The existing v1 implementation is refactored behind the new driver capability interfaces and hosted in a separate Windows service (.NET 4.8 x86) that communicates with the main OtOpcUa server (.NET 10 x64) via named pipes + MessagePack. Hosted out-of-process for **two reasons**: COM/.NET 4.8 x86 bitness constraint **and** Tier C stability isolation (per `driver-stability.md`). FOCAS is the second Tier C driver, also out-of-process — see §7.
### Library & Dependencies
| Component | Package / Source | Version | Target | Notes |
|-----------|------------------|---------|--------|-------|
| **MXAccess COM** | `ArchestrA.MxAccess` (GAC / `lib/ArchestrA.MxAccess.dll`) | version-neutral late-bound | .NET 4.8 x86 | Pinned via `<Reference Include="ArchestrA.MxAccess">` with `EmbedInteropTypes=false`; interfaces: `LMXProxyServer`, `ILMXProxyServerEvents`, `MXSTATUS_PROXY` |
| **Galaxy DB client** | `System.Data.SqlClient` (BCL) | BCL | .NET 4.8 x86 | Direct SQL for hierarchy/attribute/change-detection queries |
| **Wonderware Historian SDK** | `aahClientManaged`, `aahClientCommon` | Historian-shipped | .NET 4.8 x86 | Optional — loaded only when `Historian.Enabled=true` |
| **MessagePack-CSharp** | `MessagePack` NuGet | 2.x | .NET Standard 2.0 (Shared) | IPC serialization; shared contract between Proxy and Host |
| **Named pipes** | `System.IO.Pipes` (BCL) | BCL | both sides | IPC transport, localhost only |
### Required Components
- **AVEVA System Platform / ArchestrA Platform** deployed on the same machine as `Galaxy.Host` (installs MXAccess COM objects into the GAC)
- A **deployed Galaxy** with at least one $WinPlatform object hosting $AppEngine(s) hosting AutomationObjects
- **SQL Server** reachable from `Galaxy.Host` with the Galaxy repository database (default `ZB`); Windows Auth by default
- **32-bit .NET Framework 4.8** runtime on the Host machine (MXAccess is 32-bit COM, no 64-bit variant)
- **STA thread + Win32 message pump** inside the Host process for all COM calls and event callbacks (see §13)
- **Wonderware Historian** installed on-box or reachable via aah SDK — *only* if HDA is enabled
- **No external firewall ports** — MXAccess is local-machine COM/IPC; pipe is localhost-only. Galaxy DB port (default SQL 1433) if the ZB database is remote.
### Connection Settings (per driver instance, from central config DB)
All settings live under a schemaless `DriverConfig` JSON blob on the `DriverInstance` row. Current v1 equivalents (defaults and source file references in parentheses):
**MXAccess** (`MxAccessConfiguration.cs`):
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `ClientName` | string | `"LmxOpcUa"` | Registration name passed to `LMXProxyServer.Register()` |
| `NodeName` | string? | `null` | Optional ArchestrA node override (null = local) |
| `GalaxyName` | string? | `null` | Optional Galaxy name override |
| `ReadTimeoutSeconds` | int | `5` | Per-read timeout |
| `WriteTimeoutSeconds` | int | `5` | Per-write timeout |
| `RequestTimeoutSeconds` | int | `30` | Outer safety timeout around any MXAccess request |
| `MaxConcurrentOperations` | int | `10` | Pool bound on in-flight MXAccess work items |
| `MonitorIntervalSeconds` | int | `5` | Connectivity heartbeat probe interval |
| `AutoReconnect` | bool | `true` | Replay stored subscriptions on COM reconnect |
| `ProbeTag` | string? | `null` | Optional heartbeat tag for health monitoring |
| `ProbeStaleThresholdSeconds` | int | `60` | Mark connection stale if no probe callback within |
| `RuntimeStatusProbesEnabled` | bool | `true` | Auto-subscribe `ScanState` for $WinPlatform / $AppEngine |
| `RuntimeStatusUnknownTimeoutSeconds` | int | `15` | Grace period before an un-probed host is assumed Stopped |
**Galaxy repository** (`GalaxyRepositoryConfiguration.cs`):
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `ConnectionString` | string | `Server=localhost;Database=ZB;Integrated Security=true;` | ZB SQL Server connection |
| `ChangeDetectionIntervalSeconds` | int | `30` | Poll interval for `galaxy.time_of_last_deploy` |
| `CommandTimeoutSeconds` | int | `30` | SQL command timeout |
| `ExtendedAttributes` | bool | `false` | Include extended attribute metadata in discovery |
| `Scope` | enum (`Galaxy` \| `LocalPlatform`) | `Galaxy` | Address-space scope filter (commit bc282b6) |
| `PlatformName` | string? | `Environment.MachineName` | Platform to scope to when `Scope=LocalPlatform` |
**IPC** (new for v2):
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `PipeName` | string | `otopcua-galaxy-{InstanceId}` | Named pipe name |
| `HostStartupTimeoutMs` | int | `30000` | Proxy wait for Host `Ready` handshake |
| `IpcCallTimeoutMs` | int | `15000` | Per-call RPC timeout |
### Addressing
Galaxy objects carry two names:
- **`contained_name`** — human-readable, scoped to parent; used for OPC UA browse tree
- **`tag_name`** — globally unique system identifier; used for MXAccess runtime references
| Layer | Example |
|-------|---------|
| OPC UA browse path | `TestMachine_001/DelmiaReceiver/DownloadPath` |
| OPC UA NodeId | `ns=<galaxyNs>;s=<tagName>.<AttributeName>` |
| MXAccess reference | `DelmiaReceiver_001.DownloadPath` (passed to `AddItem()`) |
Tag discovery is **dynamic** — driven by the Galaxy repository DB (`gobject`, `dynamic_attribute`, `primitive_instance`, `template_definition`). Optional `Scope=LocalPlatform` filters the hierarchy via the `hosted_by_gobject_id` chain to the subtree rooted at the local $WinPlatform (on a dev Galaxy: 49→3 objects, 4206→386 attributes).
### Data Type Mapping (`MxDataTypeMapper.cs`, `gr/data_type_mapping.md`)
| mx_data_type | Galaxy Type | OPC UA BuiltInType | CLR Type |
|--------------|-------------|--------------------|----------|
| 1 | Boolean | Boolean (i=1) | `bool` |
| 2 | Integer | Int32 (i=6) | `int` |
| 3 | Float | Float (i=10) | `float` |
| 4 | Double | Double (i=11) | `double` |
| 5 | String | String (i=12) | `string` |
| 6 | Time | DateTime (i=13) | `DateTime` |
| 7 | ElapsedTime | Double (i=11) | `double` (seconds) |
| 8 | Reference | String (i=12) | `string` |
| 13 | Enumeration | Int32 (i=6) | `int` |
| 14 / 16 | Custom | String (i=12) | `string` |
| 15 | InternationalizedString | LocalizedText (i=21) | `string` |
| (default) | Unknown | String (i=12) | `string` |
**Arrays**: `is_array=0` → ValueRank `-1` (Scalar); `is_array=1` → ValueRank `1` (OneDimension), ArrayDimensions = `[array_dimension]`.
### Security Classification Mapping (`SecurityClassificationMapper.cs`)
| security_classification | Galaxy Level | OPC UA Write Permission |
|-------------------------|--------------|-------------------------|
| 0 | FreeAccess | `WriteOperate` |
| 1 | Operate | `WriteOperate` |
| 2 | SecuredWrite | — (read-only in v1) |
| 3 | VerifiedWrite | — (read-only in v1) |
| 4 | Tune | `WriteTune` |
| 5 | Configure | `WriteConfigure` |
| 6 | ViewOnly | — (read-only) |
Maps to the OPC UA roles `ReadOnly` / `WriteOperate` / `WriteTune` / `WriteConfigure` defined in the LDAP role provider (see `docs/security.md`).
### Subscription Model — Native MXAccess Advisories
**Galaxy is one of three drivers with native subscriptions (Galaxy, TwinCAT, OPC UA Client).** No polling.
- Mechanism: `LMXProxyServer.AddItem()``AdviseSupervisory(handle, itemHandle)`; callbacks delivered through the `ILMXProxyServerEvents.OnDataChange` COM event
- Callback signature: `MxDataChangeHandler(itemHandle, MXSTATUS_PROXY, value, quality, timestamp)`
- Dispatch: STA COM event → dispatch-thread queue → OPC UA `ClearChangeMasks` fan-out (decouples COM thread from UA stack lock — commit c76ab8f)
- **Stored subscriptions** replayed on reconnect via `ReplayStoredSubscriptionsAsync()`
- **Probe tag** + runtime-status probes provide connection-health visibility (see §14)
- **Bad-quality fan-out**: when a host ($WinPlatform or $AppEngine) ScanState transitions to Stopped, every attribute under that host is immediately published as `BadOutOfService` (commits 7310925, c76ab8f)
### Alarm Model
In-process alarm-condition tracking (v1 baseline; extended in v2 to match `IAlarmSource`):
- **Auto-subscribed attributes per alarm-eligible object**: `InAlarm`, `Priority`, `Description` (cached for severity and message)
- **Filtering**: `AlarmFilterConfiguration.ObjectFilters[]` — include/exclude by template chain (empty = all eligible)
- **Transitions**: `InAlarm` change → OPC UA A&C `AlarmConditionState` event (Active / Return to Normal)
- **Severity**: Galaxy `Priority` (1 = highest) mapped to OPC UA 11000 severity (higher = more severe)
- **Acknowledgment**: local OPC UA ack forwards to MXAccess write on the `Ack` attribute of the alarm-bearing object
### History Model — Wonderware Historian (optional plugin)
- Loaded **at runtime** from `ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll` when `Historian.Enabled=true`; compile-time optional
- SDK: `aahClientManaged` / `aahClientCommon`
- Supported OPC UA HDA calls:
- `HistoryReadRawModified` (raw values with bounds)
- `HistoryReadProcessed` (Historian aggregates: AVG, MIN, MAX, TIMEAVG, etc. — mapped to OPC UA aggregates)
- Continuation points for paged reads
- Only attributes flagged `historize=1` in the Galaxy DB expose `AccessLevel.HistoryRead`
### Error Mapping — MXAccess → Quality → OPC UA StatusCode
**Byte quality (OPC DA convention)**`QualityMapper.cs`:
| OPC DA Quality | Category |
|----------------|----------|
| `>= 192` | Good |
| `64191` | Uncertain |
| `< 64` | Bad |
**MXAccess error codes → Quality** (`MxErrorCodes.cs`):
| Code | Name | Quality |
|------|------|---------|
| 1008 | `MX_E_InvalidReference` | `BadConfigError` |
| 1012 | `MX_E_WrongDataType` | `BadConfigError` |
| 1013 | `MX_E_NotWritable` | `BadOutOfService` |
| 1014 | `MX_E_RequestTimedOut` | `BadCommFailure` |
| 1015 | `MX_E_CommFailure` | `BadCommFailure` |
| 1016 | `MX_E_NotConnected` | `BadNotConnected` |
**Quality → OPC UA StatusCode** (`QualityMapper.cs`):
| Quality | StatusCode |
|---------|-----------|
| Good | `0x00000000` |
| GoodLocalOverride | `0x00D80000` |
| Uncertain | `0x40000000` |
| Bad (generic) | `0x80000000` |
| BadCommFailure | `0x80050000` |
| BadNotConnected | `0x808A0000` |
| BadOutOfService | `0x808D0000` |
### Change Detection
- `ChangeDetectionService` polls `galaxy.time_of_last_deploy` at `ChangeDetectionIntervalSeconds` (default 30s)
- On timestamp change, `OnGalaxyChanged` fires → Host re-queries hierarchy/attributes → emits `TagSetChanged` over IPC → Proxy implements `IRediscoverable` and rebuilds the affected subtree in the address space
- Platform-scope filter (commit bc282b6) applied during hierarchy load when `Scope=LocalPlatform`
### IPC Contract (Proxy ↔ Host) — `Galaxy.Shared`
.NET Standard 2.0 MessagePack contracts. Every request carries a correlation ID; responses carry the same ID plus success/error.
**Lifecycle / handshake**:
| Message | Direction | Payload |
|---------|-----------|---------|
| `ClientHello` | Proxy → Host | InstanceId, expected protocol version |
| `HostReady` | Host → Proxy | Host version, Galaxy name, capabilities |
| `Shutdown` | Proxy → Host | Graceful stop |
**Tag discovery** (`ITagDiscovery`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `DiscoverHierarchyRequest` | Proxy → Host | `Scope`, `PlatformName` |
| `DiscoverHierarchyResponse` | Host → Proxy | `GalaxyObjectInfo[]` (TagName, ContainedName, ParentTagName, TemplateChain, category) |
| `DiscoverAttributesRequest` | Proxy → Host | `TagName[]` |
| `DiscoverAttributesResponse` | Host → Proxy | `GalaxyAttributeInfo[]` (Name, MxDataType, IsArray, ArrayDim, SecurityClass, Historized, WriteableRuntimeChecked) |
| `TagSetChangedNotification` | Host → Proxy | New deploy timestamp; triggers re-discover |
**Read / Write** (`IReadable`, `IWritable`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `ReadRequest` | Proxy → Host | `TagRef[]` (tag_name + attribute) |
| `ReadResponse` | Host → Proxy | `VtqPayload[]` (value, quality, timestamp, statusCode) |
| `WriteRequest` | Proxy → Host | `(TagRef, Value, ExpectedDataType)[]` |
| `WriteResponse` | Host → Proxy | `(TagRef, StatusCode)[]` |
**Subscription** (`ISubscribable`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `SubscribeRequest` | Proxy → Host | `TagRef[]` + Proxy-generated subscription ID |
| `SubscribeResponse` | Host → Proxy | Per-tag subscribe ack + handle |
| `UnsubscribeRequest` | Proxy → Host | handles |
| `DataChangeNotification` | Host → Proxy (push) | handle, VTQ, sequence number |
| `ProbeHealthNotification` | Host → Proxy (push) | probe tag staleness, `ScanState` transitions, overall connected/disconnected |
**Alarms** (`IAlarmSource`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `AlarmEventNotification` | Host → Proxy (push) | source tag, InAlarm, Priority, Description, severity, transition type |
| `AlarmAckRequest` | Proxy → Host | source tag, user, comment |
**History** (`IHistoryProvider`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `HistoryReadRawRequest` | Proxy → Host | TagRef, start, end, numValues, returnBounds, continuationPoint |
| `HistoryReadRawResponse` | Host → Proxy | values + next continuation point |
| `HistoryReadProcessedRequest` | Proxy → Host | TagRef, aggregateId, start, end, resampleInterval |
| `HistoryReadProcessedResponse` | Host → Proxy | aggregated values |
**Framing**: length-prefixed MessagePack frames over a single `NamedPipeServerStream` in `PipeTransmissionMode.Byte`. Separate outgoing pipe for push notifications or multiplex via message type tag.
### Threading / COM Constraints
- **STA thread** (`StaComThread.cs`) hosts MXAccess: `ApartmentState.STA`, raw Win32 `GetMessage` / `DispatchMessage` loop
- Work items marshaled in via `PostThreadMessage(WM_APP=0x8000)`
- **Per-handle serialization**: LMXProxyServer is not thread-safe — all Read/Write/Subscribe calls on one handle run serially via the STA queue
- **Dispatch thread** (separate from STA thread) drains `_pendingDataChanges` to the OPC UA framework; decouples the STA pump from UA stack locks so a slow subscriber can't back up COM event delivery
- **Reentrancy guards** — event unwiring must precede `Marshal.ReleaseComObject()` on disconnect
### Runtime Status (recent commits bc282b6 / 4b209f6 / 7310925 / c76ab8f / 0003984)
- `GalaxyRuntimeProbeManager` auto-subscribes `<ObjectName>.ScanState` for every $WinPlatform (category 1) and $AppEngine (category 3) in scope
- Per-host state machine: `Unknown → Running | Stopped`; transitions fire `_onHostStopped` / `_onHostRunning` callbacks on the dispatch thread
- **Synthetic OPC UA nodes** expose `ScanState` per host as read-only variables so clients see runtime topology without the dashboard
- **HealthCheck Rule 2e** monitors probe subscription health; a failed probe can no longer leave phantom entries that fan out false `BadOutOfService`
- Generalizes to the driver-agnostic `IHostConnectivityProbe` capability interface in v2 (see `plan.md` §5a)
### Implementation Notes
- **First Tier C out-of-process driver** — uses the `Galaxy.Proxy` / `Galaxy.Host` / `Galaxy.Shared` three-project split. The pattern is reusable; FOCAS is the second adopter (see §7), and any future driver with bitness, licensing, or stability-isolation needs reuses the same template. See `driver-stability.md` for the generalized contract
- `Galaxy.Proxy` (in the main server) implements `IDriver`, `ITagDiscovery`, `IRediscoverable`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IHostConnectivityProbe`
- `Galaxy.Host` owns `MxAccessBridge`, `GalaxyRepository`, alarm tracking, `GalaxyRuntimeProbeManager`, and the Historian plugin — no reference to `Core.Abstractions`
- `Galaxy.Shared` is .NET Standard 2.0, referenced by both sides
- Existing v1 code is the implementation — **refactor in place** (extract capability interfaces first, then move behind IPC — see `plan.md` Decision #55)
- **Parity gate**: v2 driver must pass v1 `IntegrationTests` suite + scripted Client.CLI walkthrough before Phase 3 begins
### Operational Stability Notes
Galaxy has a Tier C deep dive in `driver-stability.md` covering the STA pump, COM object lifetime, subscription replay, recycle policy, and post-mortem contents. Driver-instance specifics:
- **Memory baseline scales with Galaxy size**. Watchdog floor of 200 MB above baseline + 1.5 GB hard ceiling — higher than FOCAS because legitimate Galaxy footprints are larger.
- **Slope tolerance is 5 MB/min** (more permissive than FOCAS) because address-space rebuild on redeploy can transiently allocate large amounts.
- **Known regression-prone failure modes** (closed in commits `c76ab8f` and `7310925`, must remain closed): phantom probe subscription flipping Tick() to Stopped; cross-host quality clear wiping sibling state during recovery; sync-over-async on the OPC UA stack thread; fire-and-forget alarm tasks racing shutdown. Each should have a regression test in the v2 parity suite.
- **STA pump health probe** every 10 s (separate from the proxy↔host heartbeat). A wedged pump is the most likely Tier C failure mode for Galaxy.
- **Recycle preserves cached `time_of_last_deploy` watermark** — the common case (crash unrelated to redeploy) skips full DB rediscovery for faster recovery.
### Namespace Assignment
Galaxy is the canonical **SystemPlatform-kind namespace** driver. It exposes Aveva System Platform / Galaxy objects as OPC UA — these are *processed* values with business meaning attached at Layer 3, not raw equipment signals. Per `plan.md` §4:
- The Galaxy driver's `DriverInstance.NamespaceId` must reference a `Namespace` row with `Kind = 'SystemPlatform'`.
- **UNS naming rules do NOT apply** to the Galaxy hierarchy. Tags belong to `DriverInstanceId + FolderPath` (v1 LmxOpcUa pattern preserved); `Tag.EquipmentId` is NULL.
- The Galaxy hierarchy reflects the gobject parent chain as v1 has always done — no migration to UNS path conventions in v2.
- If a future need arises to expose raw Galaxy gobject data alongside processed (e.g. an Aveva-Wonderware Historian raw signal feed), that becomes a *separate* driver instance assigned to an Equipment-kind namespace, with its own per-equipment mapping.
Galaxy (MXAccess) is a **Tier-A in-process driver** that runs in the OtOpcUa server's .NET 10 AnyCPU process and speaks gRPC to a separately installed `mxaccessgw` (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`). The gateway owns the MXAccess COM apartment, the STA pump, and the Galaxy Repository / Historian SDK on its own host; the driver itself is platform-agnostic and carries no COM or x86 bitness constraint. Project lives at `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`.
### Capability Surface
`GalaxyDriver` (in `GalaxyDriver.cs`) implements `IDriver`, `IDisposable`, plus six driver capabilities — eight interfaces total.
| Capability | Source files |
|------------|--------------|
| `ITagDiscovery` | `Browse/GalaxyDiscoverer.cs`, `Browse/GatewayGalaxyHierarchySource.cs`, `Browse/DataTypeMap.cs`, `Browse/SecurityMap.cs`, `Browse/AlarmRefBuilder.cs` |
| `IRediscoverable` | `Browse/DeployWatcher.cs`, `Browse/GatewayGalaxyDeployWatchSource.cs` |
| `IReadable` | `Runtime/GalaxyMxSession.cs`, `Runtime/MxValueDecoder.cs`, `Runtime/StatusCodeMap.cs` |
| `IWritable` | `Runtime/GatewayGalaxyDataWriter.cs` (+ `TracedGalaxyDataWriter.cs`), `Runtime/MxValueEncoder.cs` |
| `ISubscribable` | `Runtime/GatewayGalaxySubscriber.cs` (+ `TracedGalaxySubscriber.cs`), `Runtime/EventPump.cs`, `Runtime/SubscriptionRegistry.cs`, `Runtime/ReconnectSupervisor.cs` |
| `IHostConnectivityProbe` | `Health/HostStatusAggregator.cs`, `Health/HostConnectivityForwarder.cs`, `Health/PerPlatformProbeWatcher.cs` |
History reads + alarm condition tracking now live in the server-layer `IHistoryRouter` and `AlarmConditionService` (PR 7.2). Galaxy no longer carries `IHistoryProvider` or `IAlarmSource` of its own.
### DriverConfig JSON shape
Per `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Config/GalaxyDriverOptions.cs`:
```jsonc
{
"Gateway": {
"Endpoint": "http://localhost:5120",
"ApiKeySecretRef": "secret:galaxy-gw-api-key",
"UseTls": true,
"CaCertificatePath": null,
"ConnectTimeoutSeconds": 10,
"DefaultCallTimeoutSeconds": 30,
"StreamTimeoutSeconds": 0
},
"MxAccess": {
"ClientName": "OtOpcUa",
"PublishingIntervalMs": 1000,
"WriteUserId": 0,
"EventPumpChannelCapacity": 50000
},
"Repository": {
"DiscoverPageSize": 5000,
"WatchDeployEvents": true
},
"Reconnect": {
"InitialBackoffMs": 500,
"MaxBackoffMs": 30000,
"ReplayOnSessionLost": true
}
}
```
`Gateway.ApiKeySecretRef` resolves through the server-side secret store (DPAPI in production, env override in dev) — the API key never appears in cleartext config. `MxAccess.ClientName` MUST be unique per OtOpcUa instance; redundancy pairs enforce uniqueness at install time. `StreamTimeoutSeconds = 0` keeps the `StreamEvents` RPC alive for the lifetime of the driver.
### Performance, tracing, soak
See [Galaxy.Performance.md](Galaxy.Performance.md) for the OpenTelemetry trace map, the per-RPC metric set (`galaxy.events.dropped`, channel headroom, reconnect backoff distribution), and the soak-run profile.
### Parity rig + gateway setup
See [Galaxy.ParityRig.md](Galaxy.ParityRig.md) and the `mxaccessgw` repo for the gateway worker layout and the dev-rig recipe.
---

View File

@@ -1,3 +1,14 @@
> **✅ Completed 2026-04-30 — historical record of Phase 2 (Galaxy out-of-process split).**
>
> Phase 2 produced the `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
> three-project split as a stepping stone toward the eventual mxaccessgw
> architecture. Those projects shipped, served their purpose for
> roughly a year, then retired in PR 7.2 alongside the
> `OtOpcUaGalaxyHost` Windows service. This file is preserved as the
> phase-exit evidence; do not treat it as live architecture
> documentation. See `docs/drivers/Galaxy.md` for the current
> in-process driver.
# Phase 2 — Galaxy Out-of-Process Refactor (Tier C)
> **Status**: DRAFT — implementation plan for Phase 2 of the v2 build (`plan.md` §6, `driver-stability.md` §"Galaxy — Deep Dive").

View File

@@ -1,3 +1,11 @@
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw backend-options decision.**
>
> This document evaluated alternative backend topologies before the
> v2-mxgw migration. **Option 1 (in-process driver + gRPC gateway) was
> selected and implemented**; see `lmx_mxgw.md` for the design and
> `lmx_mxgw_impl.md` for the implementation plan. Both shipped at
> commit `ae7106d` (2026-04-30). Preserved here as the audit trail.
# Galaxy / LMX Backend — Restructuring Options
## Context

View File

@@ -1,3 +1,13 @@
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw migration design.**
>
> This document is the design doc that drove the migration from the
> legacy out-of-process Galaxy.Host topology to the in-process
> GalaxyDriver + mxaccessgw architecture. Option 1 (the in-process
> driver path) was selected and implemented across 39 PRs spanning
> phases 07, merged to master at commit `ae7106d`. For current
> architecture see `CLAUDE.md`, `docs/drivers/Galaxy.md`, and
> `docs/v2/Galaxy.Performance.md`.
# Galaxy → MxAccessGateway Migration Plan
Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host`
@@ -360,7 +370,7 @@ percentile, equal or better throughput in `SubscribeBulk` setup time.
call timeouts based on soak results.
**Exit:** production-acceptable perf numbers documented in
`docs/Galaxy.Driver.md`.
`docs/drivers/Galaxy.md`.
### Phase 7 — retirement
@@ -443,7 +453,7 @@ sample config.
## Cross-cutting deliverables
- **Docs:** `docs/v2/Galaxy.Driver.md` (new), updates to
- **Docs:** `docs/drivers/Galaxy.md` (new), updates to
`docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
`docs/Redundancy.md`, `CLAUDE.md`.
- **Install scripts:** `scripts/install/Install-Services.ps1` removes

View File

@@ -1,3 +1,13 @@
> **✅ Completed 2026-04-30 — historical record of the v2-mxgw implementation plan.**
>
> All 39 PRs across 7 phases (1.11.3 + 2.12.3 + 1+2.W + 3.13.W +
> 4.04.W + 5.15.W + 6.16.W + 7.17.3) shipped and merged to master
> at commit `ae7106d`. Per-phase status tracking below is preserved as
> the historical PR-execution log; phase descriptions are
> retrospective, not pending. Parity matrix verified green on the dev
> rig 2026-04-30 (14 passed / 1 skipped / 0 failed —
> see `docs/v2/Galaxy.ParityMatrix.md`).
# Galaxy → MxGateway Migration — Detailed Implementation Plan
Companion to `lmx_mxgw.md` (design plan). This document breaks the plan into