From 006af517689f79391b9c4ec524091ac75d0cc4aa Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Thu, 30 Apr 2026 08:59:59 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20post-PR-7.2=20cleanup=20=E2=80=94=20aud?=
 =?UTF-8?q?it=20+=20three-track=20scrub?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Audit (three parallel agent passes) found 43 markdown files carrying stale references to the deleted Galaxy.Host/Proxy/Shared projects after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install text; leads with the multi-driver .NET 10 server identity and points at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the Tier-C out-of-process spec with a Tier-A in-process description matching the current GalaxyDriver code, with the four-section GalaxyDriverOptions JSON shape pulled verbatim from Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md, docs/v2/Galaxy.ParityMatrix.md, docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a "✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md also fixes two dead links (`docs/Galaxy.Driver.md` and `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure: AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess, Subscriptions (top-level); drivers/Galaxy-Repository, drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs, reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the v2 two-process deploy shape (Server + Admin + optional OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now describes only the post-PR-7.2 architecture. v1 docs are preserved as a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md | 238 ++++--------
 docs/README.md | 31 +-
 docs/drivers/Galaxy.md | 251 ++++---------
 docs/{ => v1}/AlarmTracking.md | 0
 docs/{ => v1}/Configuration.md | 0
 docs/{ => v1}/DataTypeMapping.md | 0
 docs/{ => v1}/HistoricalDataAccess.md | 0
 docs/v1/README.md | 30 ++
 docs/{ => v1}/Subscriptions.md | 0
 docs/{ => v1}/drivers/Galaxy-Repository.md | 0
 docs/{ => v1}/drivers/Galaxy-Test-Fixture.md | 0
 docs/{ => v1}/reqs/GalaxyRepositoryReqs.md | 0
 docs/{ => v1}/reqs/MxAccessClientReqs.md | 0
 docs/{ => v1}/reqs/ServiceHostReqs.md | 0
 docs/v2/Galaxy.ParityMatrix.md | 12 +
 docs/v2/Galaxy.ParityRig.md | 46 ++-
 docs/v2/driver-specs.md | 342 +++---------------
 .../phase-2-galaxy-out-of-process.md | 11 +
 lmx_backend.md | 8 +
 lmx_mxgw.md | 14 +-
 lmx_mxgw_impl.md | 10 +
 21 files changed, 322 insertions(+), 671 deletions(-)
 rename docs/{ => v1}/AlarmTracking.md (100%)
 rename docs/{ => v1}/Configuration.md (100%)
 rename docs/{ => v1}/DataTypeMapping.md (100%)
 rename docs/{ => v1}/HistoricalDataAccess.md (100%)
 create mode 100644 docs/v1/README.md
 rename docs/{ => v1}/Subscriptions.md (100%)
 rename docs/{ => v1}/drivers/Galaxy-Repository.md (100%)
 rename docs/{ => v1}/drivers/Galaxy-Test-Fixture.md (100%)
 rename docs/{ => v1}/reqs/GalaxyRepositoryReqs.md (100%)
 rename docs/{ => v1}/reqs/MxAccessClientReqs.md (100%)
 rename docs/{ => v1}/reqs/ServiceHostReqs.md (100%)

diff --git a/README.md b/README.md
index e4f6955..dc85c91 100644
--- a/README.md
+++ b/README.md
@@ -1,200 +1,90 @@
-# LmxOpcUa
+# OtOpcUa

-OPC UA server and cross-platform client tools for AVEVA System Platform (Wonderware) Galaxy. The server exposes Galaxy tags via MXAccess as an OPC UA address space. The client stack provides a shared library, CLI tool, and Avalonia desktop application for browsing, reading/writing, subscriptions, alarms, and historical data.
+OPC UA server (.NET 10 AnyCPU) that exposes a fleet of industrial drivers as a single OPC UA address space. Drivers ship in-process for AVEVA System Platform Galaxy (via the sibling `mxaccessgw` repo), Modbus TCP, Siemens S7, Allen-Bradley CIP (ControlLogix / CompactLogix), Allen-Bradley Legacy (SLC 500 / MicroLogix), Beckhoff TwinCAT (ADS), FANUC FOCAS, and OPC UA Client (gateway).
+
+A cross-platform client stack (.NET 10) — shared library, CLI, and Avalonia desktop app — connects to any OPC UA server.
## Architecture ``` - OPC UA Clients - (CLI, Desktop UI, 3rd-party) - | - v -+-----------------+ +------------------+ +-----------------+ -| Galaxy Repo DB |---->| OPC UA Server |<--->| MXAccess Client | -| (SQL Server) | | (address space) | | (STA + COM) | -+-----------------+ +------------------+ +-----------------+ - | | - +-------+--------+ +---------+---------+ - | Status Dashboard| | Historian Runtime | - | (HTTP/JSON) | | (SQL Server) | - +----------------+ +-------------------+ + OPC UA Clients (CLI, Desktop UI, 3rd-party) + | + v + +-------------------------------------+ + | OtOpcUa.Server (.NET 10 AnyCPU) | + | address space + capability fan-out| + +-------------------------------------+ + | | | | | | | | + Galaxy Modbus S7 AbCip AbLeg TwinCAT FOCAS OpcUaClient + | + v + mxaccessgw (sibling repo, gRPC) + | + v + MXAccess COM (x86 worker, on AVEVA box) ``` -## Contained Name vs Tag Name +Galaxy is the only driver with an external runtime: it speaks gRPC to a separately installed `mxaccessgw` server (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`) which owns the MXAccess COM apartment and the x86/STA bitness constraint server-side. Everything in this repo is platform-agnostic .NET 10. 
-| Browse Path (contained names) | Runtime Reference (tag name) | -|-------------------------------|------------------------------| -| `TestMachine_001/DelmiaReceiver/DownloadPath` | `DelmiaReceiver_001.DownloadPath` | -| `TestMachine_001/MESReceiver/MoveInBatchID` | `MESReceiver_001.MoveInBatchID` | +## Prerequisites ---- +- .NET 10 SDK (server, drivers, clients all target .NET 10) +- SQL Server reachable for the central config DB +- For Galaxy specifically: a running `mxaccessgw` deployment — see [docs/v2/Galaxy.ParityRig.md](docs/v2/Galaxy.ParityRig.md) +- For Wonderware Historian read-back: optional `OtOpcUaWonderwareHistorian` sidecar — see [docs/ServiceHosting.md](docs/ServiceHosting.md) -## Server - -The OPC UA server runs on .NET Framework 4.8 (x86) and bridges the Galaxy runtime to OPC UA clients. - -### Server Prerequisites - -- .NET Framework 4.8 SDK -- AVEVA System Platform with ArchestrA Framework installed -- Galaxy repository database (SQL Server, Windows Auth) -- MXAccess COM registered (`LMXProxy.LMXProxyServer`) -- Wonderware Historian (optional, for historical data access) -- Windows (required for COM interop and MXAccess) - -### Build and Run Server +## Quick Start ```bash -dotnet restore ZB.MOM.WW.LmxOpcUa.slnx -dotnet build src/ZB.MOM.WW.LmxOpcUa.Host -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Host +dotnet restore ZB.MOM.WW.OtOpcUa.slnx +dotnet build ZB.MOM.WW.OtOpcUa.slnx +dotnet test ZB.MOM.WW.OtOpcUa.slnx + +# Run the server in dev (foreground) +dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server ``` -The server starts on `opc.tcp://localhost:4840/LmxOpcUa` with the `None` security profile by default. Configure `Security.Profiles` in `appsettings.json` to enable `Basic256Sha256-Sign` or `Basic256Sha256-SignAndEncrypt` for transport security. See [Security Guide](docs/security.md). +The server starts on `opc.tcp://localhost:4840` with the `None` security profile. 
Configure `Security.Profiles` in `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` to enable `Basic256Sha256-Sign` or `Basic256Sha256-SignAndEncrypt`. See [docs/security.md](docs/security.md). -### Install as Windows Service +## Install as Windows Services + +Production deployment is driven by `scripts/install/Install-Services.ps1`, which registers the `OtOpcUa` server service (and optionally the `OtOpcUaWonderwareHistorian` sidecar) under a chosen service account. Galaxy support requires a separately installed `mxaccessgw` — neither this repo nor the install script provisions it. + +```powershell +.\scripts\install\Install-Services.ps1 ` + -InstallRoot 'C:\Program Files\OtOpcUa' ` + -ServiceAccount 'DOMAIN\svc-otopcua' +``` + +Add `-InstallWonderwareHistorian` for the historian sidecar. See the script header and [docs/ServiceHosting.md](docs/ServiceHosting.md) for full options. + +## Client CLI ```bash -cd src/ZB.MOM.WW.LmxOpcUa.Host/bin/Debug/net48 -ZB.MOM.WW.LmxOpcUa.Host.exe install -ZB.MOM.WW.LmxOpcUa.Host.exe start +dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840 +dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840 -r -d 3 +dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" +dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- write -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -v 42 +dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -i 500 ``` -**Service logon requirement:** The service must run under a Windows account that has access to the AVEVA Galaxy and Historian. The default `LocalSystem` account can connect to MXAccess and SQL Server but **cannot authenticate with the Historian SDK** (HCAP). Configure the service to "Log on as" a domain or local user that is a recognized ArchestrA platform user. 
This can be set in `services.msc` or during install with `ZB.MOM.WW.LmxOpcUa.Host.exe install -username DOMAIN\user -password ***`. - -### Run Server Tests - -```bash -dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests -dotnet test tests/ZB.MOM.WW.LmxOpcUa.IntegrationTests -``` - ---- - -## Client Stack - -The client stack is cross-platform (.NET 10) and consists of three projects sharing a common `IOpcUaClientService` abstraction. No AVEVA software or COM is required — the clients connect to any OPC UA server. - -### Client Prerequisites - -- .NET 10 SDK -- No platform-specific dependencies (runs on Windows, macOS, Linux) - -### Build All Clients - -```bash -dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.Shared -dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.CLI -dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.UI -``` - -### Run Client Tests - -```bash -dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.Shared.Tests -dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.CLI.Tests -dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.UI.Tests -``` - -### Client CLI - -```bash -# Connect -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840/LmxOpcUa - -# Browse Galaxy hierarchy -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=ZB" -r -d 5 - -# Read a tag -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001.MachineID" - -# Write a tag -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- write -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestChildObject.TestString" -v "Hello" - -# Subscribe to changes -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestChildObject.TestInt" -i 500 - -# Read historical data -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- historyread -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001.TestHistoryValue" 
--start "2026-03-25" --end "2026-03-30" - -# Subscribe to alarm events -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- alarms -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001" --refresh - -# Query redundancy state -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- redundancy -u opc.tcp://localhost:4840/LmxOpcUa -``` - -### Client UI - -```bash -dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.UI -``` - -The desktop application provides browse tree, subscriptions, alarm monitoring, history reads, and write dialogs. See [Client UI Documentation](docs/Client.UI.md) for details. - ---- - -## Project Structure - -``` -src/ - ZB.MOM.WW.LmxOpcUa.Host/ OPC UA server (.NET Framework 4.8, x86) - Configuration/ Config binding and validation - Domain/ Interfaces, DTOs, enums, mappers - Historian/ Wonderware Historian data source - Metrics/ Performance tracking (rolling P95) - MxAccess/ STA thread, COM interop, subscriptions - GalaxyRepository/ SQL queries, change detection - OpcUa/ Server, node manager, address space, alarms, diff - Status/ HTTP dashboard, health checks - - ZB.MOM.WW.LmxOpcUa.Client.Shared/ Shared OPC UA client library (.NET 10) - ZB.MOM.WW.LmxOpcUa.Client.CLI/ Command-line client (.NET 10) - ZB.MOM.WW.LmxOpcUa.Client.UI/ Avalonia desktop client (.NET 10) - -tests/ - ZB.MOM.WW.LmxOpcUa.Tests/ Server unit + integration tests - ZB.MOM.WW.LmxOpcUa.IntegrationTests/ Server integration tests (live DB) - ZB.MOM.WW.LmxOpcUa.Client.Shared.Tests/ Shared library tests - ZB.MOM.WW.LmxOpcUa.Client.CLI.Tests/ CLI command tests - ZB.MOM.WW.LmxOpcUa.Client.UI.Tests/ UI ViewModel + headless tests - -gr/ Galaxy repository docs, SQL queries, schema -``` +See [docs/Client.CLI.md](docs/Client.CLI.md) and [docs/Client.UI.md](docs/Client.UI.md). 
## Documentation -### Server - -| Component | Description | +| Topic | Doc | |---|---| -| [OPC UA Server](docs/OpcUaServer.md) | Endpoint, sessions, security policy, server lifecycle | -| [Address Space](docs/AddressSpace.md) | Hierarchy nodes, variable nodes, primitive grouping, NodeId scheme | -| [Galaxy Repository](docs/GalaxyRepository.md) | SQL queries, deployed package chain, change detection | -| [MXAccess Bridge](docs/MxAccessBridge.md) | STA thread, COM interop, subscriptions, reconnection | -| [Data Type Mapping](docs/DataTypeMapping.md) | Galaxy to OPC UA types, arrays, security classification | -| [Read/Write Operations](docs/ReadWriteOperations.md) | Value reads, writes, access level enforcement, array element writes | -| [Subscriptions](docs/Subscriptions.md) | Ref-counted MXAccess subscriptions, data change dispatch | -| [Alarm Tracking](docs/AlarmTracking.md) | AlarmConditionState nodes, InAlarm monitoring, event reporting | -| [Historical Data Access](docs/HistoricalDataAccess.md) | Historian data source, HistoryReadRaw, HistoryReadProcessed | -| [Incremental Sync](docs/IncrementalSync.md) | Diff computation, subtree teardown/rebuild, subscription preservation | -| [Configuration](docs/Configuration.md) | appsettings.json binding, feature flags, validation | -| [Status Dashboard](docs/StatusDashboard.md) | HTTP server, health checks, metrics reporting | -| [Service Hosting](docs/ServiceHosting.md) | TopShelf, startup/shutdown sequence, error handling | -| [Security](docs/security.md) | Transport security profiles, certificate trust, production hardening | -| [Redundancy](docs/Redundancy.md) | Non-transparent warm/hot redundancy, ServiceLevel, paired deployment | - -### Client - -| Component | Description | -|---|---| -| [Client CLI](docs/Client.CLI.md) | Connect, browse, read, write, subscribe, historyread, alarms, redundancy commands | -| [Client UI](docs/Client.UI.md) | Avalonia desktop client: browse, subscribe, alarms, history, write values | - 
-### Reference - -- [Galaxy Repository Queries](gr/CLAUDE.md) — SQL queries for hierarchy, attributes, and change detection -- [Data Type Mapping](gr/data_type_mapping.md) — Galaxy to OPC UA type mapping with security classification +| Driver specs (per-driver capability surface, config, addressing) | [docs/v2/driver-specs.md](docs/v2/driver-specs.md) | +| Galaxy driver | [docs/drivers/Galaxy.md](docs/drivers/Galaxy.md) | +| Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient | [docs/drivers/](docs/drivers/) | +| Galaxy parity rig (mxaccessgw setup) | [docs/v2/Galaxy.ParityRig.md](docs/v2/Galaxy.ParityRig.md) | +| Galaxy performance + tracing | [docs/v2/Galaxy.Performance.md](docs/v2/Galaxy.Performance.md) | +| Service hosting | [docs/ServiceHosting.md](docs/ServiceHosting.md) | +| Security (transport, LDAP, certificates) | [docs/security.md](docs/security.md) | +| Redundancy | [docs/Redundancy.md](docs/Redundancy.md) | +| Address space | [docs/AddressSpace.md](docs/AddressSpace.md) | +| Configuration | [docs/Configuration.md](docs/Configuration.md) | +| Status dashboard | [docs/StatusDashboard.md](docs/StatusDashboard.md) | ## License diff --git a/docs/README.md b/docs/README.md index 7bf09fd..d53d534 100644 --- a/docs/README.md +++ b/docs/README.md @@ -11,9 +11,8 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess - **Core** owns the OPC UA stack, address space, session/security/subscription machinery. - **Drivers** plug in via capability interfaces in `ZB.MOM.WW.OtOpcUa.Core.Abstractions`: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`, `IPerCallHostResolver`. Each driver opts into whichever it supports. -- **Server** is the OPC UA endpoint process (net10, x64). Hosts every driver except Galaxy in-process; talks to Galaxy via a named pipe because MXAccess COM is 32-bit-only. +- **Server** is the OPC UA endpoint process (net10, AnyCPU). 
Hosts every driver in-process. The Galaxy driver reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo); it is no longer hosted from this repo. - **Admin** is the Blazor Server operator UI (net10, x64). Owns the Config DB draft/publish flow, ACL + role-grant authoring, fleet status + `/metrics` scrape endpoint. -- **Galaxy.Host** is a .NET Framework 4.8 x86 Windows service that wraps MXAccess COM on an STA thread for the Galaxy driver. ## Where to find what @@ -24,11 +23,11 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess | [OpcUaServer.md](OpcUaServer.md) | Top-level server architecture — Core, driver dispatch, Config DB, generations | | [AddressSpace.md](AddressSpace.md) | `GenericDriverNodeManager` + `ITagDiscovery` + `IAddressSpaceBuilder` | | [ReadWriteOperations.md](ReadWriteOperations.md) | OPC UA Read/Write → `CapabilityInvoker` → `IReadable`/`IWritable` | -| [Subscriptions.md](Subscriptions.md) | Monitored items → `ISubscribable` + per-driver subscription refcount | -| [AlarmTracking.md](AlarmTracking.md) | `IAlarmSource` + `AlarmSurfaceInvoker` + OPC UA alarm conditions | -| [DataTypeMapping.md](DataTypeMapping.md) | Per-driver `DriverAttributeInfo` → OPC UA variable types | +| [Subscriptions.md](v1/Subscriptions.md) | Monitored items → `ISubscribable` + per-driver subscription refcount (v1 archive) | +| [AlarmTracking.md](v1/AlarmTracking.md) | `IAlarmSource` + `AlarmSurfaceInvoker` + OPC UA alarm conditions (v1 archive) | +| [DataTypeMapping.md](v1/DataTypeMapping.md) | Per-driver `DriverAttributeInfo` → OPC UA variable types (v1 archive — live mapping is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) | | [IncrementalSync.md](IncrementalSync.md) | Address-space rebuild on redeploy + `sp_ComputeGenerationDiff` | -| [HistoricalDataAccess.md](HistoricalDataAccess.md) | `IHistoryProvider` as a per-driver optional capability | +| 
[HistoricalDataAccess.md](v1/HistoricalDataAccess.md) | `IHistoryProvider` as a per-driver optional capability (v1 archive) | | [VirtualTags.md](VirtualTags.md) | `Core.Scripting` + `Core.VirtualTags` — Roslyn script sandbox, engine, dispatch alongside driver tags | | [ScriptedAlarms.md](ScriptedAlarms.md) | `Core.ScriptedAlarms` — script-predicate `IAlarmSource` + Part 9 state machine | @@ -36,7 +35,7 @@ Two Core subsystems are shipped without a dedicated top-level doc; see the secti | Project | See | |---------|-----| -| `Core.AlarmHistorian` | [AlarmTracking.md](AlarmTracking.md) § Alarm historian sink | +| `Core.AlarmHistorian` | [AlarmTracking.md](v1/AlarmTracking.md) § Alarm historian sink (v1 archive) | | `Analyzers` (Roslyn OTOPCUA0001) | [security.md](security.md) § OTOPCUA0001 Analyzer | ### Drivers @@ -44,8 +43,8 @@ Two Core subsystems are shipped without a dedicated top-level doc; see the secti | Doc | Covers | |-----|--------| | [drivers/README.md](drivers/README.md) | Index of the eight shipped drivers + capability matrix | -| [drivers/Galaxy.md](drivers/Galaxy.md) | Galaxy driver — MXAccess bridge, Host/Proxy split, named-pipe IPC | -| [drivers/Galaxy-Repository.md](drivers/Galaxy-Repository.md) | Galaxy-specific discovery via the ZB SQL database | +| [drivers/Galaxy.md](drivers/Galaxy.md) | Galaxy driver — in-process gRPC client to the mxaccessgw sidecar | +| [v1/drivers/Galaxy-Repository.md](v1/drivers/Galaxy-Repository.md) | Galaxy-specific discovery via the ZB SQL database (v1 archive — the gateway owns this path now) | For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics, see [v2/driver-specs.md](v2/driver-specs.md). 
@@ -53,10 +52,10 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics | Doc | Covers | |-----|--------| -| [Configuration.md](Configuration.md) | appsettings bootstrap + Config DB + Admin UI draft/publish | +| [Configuration.md](v1/Configuration.md) | appsettings bootstrap + Config DB + Admin UI draft/publish (v1 archive — `OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) | | [security.md](security.md) | Transport security profiles, LDAP auth, ACL trie, role grants, OTOPCUA0001 analyzer | | [Redundancy.md](Redundancy.md) | `RedundancyCoordinator`, `ServiceLevelCalculator`, apply-lease, Prometheus metrics | -| [ServiceHosting.md](ServiceHosting.md) | Three-process deploy (Server + Admin + Galaxy.Host) install/uninstall | +| [ServiceHosting.md](ServiceHosting.md) | Two-process deploy (Server + Admin) install/uninstall, plus the optional `OtOpcUaWonderwareHistorian` sidecar | | [StatusDashboard.md](StatusDashboard.md) | Pointer — superseded by [v2/admin-ui.md](v2/admin-ui.md) | ### Client tooling @@ -79,10 +78,10 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics |-----|--------| | [reqs/HighLevelReqs.md](reqs/HighLevelReqs.md) | HLRs — numbered system-level requirements | | [reqs/OpcUaServerReqs.md](reqs/OpcUaServerReqs.md) | OPC UA server-layer reqs | -| [reqs/ServiceHostReqs.md](reqs/ServiceHostReqs.md) | Per-process hosting reqs | +| [v1/reqs/ServiceHostReqs.md](v1/reqs/ServiceHostReqs.md) | Per-process hosting reqs (v1 archive — only `OtOpcUa` server hosting remains in scope post-PR-7.2) | | [reqs/ClientRequirements.md](reqs/ClientRequirements.md) | Client CLI + UI reqs | -| [reqs/GalaxyRepositoryReqs.md](reqs/GalaxyRepositoryReqs.md) | Galaxy-scoped repository reqs | -| [reqs/MxAccessClientReqs.md](reqs/MxAccessClientReqs.md) | Galaxy-scoped MXAccess reqs | +| [v1/reqs/GalaxyRepositoryReqs.md](v1/reqs/GalaxyRepositoryReqs.md) | Galaxy-scoped repository reqs (v1 archive — owned by mxaccessgw 
today) | +| [v1/reqs/MxAccessClientReqs.md](v1/reqs/MxAccessClientReqs.md) | Galaxy-scoped MXAccess reqs (v1 archive — owned by mxaccessgw today) | | [reqs/StatusDashboardReqs.md](reqs/StatusDashboardReqs.md) | Pointer — superseded by Admin UI | ## Implementation history (`docs/v2/`) @@ -97,3 +96,7 @@ Design decisions + phase plans + execution notes. Load-bearing cross-references - [v2/dev-environment.md](v2/dev-environment.md) — dev-box bootstrap - [v2/test-data-sources.md](v2/test-data-sources.md) — integration-test simulator matrix (includes the pinned libplctag `ab_server` version for AB CIP tests) - [v2/implementation/phase-*-*.md](v2/implementation/) — per-phase execution plans with exit-gate evidence + +## v1 archive + +The v1 in-process MXAccess architecture (Galaxy.Host + Galaxy.Proxy + Galaxy.Shared, .NET 4.8 x86 COM, the `OtOpcUaGalaxyHost` Windows service) was retired in PR 7.2 (2026-04-30, commit `ae7106d`). Docs that described that shape are kept under [v1/](v1/) as historical record — see [v1/README.md](v1/README.md) for the index. diff --git a/docs/drivers/Galaxy.md b/docs/drivers/Galaxy.md index 7ea5e83..32ddbb3 100644 --- a/docs/drivers/Galaxy.md +++ b/docs/drivers/Galaxy.md @@ -1,211 +1,92 @@ # Galaxy Driver -The Galaxy driver bridges OtOpcUa to AVEVA System Platform (Wonderware) Galaxies through the `ArchestrA.MxAccess` COM API plus the Galaxy Repository SQL database. It is one driver of seven in the OtOpcUa platform (see [drivers/README.md](README.md) for the full list); all other drivers run in-process in the main Server (.NET 10 x64). Galaxy is the exception — it runs as its own Windows service and talks to the Server over a local named pipe. +The Galaxy driver bridges OtOpcUa to AVEVA System Platform (Wonderware) Galaxies. It is a **Tier-A in-process driver** that runs in the OtOpcUa server's .NET 10 AnyCPU process and speaks gRPC to a separately installed `mxaccessgw` server (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`). 
The gateway owns the MXAccess COM apartment, the STA + Win32 message pump, the Galaxy Repository SQL reader, and the Historian SDK — all the bits that need x86 / .NET Framework 4.8 / COM interop. The driver itself is platform-agnostic and contains no COM, no STA thread, and no x86 bitness constraint. -For the decision record on why Galaxy is out-of-process and how the refactor was staged, see [docs/v2/plan.md §4 Galaxy/MXAccess as Out-of-Process Driver](../v2/plan.md). For the full driver spec (addressing, data-type map, config shape), see [docs/v2/driver-specs.md §1](../v2/driver-specs.md). +For the driver spec (capability surface, config shape, addressing), see [docs/v2/driver-specs.md §1](../v2/driver-specs.md). For the gateway setup recipe, see [docs/v2/Galaxy.ParityRig.md](../v2/Galaxy.ParityRig.md). For tracing, metrics, and soak profile, see [docs/v2/Galaxy.Performance.md](../v2/Galaxy.Performance.md). +> **Note**: the related driver docs `Galaxy-Repository.md` and `Galaxy-Test-Fixture.md` describe the previous v1 / out-of-process topology and are being moved to `docs/v1/` by a parallel cleanup track. Use `Galaxy.ParityRig.md` and the `mxaccessgw` repo for current testing. 
-Galaxy ships as three projects: +## Architecture -| Project | Target | Role | -|---------|--------|------| -| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/` | .NET Standard 2.0 | IPC contracts (MessagePack records + `MessageKind` enum) referenced by both sides | -| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/` | .NET Framework 4.8 **x86** | Separate Windows service hosting the MXAccess COM objects, STA thread + Win32 message pump, Galaxy Repository reader, Historian SDK, runtime-probe manager | -| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/` | .NET 10 (matches Server) | `GalaxyProxyDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe` — loaded in-process by the Server; every call forwards over the pipe to the Host | - -The Shared assembly is the **only** contract between the two runtimes. It carries no COM or SDK references so Proxy (net10) can reference it without dragging x86 code into the Server process. - -## Why Out-of-Process - -Two reasons drive the split, per `docs/v2/plan.md`: - -1. **Bitness constraint.** MXAccess is 32-bit COM only — `ArchestrA.MxAccess.dll` in `Program Files (x86)\ArchestrA\Framework\bin` has no 64-bit variant. The main OtOpcUa Server is .NET 10 x64 (the OPC Foundation stack, SqlClient, and every other non-Galaxy driver target 64-bit). In-process hosting would force the whole Server to x86, which every other driver project would then inherit. -2. **Tier-C stability isolation.** Galaxy is classified Tier C in [docs/v2/driver-stability.md](../v2/driver-stability.md) — the COM runtime, STA thread, Aveva Historian SDK, and SQL queries all have crash/hang modes that can take down the hosting process. Isolating the driver in its own Windows service means a COM deadlock, AccessViolation in an unmanaged Historian DLL, or a runaway SQL query never takes the Server endpoint down. The Proxy-side supervisor restarts the Host with crash-loop circuit-breaker. 
- -The same Tier-C isolation story applies to FOCAS (decision record in `docs/v2/plan.md` §7), which is the second out-of-process driver. - -## IPC Transport - -`GalaxyProxyDriver` → `GalaxyIpcClient` → named pipe → `Galaxy.Host` pipe server. - -- Pipe name: `otopcua-galaxy-{DriverInstanceId}` (localhost-only, no TCP surface) -- Wire format: MessagePack-CSharp, length-prefixed frames -- ACL: pipe is created with a DACL that grants `ReadWrite | Synchronize` only to the configured Server service-principal SID + denies `LocalSystem`. The per-connection SID check in `PipeServer.VerifyCaller` is the real authorization boundary — any caller whose impersonated token SID doesn't match the allowed SID is dropped before the first frame is read. -- Handshake: Proxy presents a shared secret at `OpenSessionRequest`; Host rejects anything else with `MessageKind.OpenSessionResponse{Success=false}` -- Heartbeat: Proxy sends a periodic ping; missed heartbeats trigger the Proxy-side crash-loop supervisor to restart the Host - -Every capability call on `GalaxyProxyDriver` (Read, Write, Subscribe, HistoryRead*, etc.) serializes a `*Request`, awaits the matching `*Response` via a `CallAsync` helper, and rehydrates the result into the `Core.Abstractions` shape the Server expects. - -## STA Thread Requirement (Host-side) - -MXAccess COM objects — `LMXProxyServer` instantiation, `Register`, `AddItem`, `AdviseSupervisory`, `Write`, and cleanup calls — must all execute on the same Single-Threaded Apartment. Calling a COM object from the wrong thread causes marshalling failures or silent data corruption. 
- -`StaComThread` in the Host provides that thread with the apartment state set before the thread starts: - -```csharp -_thread = new Thread(ThreadEntry) { Name = "MxAccess-STA", IsBackground = true }; -_thread.SetApartmentState(ApartmentState.STA); +``` + +---------------------------------------+ + | OtOpcUa.Server (.NET 10 AnyCPU) | + | GalaxyDriver (in-process) | + | ITagDiscovery / IReadable / | + | IWritable / ISubscribable / | + | IRediscoverable / | + | IHostConnectivityProbe | + +-------------------+-------------------+ + | + gRPC (default http://localhost:5120) + | + v + +---------------------------------------+ + | mxaccessgw (sibling repo) | + | +-------------------------------+ | + | | MxGateway.Worker (x86 net48) | | + | | STA + WM_APP pump | | + | | ArchestrA.MxAccess COM | | + | | Galaxy Repository SQL | | + | | Wonderware Historian SDK | | + | +-------------------------------+ | + +---------------------------------------+ ``` -Work items queue via `RunAsync(Action)` or `RunAsync(Func)` into a `ConcurrentQueue` and post `WM_APP` to wake the pump. Each work item is wrapped in a `TaskCompletionSource` so callers can `await` the result from any thread — including the IPC handler thread that receives the inbound pipe request. +History reads + alarm-condition tracking moved server-side in PR 7.2 (`IHistoryRouter`, `AlarmConditionService`). Galaxy no longer implements `IHistoryProvider` or `IAlarmSource` of its own. -## Win32 Message Pump (Host-side) +## Project Layout -COM callbacks (`OnDataChange`, `OnWriteComplete`) are delivered through the Windows message loop. `StaComThread` runs a standard Win32 message pump via P/Invoke: +The driver ships as a single project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (.NET 10, AnyCPU). Sub-folders: -1. `PeekMessage` primes the message queue (required before `PostThreadMessage` works) -2. `GetMessage` blocks until a message arrives -3. `WM_APP` drains the work queue -4. 
`WM_APP + 1` drains the queue and posts `WM_QUIT` to exit the loop -5. All other messages go through `TranslateMessage` / `DispatchMessage` for COM callback delivery +| Folder | Role | +|--------|------| +| `Browse/` | Static-side discovery: `GalaxyDiscoverer` walks the gateway's hierarchy + attribute-set RPCs, `DataTypeMap` and `SecurityMap` translate Galaxy types and security classifications into OPC UA equivalents, `AlarmRefBuilder` extracts alarm-bearing attribute references for the server-layer alarm engine. `IGalaxyHierarchySource` + `GatewayGalaxyHierarchySource` + `TracedGalaxyHierarchySource` decorate the gateway browse RPC; `IGalaxyDeployWatchSource` + `GatewayGalaxyDeployWatchSource` + `DeployWatcher` drive `IRediscoverable`. | +| `Runtime/` | Live data path: `EventPump` runs the gateway's `StreamEvents` RPC and fans out to subscribers via a bounded channel; `GalaxyMxSession` is the read-side handle; `GatewayGalaxySubscriber` + `GatewayGalaxyDataWriter` (each with a `Traced*` decorator) implement `ISubscribable` / `IWritable`; `SubscriptionRegistry` tracks subscription state for replay; `ReconnectSupervisor` owns the backoff loop and triggers `ReplaySubscriptions` on session loss; `StatusCodeMap` translates gateway StatusCodes to OPC UA; `MxValueDecoder` / `MxValueEncoder` handle scalar + array marshalling; `GalaxyTelemetry` + `GalaxySubscriptionHandle` round out the surface. | +| `Health/` | `HostStatusAggregator` rolls per-platform probe state into the driver's `IHostConnectivityProbe` view; `PerPlatformProbeWatcher` listens on the gateway's per-host status stream; `HostConnectivityForwarder` pushes transitions out to the server's connectivity bus. | +| `Config/` | `GalaxyDriverOptions` and the four nested option records (`GalaxyGatewayOptions`, `GalaxyMxAccessOptions`, `GalaxyRepositoryOptions`, `GalaxyReconnectOptions`). | -Without this pump MXAccess callbacks never fire and the driver delivers no live data. 
+Project root files: -## LMXProxyServer COM Object +- `GalaxyDriver.cs` — `IDriver` + capability-interface implementation; composes the Browse / Runtime / Health collaborators. +- `GalaxyDriverFactoryExtensions.cs` — DI registration helper used by the server's driver bootstrap. -`MxProxyAdapter` wraps the real `ArchestrA.MxAccess.LMXProxyServer` COM object behind the `IMxProxy` interface so Host unit tests can substitute a fake proxy without requiring the ArchestrA runtime. Lifecycle: +## Capability Surface -1. **`Register(clientName)`** — Creates a new `LMXProxyServer` instance, wires up `OnDataChange` and `OnWriteComplete` event handlers, calls `Register` to obtain a connection handle -2. **`Unregister(handle)`** — Unwires event handlers, calls `Unregister`, releases the COM object via `Marshal.ReleaseComObject` +`GalaxyDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IDisposable`. -## Register / AddItem / AdviseSupervisory Pattern +| Capability | Implementation entry point | +|------------|---------------------------| +| `ITagDiscovery` | `Browse/GalaxyDiscoverer.cs` | +| `IRediscoverable` | `Browse/DeployWatcher.cs` | +| `IReadable` | `Runtime/GalaxyMxSession.cs` | +| `IWritable` | `Runtime/GatewayGalaxyDataWriter.cs` | +| `ISubscribable` | `Runtime/GatewayGalaxySubscriber.cs` (driven by `EventPump`) | +| `IHostConnectivityProbe` | `Health/HostStatusAggregator.cs` | -Every MXAccess data operation follows a three-step pattern, all executed on the STA thread: +## Configuration -1. **`AddItem(handle, address)`** — Resolves a Galaxy tag reference (e.g., `TestMachine_001.MachineID`) to an integer item handle -2. **`AdviseSupervisory(handle, itemHandle)`** — Subscribes the item for supervisory data-change callbacks -3. The runtime begins delivering `OnDataChange` events +`DriverConfig` JSON binds to `Config/GalaxyDriverOptions.cs`. 
The four sections are: -For writes, after `AddItem` + `AdviseSupervisory`, `Write(handle, itemHandle, value, securityClassification)` sends the value; `OnWriteComplete` confirms or rejects. Cleanup reverses: `UnAdviseSupervisory` then `RemoveItem`. +- **`Gateway`** — endpoint, API key secret ref, TLS knobs, connect/call/stream timeouts. `StreamTimeoutSeconds = 0` keeps the long-lived `StreamEvents` RPC open for the driver's lifetime. +- **`MxAccess`** — `ClientName` (must be unique per OtOpcUa instance — redundancy pairs enforce uniqueness at install time), `PublishingIntervalMs` (forwarded as `buffered_update_interval_ms` on subscribe), `WriteUserId` for ArchestrA secured-write, `EventPumpChannelCapacity` (default 50_000 — one second of headroom at 50k tags / 1Hz; tune via the `galaxy.events.dropped` metric). +- **`Repository`** — `DiscoverPageSize`, `WatchDeployEvents`. +- **`Reconnect`** — `InitialBackoffMs`, `MaxBackoffMs`, `ReplayOnSessionLost` (calls the gateway's `ReplaySubscriptions` RPC after reconnect rather than re-issuing subscribe-bulk for every tag). -## OnDataChange and OnWriteComplete Callbacks +Full per-field descriptions live in `Config/GalaxyDriverOptions.cs`. The full JSON skeleton is reproduced in [docs/v2/driver-specs.md §1](../v2/driver-specs.md). -### OnDataChange +## Reconnect + Replay -Fired by the COM runtime on the STA thread when a subscribed tag changes. The handler in `MxAccessClient.EventHandlers.cs`: +`ReconnectSupervisor` owns an exponential-backoff loop bounded by `Reconnect.InitialBackoffMs` / `MaxBackoffMs`. On session loss it tears down the gRPC channel, redials, and — when `ReplayOnSessionLost = true` — calls the gateway's `ReplaySubscriptions` RPC with the cached subscription set from `SubscriptionRegistry` instead of re-subscribing tag-by-tag. The gateway's worker then re-issues `AdviseSupervisory` server-side under the apartment lock. -1. Maps the integer `phItemHandle` back to a tag address via `_handleToAddress` -2. 
Maps the MXAccess quality code to the internal `Quality` enum -3. Checks `MXSTATUS_PROXY` for error details and adjusts quality -4. Converts the timestamp to UTC -5. Constructs a `Vtq` (Value/Timestamp/Quality) and delivers it to: - - The stored per-tag subscription callback - - Any pending one-shot read completions - - The global `OnTagValueChanged` event (consumed by the Host's subscription dispatcher, which packages changes into `DataChangeEventArgs` and forwards them over the pipe to `GalaxyProxyDriver.OnDataChange`) +## Testing -### OnWriteComplete +- **Unit tests**: `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/` — fakes the gateway gRPC surface; covers Browse, Runtime, Health, and Config in isolation. +- **Parity rig + dev-rig walkthrough**: see [docs/v2/Galaxy.ParityRig.md](../v2/Galaxy.ParityRig.md). The rig stands up a real `mxaccessgw` against a live Galaxy and exercises the full read / write / subscribe / rediscover path. +- **Performance + soak**: see [docs/v2/Galaxy.Performance.md](../v2/Galaxy.Performance.md). -Fired when the runtime acknowledges or rejects a write. The handler resolves the pending `TaskCompletionSource` for the item handle. If `MXSTATUS_PROXY.success == 0` the write is considered failed and the error detail is logged. +## Operational Notes -## Reconnection Logic - -`MxAccessClient` implements automatic reconnection through two mechanisms. - -### Monitor loop - -`StartMonitor` launches a background task that polls at `MonitorIntervalSeconds`. On each cycle: - -- If the state is `Disconnected` or `Error` and `AutoReconnect` is enabled, it calls `ReconnectAsync` -- If connected and a probe tag is configured, it checks the probe staleness threshold - -### Reconnect sequence - -`ReconnectAsync` performs a full disconnect-then-connect cycle: - -1. Increment the reconnect counter -2. 
`DisconnectAsync` — tear down all active subscriptions (`UnAdviseSupervisory` + `RemoveItem` for each), detach COM event handlers, call `Unregister`, clear all handle mappings -3. `ConnectAsync` — create a fresh `LMXProxyServer`, register, replay all stored subscriptions, re-subscribe the probe tag - -Stored subscriptions (`_storedSubscriptions`) persist across reconnects. `ReplayStoredSubscriptionsAsync` iterates the stored entries and calls `AddItem` + `AdviseSupervisory` for each. - -## Probe Tag Health Monitoring - -A configurable probe tag (e.g., a frequently updating Galaxy attribute) serves as a connection health indicator. After connecting, the client subscribes to the probe tag and records `_lastProbeValueTime` on every `OnDataChange`. The monitor loop compares `DateTime.UtcNow - _lastProbeValueTime` against `ProbeStaleThresholdSeconds`; if the probe has not updated within the window, the connection is assumed stale and a reconnect is forced. This catches scenarios where the COM connection is technically alive but the runtime has stopped delivering data. - -## Per-Host Runtime Status Probes (`.ScanState`) - -Separate from the connection-level probe, the driver advises `.ScanState` on every deployed `$WinPlatform` and `$AppEngine` in the Galaxy. These probes track per-host runtime state so the Admin UI dashboard can report "this specific Platform / AppEngine is off scan" and the driver can proactively invalidate every OPC UA variable hosted by the stopped object — preventing MXAccess from serving stale Good-quality cached values to clients who read those tags while the host is down. - -Enabled by default via `MxAccess.RuntimeStatusProbesEnabled`; see [Configuration](../Configuration.md#mxaccess) for the two config fields. - -### How it works - -`GalaxyRuntimeProbeManager` lives in `Driver.Galaxy.Host` alongside the rest of the MXAccess code. 
It is owned by the Host's subscription dispatcher and runs a three-state machine per host (Unknown / Running / Stopped): - -1. **Discovery** — After the Host completes `BuildAddressSpace`, the manager filters the hierarchy to rows where `CategoryId == 1` (`$WinPlatform`) or `CategoryId == 3` (`$AppEngine`) and issues `AdviseSupervisory` for `.ScanState` on each one. Probes are driver-owned, not ref-counted against client subscriptions, and persist across address-space rebuilds via a `Sync` diff. -2. **Transition predicate** — A probe callback is interpreted as `isRunning = vtq.Quality.IsGood() && vtq.Value is bool b && b`. Everything else (explicit `ScanState = false`, bad quality, communication errors) means **Stopped**. -3. **On-change-only delivery** — `ScanState` is delivered only when the value actually changes. A stably Running host may go hours without a callback. `Tick()` does NOT run a starvation check on Running entries — the only time-based transition is **Unknown → Stopped** when the initial callback hasn't arrived within `RuntimeStatusUnknownTimeoutSeconds` (default 15s). This protects against a probe that fails to resolve at all without incorrectly flipping healthy long-running hosts. -4. **Transport gating** — When `IMxAccessClient.State != Connected`, `GetSnapshot()` forces every entry to `Unknown`. The dashboard shows the Connection panel as the primary signal in that case rather than misleading operators with "every host stopped". -5. **Subscribe failure rollback** — If `SubscribeAsync` throws for a new probe (SDK failure, broker rejection, transport error), the manager rolls back both `_byProbe` and `_probeByGobjectId` so the probe never appears in `GetSnapshot()`. Stability review 2026-04-13 Finding 1. 
-
-### Subtree quality invalidation on transition
-
-When a host transitions **Running → Stopped**, the probe manager invokes a callback that walks `_hostedVariables[gobjectId]` — the set of every OPC UA variable transitively hosted by that Galaxy object — and sets each variable's `StatusCode` to `BadOutOfService`. **Stopped → Running** calls `ClearHostVariablesBadQuality` to reset each to `Good` so the next on-change MXAccess update repopulates the value.
-
-The hosted-variables map is built once per `BuildAddressSpace` by walking each object's `HostedByGobjectId` chain up to the nearest Platform or Engine ancestor. A variable hosted by an Engine inside a Platform lands in both the Engine's list and the Platform's list, so stopping the Platform transitively invalidates every descendant Engine's variables.
-
-### Read-path short-circuit (`IsTagUnderStoppedHost`)
-
-The Host's Read handler checks `IsTagUnderStoppedHost(tagRef)` (a reverse-index lookup `_hostIdsByTagRef[tagRef]` → `GalaxyRuntimeProbeManager.IsHostStopped(hostId)`) before the MXAccess round-trip. When the owning host is Stopped, the handler returns a synthesized `DataValue { Value = cachedVar.Value, StatusCode = BadOutOfService }` directly without touching MXAccess. This guarantees clients see a uniform `BadOutOfService` on every descendant tag of a stopped host, regardless of whether they're reading or subscribing.
-
-### Deferred dispatch — the STA deadlock
-
-**Critical**: probe transition callbacks must **not** run synchronously on the STA thread that delivered the `OnDataChange`. `MarkHostVariablesBadQuality` takes the subscription dispatcher lock, which may be held by a worker thread currently inside `Read` waiting on an `_mxAccessClient.ReadAsync()` round-trip that is itself waiting for the STA thread. Classic circular wait — the first real deploy of this feature hung inside 30 seconds from exactly this pattern.
-
-The fix is a deferred-dispatch queue: probe callbacks enqueue the transition onto `ConcurrentQueue<(int GobjectId, bool Stopped)>` and set the existing dispatch signal. The dispatch thread drains the queue inside its existing 100ms `WaitOne` loop — outside any locks held by the STA path — and then calls `MarkHostVariablesBadQuality` / `ClearHostVariablesBadQuality` under its own natural lock acquisition. No circular wait, no STA involvement.
-
-### Dashboard and health surface
-
-- Admin UI **Galaxy Runtime** panel shows per-host state with Name / Kind / State / Since / Last Error columns. Panel color is green (all Running), yellow (any Unknown, none Stopped), red (any Stopped), gray (MXAccess transport disconnected)
-- `HealthCheckService.CheckHealth` rolls overall driver health to `Degraded` when any host is Stopped
-
-See [Status Dashboard](../StatusDashboard.md#galaxy-runtime) for the field table and [Configuration](../Configuration.md#mxaccess) for the config fields.
-
-## Request Timeout Safety Backstop
-
-Every sync-over-async site on the OPC UA stack thread that calls into Galaxy (`Read`, `Write`, address-space rebuild probe sync) is wrapped in a bounded `SyncOverAsync.WaitSync(...)` helper with timeout `MxAccess.RequestTimeoutSeconds` (default 30s). Inner `ReadTimeoutSeconds` / `WriteTimeoutSeconds` bounds on the async path are the first line of defense; the outer wrapper is a backstop so a scheduler stall, slow reconnect, or any other non-returning async path cannot park the stack thread indefinitely.
-
-On timeout, the underlying task is **not** cancelled — it runs to completion on the thread pool and is abandoned. This is acceptable because Galaxy IPC clients are shared singletons and the abandoned continuation does not capture request-scoped state. The OPC UA stack receives `StatusCodes.BadTimeout` on the affected operation.
-
-`ConfigurationValidator` enforces `RequestTimeoutSeconds >= 1` and warns when it is set below the inner Read/Write timeouts (operator misconfiguration). Stability review 2026-04-13 Finding 3.
-
-All capability calls at the Server dispatch layer are additionally wrapped by `CapabilityInvoker` (Core/Resilience/) which runs them through a Polly pipeline keyed on `(DriverInstanceId, HostName, DriverCapability)`. `OTOPCUA0001` analyzer enforces the wrap at build time.
-
-## Why Marshal.ReleaseComObject Is Needed
-
-The .NET Framework runtime's garbage collector releases COM references non-deterministically. For MXAccess, delayed release can leave stale COM connections open, preventing clean re-registration. `MxProxyAdapter.Unregister` calls `Marshal.ReleaseComObject(_lmxProxy)` in a `finally` block to immediately drive the COM reference count to zero. This ensures the underlying COM server is freed before a reconnect attempt creates a new instance.
-
-## Tag Discovery and Historical Data
-
-Tag discovery (the Galaxy Repository SQL reader + `LocalPlatform` scope filter) is covered in [Galaxy-Repository.md](Galaxy-Repository.md). The Galaxy driver is `ITagDiscovery` for the Server's bootstrap path and `IRediscoverable` for the on-change-redeploy path.
-
-Historical data access (raw, processed, at-time, events) runs against the Aveva Historian via the `aahClientManaged` SDK and is exposed through the Galaxy driver's `IHistoryProvider` implementation. See [HistoricalDataAccess.md](../HistoricalDataAccess.md).
-
-## Key source files
-
-Host-side (`.NET 4.8 x86`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/`):
-
-- `Backend/MxAccess/StaComThread.cs` — STA thread and Win32 message pump
-- `Backend/MxAccess/MxAccessClient.cs` — Core client (partial)
-- `Backend/MxAccess/MxAccessClient.Connection.cs` — Connect / disconnect / reconnect
-- `Backend/MxAccess/MxAccessClient.Subscription.cs` — Subscribe / unsubscribe / replay
-- `Backend/MxAccess/MxAccessClient.ReadWrite.cs` — Read and write operations
-- `Backend/MxAccess/MxAccessClient.EventHandlers.cs` — `OnDataChange` / `OnWriteComplete` handlers
-- `Backend/MxAccess/MxAccessClient.Monitor.cs` — Background health monitor
-- `Backend/MxAccess/MxProxyAdapter.cs` — COM object wrapper
-- `Backend/MxAccess/GalaxyRuntimeProbeManager.cs` — Per-host `ScanState` probes, state machine, `IsHostStopped` lookup
-- `Backend/Historian/HistorianDataSource.cs` — `aahClientManaged` SDK wrapper (see [HistoricalDataAccess.md](../HistoricalDataAccess.md))
-- `Ipc/GalaxyIpcServer.cs` — Named-pipe server, message dispatch
-- `Domain/IMxAccessClient.cs` — Client interface
-
-Shared (`.NET Standard 2.0`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/`):
-
-- `Contracts/MessageKind.cs` — IPC message kinds (`ReadRequest`, `HistoryReadRequest`, `OpenSessionResponse`, …)
-- `Contracts/*.cs` — MessagePack DTOs for every request/response pair
-
-Proxy-side (`.NET 10`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/`):
-
-- `GalaxyProxyDriver.cs` — `IDriver`/`ITagDiscovery`/`IReadable`/`IWritable`/`ISubscribable`/`IAlarmSource`/`IHistoryProvider`/`IRediscoverable`/`IHostConnectivityProbe` implementation; every method forwards via `GalaxyIpcClient`
-- `Ipc/GalaxyIpcClient.cs` — Named-pipe client, `CallAsync`, reconnect on broken pipe
-- `GalaxyProxySupervisor.cs` — Host-process monitor, crash-loop circuit-breaker, Host relaunch
+- **MXAccess `ClientName` collisions**: two OtOpcUa instances sharing a `ClientName` cause the older Wonderware session to lose
subscription state. Redundancy pairs (decision #149) enforce uniqueness via install scripts.
+- **Channel saturation**: `galaxy.events.dropped > 0` indicates `EventPump` is back-pressured. Raise `EventPumpChannelCapacity` or investigate downstream slowness in the server-side fan-out.
+- **Connectivity surface**: per-platform probe state is exposed through `IHostConnectivityProbe` and aggregated by the server's connectivity bus — there is no driver-private dashboard surface anymore. The Admin UI's Host Status panel is the consumer.
diff --git a/docs/AlarmTracking.md b/docs/v1/AlarmTracking.md
similarity index 100%
rename from docs/AlarmTracking.md
rename to docs/v1/AlarmTracking.md
diff --git a/docs/Configuration.md b/docs/v1/Configuration.md
similarity index 100%
rename from docs/Configuration.md
rename to docs/v1/Configuration.md
diff --git a/docs/DataTypeMapping.md b/docs/v1/DataTypeMapping.md
similarity index 100%
rename from docs/DataTypeMapping.md
rename to docs/v1/DataTypeMapping.md
diff --git a/docs/HistoricalDataAccess.md b/docs/v1/HistoricalDataAccess.md
similarity index 100%
rename from docs/HistoricalDataAccess.md
rename to docs/v1/HistoricalDataAccess.md
diff --git a/docs/v1/README.md b/docs/v1/README.md
new file mode 100644
index 0000000..069ef9d
--- /dev/null
+++ b/docs/v1/README.md
@@ -0,0 +1,30 @@
+# v1 documentation archive
+
+This folder contains documentation that described the original v1
+in-process MXAccess architecture (`Galaxy.Host` + `Galaxy.Proxy` +
+`Galaxy.Shared` three-project split, .NET 4.8 x86 + COM apartment, the
+`OtOpcUaGalaxyHost` Windows service). That architecture was retired in
+PR 7.2 (merged 2026-04-30 at commit `ae7106d`). These docs are kept as
+the historical record of how the system worked before the v2-mxgw
+migration; treat their content as accurate at the time of writing, NOT
+as current state.
+
+For current architecture see:
+
+- `CLAUDE.md` — agent-facing v2 overview
+- `docs/drivers/Galaxy.md` — current Galaxy driver doc
+- `docs/v2/Galaxy.ParityRig.md` — current testing setup
+- `docs/v2/Galaxy.Performance.md` — observability + perf
+- `lmx_mxgw.md` (in repo root) — design rationale for the migration
+
+| File | What it covered |
+|---|---|
+| `AlarmTracking.md` | v1 alarm-tracking flow through the in-process MXAccess client |
+| `Configuration.md` | v1 server configuration (`OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) |
+| `DataTypeMapping.md` | Galaxy `mx_data_type` → OPC UA type mapping (still accurate as a reference; the live mapping logic is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) |
+| `HistoricalDataAccess.md` | v1 IHistoryProvider on the Host side; current path is the server-level HistoryRouter + Wonderware sidecar |
+| `Subscriptions.md` | v1 MXAccess subscription mechanics; current path uses gateway StreamEvents |
+| `drivers/Galaxy-Repository.md` | v1 Host-side ZB SQL repository client; the gateway owns this path now |
+| `drivers/Galaxy-Test-Fixture.md` | v1 test-fixture setup (parity tests + Galaxy.Host EXE spawn) |
+| `reqs/GalaxyRepositoryReqs.md`, `reqs/MxAccessClientReqs.md` | Original Phase 0 requirements; satisfied in mxaccessgw repo today |
+| `reqs/ServiceHostReqs.md` | Service-hosting requirements including `OtOpcUaGalaxyHost` (GHX-* section); only `OtOpcUa` server hosting remains in scope post-7.2 |
diff --git a/docs/Subscriptions.md b/docs/v1/Subscriptions.md
similarity index 100%
rename from docs/Subscriptions.md
rename to docs/v1/Subscriptions.md
diff --git a/docs/drivers/Galaxy-Repository.md b/docs/v1/drivers/Galaxy-Repository.md
similarity index 100%
rename from docs/drivers/Galaxy-Repository.md
rename to docs/v1/drivers/Galaxy-Repository.md
diff --git a/docs/drivers/Galaxy-Test-Fixture.md b/docs/v1/drivers/Galaxy-Test-Fixture.md
similarity index 100%
rename from
docs/drivers/Galaxy-Test-Fixture.md
rename to docs/v1/drivers/Galaxy-Test-Fixture.md
diff --git a/docs/reqs/GalaxyRepositoryReqs.md b/docs/v1/reqs/GalaxyRepositoryReqs.md
similarity index 100%
rename from docs/reqs/GalaxyRepositoryReqs.md
rename to docs/v1/reqs/GalaxyRepositoryReqs.md
diff --git a/docs/reqs/MxAccessClientReqs.md b/docs/v1/reqs/MxAccessClientReqs.md
similarity index 100%
rename from docs/reqs/MxAccessClientReqs.md
rename to docs/v1/reqs/MxAccessClientReqs.md
diff --git a/docs/reqs/ServiceHostReqs.md b/docs/v1/reqs/ServiceHostReqs.md
similarity index 100%
rename from docs/reqs/ServiceHostReqs.md
rename to docs/v1/reqs/ServiceHostReqs.md
diff --git a/docs/v2/Galaxy.ParityMatrix.md b/docs/v2/Galaxy.ParityMatrix.md
index f68277e..bbf3242 100644
--- a/docs/v2/Galaxy.ParityMatrix.md
+++ b/docs/v2/Galaxy.ParityMatrix.md
@@ -1,3 +1,15 @@
+> **✅ Completed 2026-04-30 — historical record of the parity-rig validation gate for PR 7.2.**
+>
+> The matrix below was the go/no-go gate for retiring the legacy
+> Galaxy.Host backend (PR 7.2). Final run on the dev rig 2026-04-30
+> returned 14 passed / 1 skipped / 0 failed; PR 7.2 (commit `fe91d42`)
+> deleted the legacy projects + service the next day. The "Running
+> the matrix" section is preserved for historical reproducibility but
+> the test projects it references (`Driver.Galaxy.ParityTests`) were
+> deleted alongside the legacy backend; this matrix is no longer
+> runnable. Current Galaxy testing flows through the gateway's own
+> test suite (sibling mxaccessgw repo).
+ # Galaxy backend parity matrix This document tracks the scenario × result matrix that the diff --git a/docs/v2/Galaxy.ParityRig.md b/docs/v2/Galaxy.ParityRig.md index 5d153ee..ed5c77b 100644 --- a/docs/v2/Galaxy.ParityRig.md +++ b/docs/v2/Galaxy.ParityRig.md @@ -1,15 +1,30 @@ # Galaxy parity rig — runbook +> ✅ **Completed 2026-04-30 — historical record.** This runbook is the +> recipe that produced the green parity matrix that gated PR 7.2 +> (retire legacy Galaxy projects, merged at commit `ae7106d`). The +> matrix it produced is captured in +> [`Galaxy.ParityMatrix.md`](Galaxy.ParityMatrix.md), also marked +> historical. The test project this doc drove +> (`Driver.Galaxy.ParityTests`) was deleted in PR 7.2, along with +> `Driver.Galaxy.{Host,Proxy,Shared}` and the `OtOpcUaGalaxyHost` +> Windows service. **You cannot re-run this rig today.** Current +> Galaxy testing flows through the gateway's own test suite in the +> sibling `mxaccessgw` repo. +> +> The text below is preserved as-written so the migration trail (what +> was tested, against what shape, with what env vars) stays auditable. + Brings up both Galaxy backends side-by-side against a single live Galaxy so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs` -can run for real. Closing the parity matrix is the gate for PR 7.2 +can run for real. Closing the parity matrix was the gate for PR 7.2 (retire legacy Galaxy projects). ## Conceptual layout ``` -Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) +Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) [DELETED in PR 7.2] │ └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host" │ └── named pipe "OtOpcUaGalaxy" │ ▲ @@ -29,17 +44,19 @@ Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) Both halves talk to the **same Galaxy** through **two distinct MxAccess sessions** (different ClientNames so they don't evict each other). 
-## What's already on this dev box +## What was on the dev box at the time -Per `~/.claude/projects/.../memory/`: +Per `~/.claude/projects/.../memory/` *as of the rig run*: - **AVEVA System Platform + Galaxy + MXAccess runtime** — `project_aveva_platform_installed.md`. - **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped, binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`, shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433` - — `project_galaxy_host_installed.md`. -- **Parity test project** (`Driver.Galaxy.ParityTests`) committed and - skip-clean — runs as soon as the mxgw half resolves. + — `project_galaxy_host_installed.md`. **(Service uninstalled and binary + retired as part of PR 7.2; the host source project no longer exists in + this repo.)** +- **Parity test project** (`Driver.Galaxy.ParityTests`) — committed and + skip-clean at the time of the rig run. **Deleted in PR 7.2.** ## Setup steps (one-time) @@ -282,7 +299,7 @@ sees the change: ```powershell graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 ` --confirm --confirm-target OtOpcUaParityTest_001 -sc.exe restart OtOpcUaGalaxyHost +sc.exe restart OtOpcUaGalaxyHost # service no longer exists post-PR-7.2; in the modern shape, restart mxaccessgw instead ``` Then re-run the parity matrix. The previously-skipped scenarios should @@ -343,11 +360,14 @@ Galaxy with a script that imports 50k attributes onto a generated UDO - **`LegacySkipReason` says "Galaxy ZB SQL not reachable on localhost:1433"** — SQL Server isn't running, or its TCP listener is off. Check `services.msc` for the SQL Server (default) instance. -- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — the parity - harness looks under `src/.../bin/Debug/net48/`. Build it once: - `dotnet build src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`. 
Note the - separately-published copy at `C:\publish\OtOpcUaGalaxyHost\` is for - the Windows service; the parity harness spawns its own subprocess. +- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — at rig time + the parity harness looked under + `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` for the + EXE it spawned as a subprocess, separate from the published copy at + `C:\publish\OtOpcUaGalaxyHost\` used by the Windows service. **Both + the source project and the published binary were removed in PR 7.2, + so this troubleshooting branch no longer applies — the legacy half + cannot be brought up at all.** - **Both halves resolve but parity scenarios assert deltas** — that's the expected outcome the rig exists to surface. Review each delta against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section diff --git a/docs/v2/driver-specs.md b/docs/v2/driver-specs.md index 053c192..835695c 100644 --- a/docs/v2/driver-specs.md +++ b/docs/v2/driver-specs.md @@ -10,289 +10,65 @@ ### Summary -Out-of-process **Tier C** driver bridging AVEVA System Platform (Wonderware) Galaxies. The existing v1 implementation is refactored behind the new driver capability interfaces and hosted in a separate Windows service (.NET 4.8 x86) that communicates with the main OtOpcUa server (.NET 10 x64) via named pipes + MessagePack. Hosted out-of-process for **two reasons**: COM/.NET 4.8 x86 bitness constraint **and** Tier C stability isolation (per `driver-stability.md`). FOCAS is the second Tier C driver, also out-of-process — see §7. 
- -### Library & Dependencies - -| Component | Package / Source | Version | Target | Notes | -|-----------|------------------|---------|--------|-------| -| **MXAccess COM** | `ArchestrA.MxAccess` (GAC / `lib/ArchestrA.MxAccess.dll`) | version-neutral late-bound | .NET 4.8 x86 | Pinned via `` with `EmbedInteropTypes=false`; interfaces: `LMXProxyServer`, `ILMXProxyServerEvents`, `MXSTATUS_PROXY` | -| **Galaxy DB client** | `System.Data.SqlClient` (BCL) | BCL | .NET 4.8 x86 | Direct SQL for hierarchy/attribute/change-detection queries | -| **Wonderware Historian SDK** | `aahClientManaged`, `aahClientCommon` | Historian-shipped | .NET 4.8 x86 | Optional — loaded only when `Historian.Enabled=true` | -| **MessagePack-CSharp** | `MessagePack` NuGet | 2.x | .NET Standard 2.0 (Shared) | IPC serialization; shared contract between Proxy and Host | -| **Named pipes** | `System.IO.Pipes` (BCL) | BCL | both sides | IPC transport, localhost only | - -### Required Components - -- **AVEVA System Platform / ArchestrA Platform** deployed on the same machine as `Galaxy.Host` (installs MXAccess COM objects into the GAC) -- A **deployed Galaxy** with at least one $WinPlatform object hosting $AppEngine(s) hosting AutomationObjects -- **SQL Server** reachable from `Galaxy.Host` with the Galaxy repository database (default `ZB`); Windows Auth by default -- **32-bit .NET Framework 4.8** runtime on the Host machine (MXAccess is 32-bit COM, no 64-bit variant) -- **STA thread + Win32 message pump** inside the Host process for all COM calls and event callbacks (see §13) -- **Wonderware Historian** installed on-box or reachable via aah SDK — *only* if HDA is enabled -- **No external firewall ports** — MXAccess is local-machine COM/IPC; pipe is localhost-only. Galaxy DB port (default SQL 1433) if the ZB database is remote. - -### Connection Settings (per driver instance, from central config DB) - -All settings live under a schemaless `DriverConfig` JSON blob on the `DriverInstance` row. 
Current v1 equivalents (defaults and source file references in parentheses): - -**MXAccess** (`MxAccessConfiguration.cs`): - -| Setting | Type | Default | Description | -|---------|------|---------|-------------| -| `ClientName` | string | `"LmxOpcUa"` | Registration name passed to `LMXProxyServer.Register()` | -| `NodeName` | string? | `null` | Optional ArchestrA node override (null = local) | -| `GalaxyName` | string? | `null` | Optional Galaxy name override | -| `ReadTimeoutSeconds` | int | `5` | Per-read timeout | -| `WriteTimeoutSeconds` | int | `5` | Per-write timeout | -| `RequestTimeoutSeconds` | int | `30` | Outer safety timeout around any MXAccess request | -| `MaxConcurrentOperations` | int | `10` | Pool bound on in-flight MXAccess work items | -| `MonitorIntervalSeconds` | int | `5` | Connectivity heartbeat probe interval | -| `AutoReconnect` | bool | `true` | Replay stored subscriptions on COM reconnect | -| `ProbeTag` | string? | `null` | Optional heartbeat tag for health monitoring | -| `ProbeStaleThresholdSeconds` | int | `60` | Mark connection stale if no probe callback within | -| `RuntimeStatusProbesEnabled` | bool | `true` | Auto-subscribe `ScanState` for $WinPlatform / $AppEngine | -| `RuntimeStatusUnknownTimeoutSeconds` | int | `15` | Grace period before an un-probed host is assumed Stopped | - -**Galaxy repository** (`GalaxyRepositoryConfiguration.cs`): - -| Setting | Type | Default | Description | -|---------|------|---------|-------------| -| `ConnectionString` | string | `Server=localhost;Database=ZB;Integrated Security=true;` | ZB SQL Server connection | -| `ChangeDetectionIntervalSeconds` | int | `30` | Poll interval for `galaxy.time_of_last_deploy` | -| `CommandTimeoutSeconds` | int | `30` | SQL command timeout | -| `ExtendedAttributes` | bool | `false` | Include extended attribute metadata in discovery | -| `Scope` | enum (`Galaxy` \| `LocalPlatform`) | `Galaxy` | Address-space scope filter (commit bc282b6) | -| `PlatformName` | 
string? | `Environment.MachineName` | Platform to scope to when `Scope=LocalPlatform` | - -**IPC** (new for v2): - -| Setting | Type | Default | Description | -|---------|------|---------|-------------| -| `PipeName` | string | `otopcua-galaxy-{InstanceId}` | Named pipe name | -| `HostStartupTimeoutMs` | int | `30000` | Proxy wait for Host `Ready` handshake | -| `IpcCallTimeoutMs` | int | `15000` | Per-call RPC timeout | - -### Addressing - -Galaxy objects carry two names: - -- **`contained_name`** — human-readable, scoped to parent; used for OPC UA browse tree -- **`tag_name`** — globally unique system identifier; used for MXAccess runtime references - -| Layer | Example | -|-------|---------| -| OPC UA browse path | `TestMachine_001/DelmiaReceiver/DownloadPath` | -| OPC UA NodeId | `ns=;s=.` | -| MXAccess reference | `DelmiaReceiver_001.DownloadPath` (passed to `AddItem()`) | - -Tag discovery is **dynamic** — driven by the Galaxy repository DB (`gobject`, `dynamic_attribute`, `primitive_instance`, `template_definition`). Optional `Scope=LocalPlatform` filters the hierarchy via the `hosted_by_gobject_id` chain to the subtree rooted at the local $WinPlatform (on a dev Galaxy: 49→3 objects, 4206→386 attributes). 
- -### Data Type Mapping (`MxDataTypeMapper.cs`, `gr/data_type_mapping.md`) - -| mx_data_type | Galaxy Type | OPC UA BuiltInType | CLR Type | -|--------------|-------------|--------------------|----------| -| 1 | Boolean | Boolean (i=1) | `bool` | -| 2 | Integer | Int32 (i=6) | `int` | -| 3 | Float | Float (i=10) | `float` | -| 4 | Double | Double (i=11) | `double` | -| 5 | String | String (i=12) | `string` | -| 6 | Time | DateTime (i=13) | `DateTime` | -| 7 | ElapsedTime | Double (i=11) | `double` (seconds) | -| 8 | Reference | String (i=12) | `string` | -| 13 | Enumeration | Int32 (i=6) | `int` | -| 14 / 16 | Custom | String (i=12) | `string` | -| 15 | InternationalizedString | LocalizedText (i=21) | `string` | -| (default) | Unknown | String (i=12) | `string` | - -**Arrays**: `is_array=0` → ValueRank `-1` (Scalar); `is_array=1` → ValueRank `1` (OneDimension), ArrayDimensions = `[array_dimension]`. - -### Security Classification Mapping (`SecurityClassificationMapper.cs`) - -| security_classification | Galaxy Level | OPC UA Write Permission | -|-------------------------|--------------|-------------------------| -| 0 | FreeAccess | `WriteOperate` | -| 1 | Operate | `WriteOperate` | -| 2 | SecuredWrite | — (read-only in v1) | -| 3 | VerifiedWrite | — (read-only in v1) | -| 4 | Tune | `WriteTune` | -| 5 | Configure | `WriteConfigure` | -| 6 | ViewOnly | — (read-only) | - -Maps to the OPC UA roles `ReadOnly` / `WriteOperate` / `WriteTune` / `WriteConfigure` defined in the LDAP role provider (see `docs/security.md`). - -### Subscription Model — Native MXAccess Advisories - -**Galaxy is one of three drivers with native subscriptions (Galaxy, TwinCAT, OPC UA Client).** No polling. 
- -- Mechanism: `LMXProxyServer.AddItem()` → `AdviseSupervisory(handle, itemHandle)`; callbacks delivered through the `ILMXProxyServerEvents.OnDataChange` COM event -- Callback signature: `MxDataChangeHandler(itemHandle, MXSTATUS_PROXY, value, quality, timestamp)` -- Dispatch: STA COM event → dispatch-thread queue → OPC UA `ClearChangeMasks` fan-out (decouples COM thread from UA stack lock — commit c76ab8f) -- **Stored subscriptions** replayed on reconnect via `ReplayStoredSubscriptionsAsync()` -- **Probe tag** + runtime-status probes provide connection-health visibility (see §14) -- **Bad-quality fan-out**: when a host ($WinPlatform or $AppEngine) ScanState transitions to Stopped, every attribute under that host is immediately published as `BadOutOfService` (commits 7310925, c76ab8f) - -### Alarm Model - -In-process alarm-condition tracking (v1 baseline; extended in v2 to match `IAlarmSource`): - -- **Auto-subscribed attributes per alarm-eligible object**: `InAlarm`, `Priority`, `Description` (cached for severity and message) -- **Filtering**: `AlarmFilterConfiguration.ObjectFilters[]` — include/exclude by template chain (empty = all eligible) -- **Transitions**: `InAlarm` change → OPC UA A&C `AlarmConditionState` event (Active / Return to Normal) -- **Severity**: Galaxy `Priority` (1 = highest) mapped to OPC UA 1–1000 severity (higher = more severe) -- **Acknowledgment**: local OPC UA ack forwards to MXAccess write on the `Ack` attribute of the alarm-bearing object - -### History Model — Wonderware Historian (optional plugin) - -- Loaded **at runtime** from `ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll` when `Historian.Enabled=true`; compile-time optional -- SDK: `aahClientManaged` / `aahClientCommon` -- Supported OPC UA HDA calls: - - `HistoryReadRawModified` (raw values with bounds) - - `HistoryReadProcessed` (Historian aggregates: AVG, MIN, MAX, TIMEAVG, etc. 
— mapped to OPC UA aggregates) - - Continuation points for paged reads -- Only attributes flagged `historize=1` in the Galaxy DB expose `AccessLevel.HistoryRead` - -### Error Mapping — MXAccess → Quality → OPC UA StatusCode - -**Byte quality (OPC DA convention)** — `QualityMapper.cs`: - -| OPC DA Quality | Category | -|----------------|----------| -| `>= 192` | Good | -| `64–191` | Uncertain | -| `< 64` | Bad | - -**MXAccess error codes → Quality** (`MxErrorCodes.cs`): - -| Code | Name | Quality | -|------|------|---------| -| 1008 | `MX_E_InvalidReference` | `BadConfigError` | -| 1012 | `MX_E_WrongDataType` | `BadConfigError` | -| 1013 | `MX_E_NotWritable` | `BadOutOfService` | -| 1014 | `MX_E_RequestTimedOut` | `BadCommFailure` | -| 1015 | `MX_E_CommFailure` | `BadCommFailure` | -| 1016 | `MX_E_NotConnected` | `BadNotConnected` | - -**Quality → OPC UA StatusCode** (`QualityMapper.cs`): - -| Quality | StatusCode | -|---------|-----------| -| Good | `0x00000000` | -| GoodLocalOverride | `0x00D80000` | -| Uncertain | `0x40000000` | -| Bad (generic) | `0x80000000` | -| BadCommFailure | `0x80050000` | -| BadNotConnected | `0x808A0000` | -| BadOutOfService | `0x808D0000` | - -### Change Detection - -- `ChangeDetectionService` polls `galaxy.time_of_last_deploy` at `ChangeDetectionIntervalSeconds` (default 30s) -- On timestamp change, `OnGalaxyChanged` fires → Host re-queries hierarchy/attributes → emits `TagSetChanged` over IPC → Proxy implements `IRediscoverable` and rebuilds the affected subtree in the address space -- Platform-scope filter (commit bc282b6) applied during hierarchy load when `Scope=LocalPlatform` - -### IPC Contract (Proxy ↔ Host) — `Galaxy.Shared` - -.NET Standard 2.0 MessagePack contracts. Every request carries a correlation ID; responses carry the same ID plus success/error. 
- -**Lifecycle / handshake**: - -| Message | Direction | Payload | -|---------|-----------|---------| -| `ClientHello` | Proxy → Host | InstanceId, expected protocol version | -| `HostReady` | Host → Proxy | Host version, Galaxy name, capabilities | -| `Shutdown` | Proxy → Host | Graceful stop | - -**Tag discovery** (`ITagDiscovery`): - -| Message | Direction | Payload | -|---------|-----------|---------| -| `DiscoverHierarchyRequest` | Proxy → Host | `Scope`, `PlatformName` | -| `DiscoverHierarchyResponse` | Host → Proxy | `GalaxyObjectInfo[]` (TagName, ContainedName, ParentTagName, TemplateChain, category) | -| `DiscoverAttributesRequest` | Proxy → Host | `TagName[]` | -| `DiscoverAttributesResponse` | Host → Proxy | `GalaxyAttributeInfo[]` (Name, MxDataType, IsArray, ArrayDim, SecurityClass, Historized, WriteableRuntimeChecked) | -| `TagSetChangedNotification` | Host → Proxy | New deploy timestamp; triggers re-discover | - -**Read / Write** (`IReadable`, `IWritable`): - -| Message | Direction | Payload | -|---------|-----------|---------| -| `ReadRequest` | Proxy → Host | `TagRef[]` (tag_name + attribute) | -| `ReadResponse` | Host → Proxy | `VtqPayload[]` (value, quality, timestamp, statusCode) | -| `WriteRequest` | Proxy → Host | `(TagRef, Value, ExpectedDataType)[]` | -| `WriteResponse` | Host → Proxy | `(TagRef, StatusCode)[]` | - -**Subscription** (`ISubscribable`): - -| Message | Direction | Payload | -|---------|-----------|---------| -| `SubscribeRequest` | Proxy → Host | `TagRef[]` + Proxy-generated subscription ID | -| `SubscribeResponse` | Host → Proxy | Per-tag subscribe ack + handle | -| `UnsubscribeRequest` | Proxy → Host | handles | -| `DataChangeNotification` | Host → Proxy (push) | handle, VTQ, sequence number | -| `ProbeHealthNotification` | Host → Proxy (push) | probe tag staleness, `ScanState` transitions, overall connected/disconnected | - -**Alarms** (`IAlarmSource`): - -| Message | Direction | Payload | -|---------|-----------|---------| 
-| `AlarmEventNotification` | Host → Proxy (push) | source tag, InAlarm, Priority, Description, severity, transition type | -| `AlarmAckRequest` | Proxy → Host | source tag, user, comment | - -**History** (`IHistoryProvider`): - -| Message | Direction | Payload | -|---------|-----------|---------| -| `HistoryReadRawRequest` | Proxy → Host | TagRef, start, end, numValues, returnBounds, continuationPoint | -| `HistoryReadRawResponse` | Host → Proxy | values + next continuation point | -| `HistoryReadProcessedRequest` | Proxy → Host | TagRef, aggregateId, start, end, resampleInterval | -| `HistoryReadProcessedResponse` | Host → Proxy | aggregated values | - -**Framing**: length-prefixed MessagePack frames over a single `NamedPipeServerStream` in `PipeTransmissionMode.Byte`. Separate outgoing pipe for push notifications or multiplex via message type tag. - -### Threading / COM Constraints - -- **STA thread** (`StaComThread.cs`) hosts MXAccess: `ApartmentState.STA`, raw Win32 `GetMessage` / `DispatchMessage` loop -- Work items marshaled in via `PostThreadMessage(WM_APP=0x8000)` -- **Per-handle serialization**: LMXProxyServer is not thread-safe — all Read/Write/Subscribe calls on one handle run serially via the STA queue -- **Dispatch thread** (separate from STA thread) drains `_pendingDataChanges` to the OPC UA framework; decouples the STA pump from UA stack locks so a slow subscriber can't back up COM event delivery -- **Reentrancy guards** — event unwiring must precede `Marshal.ReleaseComObject()` on disconnect - -### Runtime Status (recent commits bc282b6 / 4b209f6 / 7310925 / c76ab8f / 0003984) - -- `GalaxyRuntimeProbeManager` auto-subscribes `.ScanState` for every $WinPlatform (category 1) and $AppEngine (category 3) in scope -- Per-host state machine: `Unknown → Running | Stopped`; transitions fire `_onHostStopped` / `_onHostRunning` callbacks on the dispatch thread -- **Synthetic OPC UA nodes** expose `ScanState` per host as read-only variables so clients see 
runtime topology without the dashboard -- **HealthCheck Rule 2e** monitors probe subscription health; a failed probe can no longer leave phantom entries that fan out false `BadOutOfService` -- Generalizes to the driver-agnostic `IHostConnectivityProbe` capability interface in v2 (see `plan.md` §5a) - -### Implementation Notes - -- **First Tier C out-of-process driver** — uses the `Galaxy.Proxy` / `Galaxy.Host` / `Galaxy.Shared` three-project split. The pattern is reusable; FOCAS is the second adopter (see §7), and any future driver with bitness, licensing, or stability-isolation needs reuses the same template. See `driver-stability.md` for the generalized contract -- `Galaxy.Proxy` (in the main server) implements `IDriver`, `ITagDiscovery`, `IRediscoverable`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IHostConnectivityProbe` -- `Galaxy.Host` owns `MxAccessBridge`, `GalaxyRepository`, alarm tracking, `GalaxyRuntimeProbeManager`, and the Historian plugin — no reference to `Core.Abstractions` -- `Galaxy.Shared` is .NET Standard 2.0, referenced by both sides -- Existing v1 code is the implementation — **refactor in place** (extract capability interfaces first, then move behind IPC — see `plan.md` Decision #55) -- **Parity gate**: v2 driver must pass v1 `IntegrationTests` suite + scripted Client.CLI walkthrough before Phase 3 begins - -### Operational Stability Notes - -Galaxy has a Tier C deep dive in `driver-stability.md` covering the STA pump, COM object lifetime, subscription replay, recycle policy, and post-mortem contents. Driver-instance specifics: - -- **Memory baseline scales with Galaxy size**. Watchdog floor of 200 MB above baseline + 1.5 GB hard ceiling — higher than FOCAS because legitimate Galaxy footprints are larger. -- **Slope tolerance is 5 MB/min** (more permissive than FOCAS) because address-space rebuild on redeploy can transiently allocate large amounts. 
-- **Known regression-prone failure modes** (closed in commits `c76ab8f` and `7310925`, must remain closed): phantom probe subscription flipping Tick() to Stopped; cross-host quality clear wiping sibling state during recovery; sync-over-async on the OPC UA stack thread; fire-and-forget alarm tasks racing shutdown. Each should have a regression test in the v2 parity suite. -- **STA pump health probe** every 10 s (separate from the proxy↔host heartbeat). A wedged pump is the most likely Tier C failure mode for Galaxy. -- **Recycle preserves cached `time_of_last_deploy` watermark** — the common case (crash unrelated to redeploy) skips full DB rediscovery for faster recovery. - -### Namespace Assignment - -Galaxy is the canonical **SystemPlatform-kind namespace** driver. It exposes Aveva System Platform / Galaxy objects as OPC UA — these are *processed* values with business meaning attached at Layer 3, not raw equipment signals. Per `plan.md` §4: - -- The Galaxy driver's `DriverInstance.NamespaceId` must reference a `Namespace` row with `Kind = 'SystemPlatform'`. -- **UNS naming rules do NOT apply** to the Galaxy hierarchy. Tags belong to `DriverInstanceId + FolderPath` (v1 LmxOpcUa pattern preserved); `Tag.EquipmentId` is NULL. -- The Galaxy hierarchy reflects the gobject parent chain as v1 has always done — no migration to UNS path conventions in v2. -- If a future need arises to expose raw Galaxy gobject data alongside processed (e.g. an Aveva-Wonderware Historian raw signal feed), that becomes a *separate* driver instance assigned to an Equipment-kind namespace, with its own per-equipment mapping. +Galaxy (MXAccess) is a **Tier-A in-process driver** that runs in the OtOpcUa server's .NET 10 AnyCPU process and speaks gRPC to a separately installed `mxaccessgw` (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`). 
The gateway owns the MXAccess COM apartment, the STA pump, and the Galaxy Repository / Historian SDK on its own host; the driver itself is platform-agnostic and carries no COM or x86 bitness constraint. Project lives at `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`.
+
+### Capability Surface
+
+`GalaxyDriver` (in `GalaxyDriver.cs`) implements `IDriver`, `IDisposable`, plus six driver capabilities — eight interfaces total.
+
+| Capability | Source files |
+|------------|--------------|
+| `ITagDiscovery` | `Browse/GalaxyDiscoverer.cs`, `Browse/GatewayGalaxyHierarchySource.cs`, `Browse/DataTypeMap.cs`, `Browse/SecurityMap.cs`, `Browse/AlarmRefBuilder.cs` |
+| `IRediscoverable` | `Browse/DeployWatcher.cs`, `Browse/GatewayGalaxyDeployWatchSource.cs` |
+| `IReadable` | `Runtime/GalaxyMxSession.cs`, `Runtime/MxValueDecoder.cs`, `Runtime/StatusCodeMap.cs` |
+| `IWritable` | `Runtime/GatewayGalaxyDataWriter.cs` (+ `TracedGalaxyDataWriter.cs`), `Runtime/MxValueEncoder.cs` |
+| `ISubscribable` | `Runtime/GatewayGalaxySubscriber.cs` (+ `TracedGalaxySubscriber.cs`), `Runtime/EventPump.cs`, `Runtime/SubscriptionRegistry.cs`, `Runtime/ReconnectSupervisor.cs` |
+| `IHostConnectivityProbe` | `Health/HostStatusAggregator.cs`, `Health/HostConnectivityForwarder.cs`, `Health/PerPlatformProbeWatcher.cs` |
+
+History reads + alarm condition tracking now live in the server-layer `IHistoryRouter` and `AlarmConditionService` (PR 7.2). Galaxy no longer carries `IHistoryProvider` or `IAlarmSource` of its own.
+
+### DriverConfig JSON shape
+
+Per `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Config/GalaxyDriverOptions.cs`:
+
+```jsonc
+{
+  "Gateway": {
+    "Endpoint": "http://localhost:5120",
+    "ApiKeySecretRef": "secret:galaxy-gw-api-key",
+    "UseTls": true,
+    "CaCertificatePath": null,
+    "ConnectTimeoutSeconds": 10,
+    "DefaultCallTimeoutSeconds": 30,
+    "StreamTimeoutSeconds": 0
+  },
+  "MxAccess": {
+    "ClientName": "OtOpcUa",
+    "PublishingIntervalMs": 1000,
+    "WriteUserId": 0,
+    "EventPumpChannelCapacity": 50000
+  },
+  "Repository": {
+    "DiscoverPageSize": 5000,
+    "WatchDeployEvents": true
+  },
+  "Reconnect": {
+    "InitialBackoffMs": 500,
+    "MaxBackoffMs": 30000,
+    "ReplayOnSessionLost": true
+  }
+}
+```
+
+`Gateway.ApiKeySecretRef` resolves through the server-side secret store (DPAPI in production, env override in dev) — the API key never appears in cleartext config. `MxAccess.ClientName` MUST be unique per OtOpcUa instance; redundancy pairs enforce uniqueness at install time. `StreamTimeoutSeconds = 0` keeps the `StreamEvents` RPC alive for the lifetime of the driver.
+
+### Performance, tracing, soak
+
+See [Galaxy.Performance.md](Galaxy.Performance.md) for the OpenTelemetry trace map, the per-RPC metric set (`galaxy.events.dropped`, channel headroom, reconnect backoff distribution), and the soak-run profile.
+
+### Parity rig + gateway setup
+
+See [Galaxy.ParityRig.md](Galaxy.ParityRig.md) and the `mxaccessgw` repo for the gateway worker layout and the dev-rig recipe.
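Reviewer's sketch (not part of the patch): the `Reconnect` block in the DriverConfig shape above names an initial and a maximum backoff but does not say how the delay grows between attempts. The Python below is a minimal illustration that assumes plain doubling capped at `MaxBackoffMs`; the function name `backoff_schedule` is hypothetical, and the real policy in `Runtime/ReconnectSupervisor.cs` may differ (for example by adding jitter).

```python
def backoff_schedule(initial_ms: int, max_ms: int, attempts: int) -> list[int]:
    """Return the wait in milliseconds before each reconnect attempt.

    Assumption: the delay doubles after every attempt and is capped at
    max_ms. The patch only documents the two bounds, not the growth rule.
    """
    if initial_ms <= 0 or max_ms < initial_ms:
        raise ValueError("expected 0 < initial_ms <= max_ms")
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(delay)
        # Double for the next attempt, never exceeding the configured cap.
        delay = min(delay * 2, max_ms)
    return delays


# Values taken from the sample config: InitialBackoffMs=500, MaxBackoffMs=30000.
schedule = backoff_schedule(500, 30000, 8)
print(schedule)
```

Under that assumption the cap is reached on the seventh attempt, so a sustained gateway outage would settle into one retry every 30 seconds.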
 ---
diff --git a/docs/v2/implementation/phase-2-galaxy-out-of-process.md b/docs/v2/implementation/phase-2-galaxy-out-of-process.md
index 111b914..2f58560 100644
--- a/docs/v2/implementation/phase-2-galaxy-out-of-process.md
+++ b/docs/v2/implementation/phase-2-galaxy-out-of-process.md
@@ -1,3 +1,14 @@
+> **✅ Completed 2026-04-30 — historical record of Phase 2 (Galaxy out-of-process split).**
+>
+> Phase 2 produced the `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
+> three-project split as a stepping stone toward the eventual mxaccessgw
+> architecture. Those projects shipped, served their purpose for
+> roughly a year, then retired in PR 7.2 alongside the
+> `OtOpcUaGalaxyHost` Windows service. This file is preserved as the
+> phase-exit evidence; do not treat it as live architecture
+> documentation. See `docs/drivers/Galaxy.md` for the current
+> in-process driver.
+
 # Phase 2 — Galaxy Out-of-Process Refactor (Tier C)

 > **Status**: DRAFT — implementation plan for Phase 2 of the v2 build (`plan.md` §6, `driver-stability.md` §"Galaxy — Deep Dive").
diff --git a/lmx_backend.md b/lmx_backend.md
index 94f1c4d..e33cba1 100644
--- a/lmx_backend.md
+++ b/lmx_backend.md
@@ -1,3 +1,11 @@
+> **✅ Completed 2026-04-30 — historical record of the v2-mxgw backend-options decision.**
+>
+> This document evaluated alternative backend topologies before the
+> v2-mxgw migration. **Option 1 (in-process driver + gRPC gateway) was
+> selected and implemented**; see `lmx_mxgw.md` for the design and
+> `lmx_mxgw_impl.md` for the implementation plan. Both shipped at
+> commit `ae7106d` (2026-04-30). Preserved here as the audit trail.
+
 # Galaxy / LMX Backend — Restructuring Options

 ## Context
diff --git a/lmx_mxgw.md b/lmx_mxgw.md
index c231c0b..bdb4457 100644
--- a/lmx_mxgw.md
+++ b/lmx_mxgw.md
@@ -1,3 +1,13 @@
+> **✅ Completed 2026-04-30 — historical record of the v2-mxgw migration design.**
+>
+> This document is the design doc that drove the migration from the
+> legacy out-of-process Galaxy.Host topology to the in-process
+> GalaxyDriver + mxaccessgw architecture. Option 1 (the in-process
+> driver path) was selected and implemented across 39 PRs spanning
+> phases 0–7, merged to master at commit `ae7106d`. For current
+> architecture see `CLAUDE.md`, `docs/drivers/Galaxy.md`, and
+> `docs/v2/Galaxy.Performance.md`.
+
 # Galaxy → MxAccessGateway Migration Plan

 Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host`
@@ -360,7 +370,7 @@ percentile, equal or better throughput in `SubscribeBulk` setup time.
 call timeouts based on soak results.

 **Exit:** production-acceptable perf numbers documented in
-`docs/Galaxy.Driver.md`.
+`docs/drivers/Galaxy.md`.

 ### Phase 7 — retirement

@@ -443,7 +453,7 @@ sample config.

 ## Cross-cutting deliverables

-- **Docs:** `docs/v2/Galaxy.Driver.md` (new), updates to
+- **Docs:** `docs/drivers/Galaxy.md` (new), updates to
   `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`,
   `docs/Redundancy.md`, `CLAUDE.md`.
 - **Install scripts:** `scripts/install/Install-Services.ps1` removes
diff --git a/lmx_mxgw_impl.md b/lmx_mxgw_impl.md
index a04819d..2739351 100644
--- a/lmx_mxgw_impl.md
+++ b/lmx_mxgw_impl.md
@@ -1,3 +1,13 @@
+> **✅ Completed 2026-04-30 — historical record of the v2-mxgw implementation plan.**
+>
+> All 39 PRs across 7 phases (1.1–1.3 + 2.1–2.3 + 1+2.W + 3.1–3.W +
+> 4.0–4.W + 5.1–5.W + 6.1–6.W + 7.1–7.3) shipped and merged to master
+> at commit `ae7106d`. Per-phase status tracking below is preserved as
+> the historical PR-execution log; phase descriptions are
+> retrospective, not pending.
Parity matrix verified green on the dev
+> rig 2026-04-30 (14 passed / 1 skipped / 0 failed —
+> see `docs/v2/Galaxy.ParityMatrix.md`).
+
 # Galaxy → MxGateway Migration — Detailed Implementation Plan

 Companion to `lmx_mxgw.md` (design plan). This document breaks the plan into