docs: post-PR-7.2 cleanup — audit + three-track scrub

Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-30 08:59:59 -04:00
parent ae7106dfce
commit 006af51768
21 changed files with 322 additions and 671 deletions

@@ -0,0 +1,152 @@
# Galaxy Repository — Tag Discovery for the Galaxy Driver
`GalaxyRepositoryService` reads the Galaxy object hierarchy and attribute metadata from the System Platform Galaxy Repository SQL Server database. It is the Galaxy driver's implementation of **`ITagDiscovery.DiscoverAsync`** — every driver has its own discovery source, and the Galaxy driver's is a direct SQL query against the Galaxy Repository (the `ZB` database). Other drivers use completely different mechanisms:

| Driver | `ITagDiscovery` source |
|--------|------------------------|
| Galaxy | ZB SQL hierarchy + attribute queries (this doc) |
| AB CIP | `@tags` walker against the PLC controller |
| AB Legacy | Data-table scan via PCCC `LogicalRead` on the PLC |
| TwinCAT | Beckhoff `SymbolLoaderFactory` — uploads the full symbol tree from the ADS runtime |
| S7 | Config-DB enumeration (no native symbol upload for S7comm) |
| Modbus | Config-DB enumeration (flat register map, user-authored) |
| FOCAS | CNC queries (`cnc_rdaxisname`, `cnc_rdmacroinfo`, …) + optional Config-DB overlays |
| OPC UA Client | `Session.Browse` against the remote server |

`GalaxyRepositoryService` lives in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/` — Host-side, .NET Framework 4.8 x86, same process that owns the MXAccess COM objects. The Proxy forwards discovery over IPC the same way it forwards reads and writes.
## Connection Configuration
`GalaxyRepositoryConfiguration` controls database access:

| Property | Default | Description |
|----------|---------|-------------|
| `ConnectionString` | `Server=localhost;Database=ZB;Integrated Security=true;` | SQL Server connection using Windows Authentication |
| `ChangeDetectionIntervalSeconds` | `30` | Polling frequency for deploy change detection |
| `CommandTimeoutSeconds` | `30` | SQL command timeout for all queries |
| `ExtendedAttributes` | `false` | When true, loads primitive-level attributes in addition to dynamic attributes |
| `Scope` | `Galaxy` | `Galaxy` loads all deployed objects. `LocalPlatform` filters to the local platform's subtree only |
| `PlatformName` | `null` | Explicit platform hostname for `LocalPlatform` filtering. When null, uses `Environment.MachineName` |

The connection uses Windows Authentication because the Galaxy Repository database is local to the System Platform node and secured through domain credentials.
## SQL Queries
All queries are embedded as `const string` fields in `GalaxyRepositoryService`. No dynamic SQL is used. Project convention `GR-006` requires `const string` SQL queries; any new query must be added as a named constant rather than built at runtime.
### Hierarchy query
Returns deployed Galaxy objects with their parent relationships, browse names, and template derivation chains:
- Joins `gobject` to `template_definition` to filter by relevant `category_id` values (1, 3, 4, 10, 11, 13, 17, 24, 26)
- Uses `contained_name` as the browse name, falling back to `tag_name` when `contained_name` is null or empty
- Resolves the parent using `contained_by_gobject_id` when non-zero, otherwise falls back to `area_gobject_id`
- Marks objects with `category_id = 13` as areas
- Filters to `is_template = 0` (instances only, not templates)
- Filters to `deployed_package_id <> 0` (deployed objects only)
- Returns a `template_chain` column built by a recursive CTE that walks `gobject.derived_from_gobject_id` from each instance through its immediate template and ancestor templates (depth guard `< 10`). Template names are ordered by depth and joined with `|` via `STUFF(... FOR XML PATH(''))`. Example: `TestMachine_001` returns `$TestMachine|$gMachine|$gUserDefined|$UserDefined`. The C# repository reader splits the column on `|`, trims, and populates `GalaxyObjectInfo.TemplateChain`, which is consumed by `AlarmObjectFilter` for template-based alarm filtering. See [Alarm Tracking](../AlarmTracking.md#template-based-alarm-object-filter).
- Returns `template_definition.category_id` as a `category_id` column, populated into `GalaxyObjectInfo.CategoryId`. The runtime status probe manager filters this down to `CategoryId == 1` (`$WinPlatform`) and `CategoryId == 3` (`$AppEngine`) to decide which objects get a `<Host>.ScanState` probe advised. The same column is also used during the hosted-variables walk to identify Platform/Engine ancestors.
- Returns `gobject.hosted_by_gobject_id` as a `hosted_by_gobject_id` column, populated into `GalaxyObjectInfo.HostedByGobjectId`. This is the **runtime host** of the object (e.g., which `$AppEngine` actually runs it), **not** the browse-containment parent (`contained_by_gobject_id`). The two often differ: an object can live in one Area in the browse tree but be hosted by an Engine on a different Platform for runtime execution. The driver walks this chain during `BuildHostedVariablesMap` to find the nearest `$WinPlatform` or `$AppEngine` ancestor, so subtree quality invalidation on a Stopped host reaches exactly the variables that were actually executing there. Note: the Galaxy schema column is named `hosted_by_gobject_id`, not `host_gobject_id` as some documentation assumes. See [Galaxy driver — Per-Host Runtime Status Probes](Galaxy.md#per-host-runtime-status-probes-hostscanstate).
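
The row-mapping rules above can be sketched as follows. This is a minimal Python illustration, not the C# reader: `GalaxyObjectInfo` here is a stand-in dataclass, the dict keys assume the hierarchy query's column names, dropping empty segments from `template_chain` is an assumption, and `nearest_host` is a hypothetical helper that approximates the `BuildHostedVariablesMap` walk with an arbitrary hop limit as a cycle guard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

AREA_CATEGORY = 13
HOST_CATEGORIES = {1, 3}  # $WinPlatform, $AppEngine


@dataclass
class GalaxyObjectInfo:
    """Stand-in for the C# DTO; field names are illustrative."""
    gobject_id: int
    browse_name: str
    parent_gobject_id: int
    is_area: bool
    category_id: int
    hosted_by_gobject_id: int
    template_chain: List[str] = field(default_factory=list)


def map_hierarchy_row(row: dict) -> GalaxyObjectInfo:
    # Browse name: contained_name, falling back to tag_name when null or empty.
    browse_name = row.get("contained_name") or row["tag_name"]
    # Parent: contained_by_gobject_id when non-zero, otherwise area_gobject_id.
    contained_by = row.get("contained_by_gobject_id") or 0
    parent = contained_by if contained_by != 0 else row["area_gobject_id"]
    # template_chain arrives as "$A|$B|..."; split on '|' and trim each name.
    chain = [name.strip() for name in (row.get("template_chain") or "").split("|")]
    return GalaxyObjectInfo(
        gobject_id=row["gobject_id"],
        browse_name=browse_name,
        parent_gobject_id=parent,
        is_area=row["category_id"] == AREA_CATEGORY,
        category_id=row["category_id"],
        hosted_by_gobject_id=row.get("hosted_by_gobject_id") or 0,
        template_chain=[name for name in chain if name],
    )


def nearest_host(objects: Dict[int, GalaxyObjectInfo], gobject_id: int,
                 max_hops: int = 32) -> Optional[int]:
    """Follow hosted_by_gobject_id (runtime host, not browse parent) upward
    until a $WinPlatform or $AppEngine is found; None if the chain breaks."""
    current = objects.get(gobject_id)
    for _ in range(max_hops):
        if current is None:
            return None
        host = objects.get(current.hosted_by_gobject_id)
        if host is None:
            return None
        if host.category_id in HOST_CATEGORIES:
            return host.gobject_id
        current = host
    return None
```

Keeping `nearest_host` separate from `parent_gobject_id` mirrors the point above: browse containment and runtime hosting are two different chains.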
### Attributes query (standard)
Returns user-defined dynamic attributes for deployed objects:
- Uses a recursive CTE (`deployed_package_chain`) to walk the package inheritance chain from `deployed_package_id` through `derived_from_package_id`, limited to 10 levels
- Joins `dynamic_attribute` on each package in the chain to collect inherited attributes
- Uses `ROW_NUMBER() OVER (PARTITION BY gobject_id, attribute_name ORDER BY depth)` to pick the most-derived definition when an attribute is overridden at multiple levels
- Builds `full_tag_reference` as `tag_name.attribute_name` with `[]` appended for arrays
- Extracts `array_dimension` from the binary `mx_value` column (bytes 13-16, little-endian int32)
- Detects historized attributes by checking for a `HistoryExtension` primitive instance
- Detects alarm attributes by checking for an `AlarmExtension` primitive instance
- Excludes internal attributes (names starting with `_`) and `.Description` suffixes
- Filters by `mx_attribute_category` to include only user-relevant categories
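
Three of the rules above can be sketched in Python for illustration; the source describes them as part of the SQL query itself. Assumptions: rows arrive as dicts carrying the CTE's `depth` column; "bytes 13-16" is read as 1-based (T-SQL `SUBSTRING` style), so the int32 sits at zero-based offset 12; and a blob shorter than 16 bytes is treated as dimension 0. `most_derived`, `array_dimension`, and `is_user_visible` are hypothetical names.

```python
import struct
from typing import Dict, Iterable, List, Optional, Tuple


def most_derived(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per (gobject_id, attribute_name): the one with the lowest
    depth, like ROW_NUMBER() OVER (PARTITION BY ... ORDER BY depth) = 1."""
    best: Dict[Tuple[int, str], dict] = {}
    for row in rows:
        key = (row["gobject_id"], row["attribute_name"])
        if key not in best or row["depth"] < best[key]["depth"]:
            best[key] = row
    return list(best.values())


def array_dimension(mx_value: Optional[bytes]) -> int:
    """Little-endian int32 at bytes 13-16 (1-based), i.e. offset 12."""
    if mx_value is None or len(mx_value) < 16:
        return 0
    (dimension,) = struct.unpack_from("<i", mx_value, 12)
    return dimension


def is_user_visible(attribute_name: str) -> bool:
    """Drop internal (_-prefixed) attributes and .Description suffixes."""
    return not attribute_name.startswith("_") and not attribute_name.endswith(".Description")
```

Depth 0 is the object's own deployed package, so the lowest depth is the most-derived override.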
### Attributes query (extended)
When `ExtendedAttributes = true`, a more comprehensive query runs that unions two sources:
1. **Primitive attributes** — Joins through `primitive_instance` and `attribute_definition` to include system-level attributes from primitive components. Each attribute carries its `primitive_name` so the address space can group them under their parent variable.
2. **Dynamic attributes** — The same CTE-based query as the standard path, with an empty `primitive_name`.

The `full_tag_reference` for primitive attributes follows the pattern `tag_name.primitive_name.attribute_name` (e.g., `TestMachine_001.AlarmAttr.InAlarm`).
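
Both reference shapes can come from one small helper. This is a hypothetical Python sketch: an empty `primitive_name` yields the two-part dynamic form, and applying the `[]` array suffix to primitive attributes too is an assumption, since the source states the suffix only for the standard query.

```python
def full_tag_reference(tag_name: str, attribute_name: str,
                       primitive_name: str = "", is_array: bool = False) -> str:
    # Dynamic attributes carry an empty primitive_name: tag.attr
    # Primitive attributes: tag.primitive.attr
    if primitive_name:
        parts = [tag_name, primitive_name, attribute_name]
    else:
        parts = [tag_name, attribute_name]
    reference = ".".join(parts)
    return reference + "[]" if is_array else reference
```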
### Change detection query
A single-column query: `SELECT time_of_last_deploy FROM galaxy`. The `galaxy` table contains one row with the timestamp of the most recent deployment.
## Why deployed_package_id Instead of checked_in_package_id
The Galaxy maintains two package references for each object:
- `checked_in_package_id` — the latest saved version, which may include undeployed configuration changes
- `deployed_package_id` — the version currently running on the target platform

The queries filter on `deployed_package_id <> 0` because the OPC UA address space must mirror what is actually running in the Galaxy runtime. Using `checked_in_package_id` would expose attributes and objects that exist in the IDE but have not been deployed, causing mismatches between the OPC UA address space and the MXAccess runtime.
## Platform Scope Filter
When `Scope` is set to `LocalPlatform`, the repository applies a post-query C# filter to restrict the address space to objects hosted by the local platform. This reduces memory footprint, MXAccess subscription count, and address space size on multi-node Galaxy deployments where each OPC UA server instance only needs to serve its own platform's objects.
### How it works
1. **Platform lookup** — A separate `const string` SQL query (`PlatformLookupSql`) reads `platform_gobject_id` and `node_name` from the `platform` table for all deployed platforms. This runs once per hierarchy load.
2. **Platform matching** — The configured `PlatformName` (or `Environment.MachineName` when null) is matched case-insensitively against the `node_name` column. If no match is found, a warning is logged listing the available platforms and the address space is empty.
3. **Host chain collection** — The filter collects the matching platform's `gobject_id`, then iterates the hierarchy to find all `$AppEngine` (category 3) objects whose `HostedByGobjectId` equals that platform `gobject_id`. This produces the full set of host gobject_ids under the local platform.
4. **Object inclusion** — All non-area objects whose `HostedByGobjectId` is in the host set are included, along with the hosts themselves.
5. **Area retention** — `ParentGobjectId` chains are walked upward from included objects to pull in ancestor areas, keeping the browse tree connected. Areas that contain no local descendants are excluded.
6. **Attribute filtering** — The set of included `gobject_id` values is cached after `GetHierarchyAsync` and reused by `GetAttributesAsync` to filter attributes to the same scope.
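
Steps 2 through 6 can be sketched as a pure in-memory pass over rows the hierarchy query already returned, with the `PlatformLookupSql` result from step 1 passed in as `(gobject_id, node_name)` pairs. This Python illustration rests on assumptions: `HierarchyRow`, `scope_filter`, `filter_attributes`, and `resolve_platform_name` are hypothetical names; `socket.gethostname()` only approximates .NET's `Environment.MachineName` (it may return a fully qualified name); and `casefold()` stands in for a case-insensitive ordinal comparison.

```python
import logging
import socket
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Set, Tuple

log = logging.getLogger("scope_filter_sketch")
APP_ENGINE_CATEGORY = 3


@dataclass(frozen=True)
class HierarchyRow:
    gobject_id: int
    parent_gobject_id: int
    hosted_by_gobject_id: int
    category_id: int
    is_area: bool


def resolve_platform_name(configured: Optional[str]) -> str:
    # Approximation of "use Environment.MachineName when PlatformName is null".
    return configured or socket.gethostname()


def scope_filter(objects: Sequence[HierarchyRow],
                 platforms: Iterable[Tuple[int, str]],
                 platform_name: str) -> Set[int]:
    """Return the gobject_ids retained for the local platform (steps 2-5)."""
    platform_list = list(platforms)
    wanted = platform_name.casefold()
    platform_id = next((pid for pid, node in platform_list if node.casefold() == wanted), None)
    if platform_id is None:
        # Step 2 miss: warn with the available platforms; address space is empty.
        log.warning("No platform matches %r; available: %s",
                    platform_name, [node for _, node in platform_list])
        return set()

    # Step 3: the platform plus every $AppEngine hosted directly on it.
    hosts = {platform_id} | {
        o.gobject_id for o in objects
        if o.category_id == APP_ENGINE_CATEGORY and o.hosted_by_gobject_id == platform_id
    }
    # Step 4: the hosts plus all non-area objects they host.
    included = set(hosts) | {
        o.gobject_id for o in objects
        if not o.is_area and o.hosted_by_gobject_id in hosts
    }
    # Step 5: walk parent chains upward and keep ancestor areas.
    by_id = {o.gobject_id: o for o in objects}
    for gid in list(included):
        parent = by_id[gid].parent_gobject_id if gid in by_id else 0
        visited: Set[int] = set()
        while parent in by_id and parent not in visited:
            visited.add(parent)
            if by_id[parent].is_area:
                included.add(parent)
            parent = by_id[parent].parent_gobject_id
    return included


def filter_attributes(attributes: Iterable[dict], included: Set[int]) -> List[dict]:
    # Step 6: reuse the cached id set so attributes match the object scope.
    return [a for a in attributes if a["gobject_id"] in included]
```

Every input is already in memory after the hierarchy query, so the only extra SQL is the platform lookup, which is the trade-off the design rationale describes.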
### Design rationale
The filter is applied in C# rather than SQL because project convention `GR-006` requires `const string` SQL queries with no dynamic SQL. The hierarchy query already returns `HostedByGobjectId` and `CategoryId` on every row, so all information needed for filtering is already in memory after the query runs. The only new SQL is the lightweight platform lookup query.
### Configuration
```json
"GalaxyRepository": {
  "Scope": "LocalPlatform",
  "PlatformName": null
}
```
- Set `Scope` to `"LocalPlatform"` to enable filtering. Default is `"Galaxy"` (load everything).
- Set `PlatformName` to an explicit hostname to target a specific platform, or leave null to use the local machine name.
### Startup log
When `LocalPlatform` is active, the startup log shows the filtering result:
```
GalaxyRepository.Scope="LocalPlatform", PlatformName=MYNODE
GetHierarchyAsync returned 49 objects
GetPlatformsAsync returned 2 platform(s)
Scope filter targeting platform 'MYNODE' (gobject_id=1042)
Scope filter retained 25 of 49 objects for platform 'MYNODE'
GetAttributesAsync returned 4206 attributes (extended=true)
Scope filter retained 2100 of 4206 attributes
```
## Change Detection Polling and IRediscoverable
`ChangeDetectionService` runs a background polling loop in the Host process that calls `GetLastDeployTimeAsync` at the configured interval. It compares the returned timestamp against the last known value:
- On the first poll (no previous state), the timestamp is recorded and `OnGalaxyChanged` fires unconditionally
- On subsequent polls, `OnGalaxyChanged` fires only when `time_of_last_deploy` differs from the cached value

When the event fires, the Host re-runs the hierarchy and attribute queries and pushes the result back to the Server via an IPC `RediscoveryNeeded` message. That surfaces on `GalaxyProxyDriver` as the **`IRediscoverable.OnRediscoveryNeeded`** event; the Server's `DriverNodeManager` consumes it and calls `SyncAddressSpace` to compute the diff against the live address space.
The polling approach is used because the Galaxy Repository database does not provide change notifications. The `galaxy.time_of_last_deploy` column updates only on completed deployments, so the polling interval controls how quickly the OPC UA address space reflects Galaxy changes.
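
The first-poll and changed-timestamp rules above can be sketched as a small state holder plus an asyncio loop. This is an illustrative Python sketch, not `ChangeDetectionService`: `DeployChangeTracker` and `poll_for_deploys` are hypothetical, `fetch_last_deploy` stands in for `GetLastDeployTimeAsync`, and real code would also need to handle query failures, which this sketch omits.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime
from typing import Awaitable, Callable, Optional


@dataclass
class DeployChangeTracker:
    last_deploy: Optional[datetime] = None
    has_polled: bool = False

    def observe(self, time_of_last_deploy: Optional[datetime]) -> bool:
        """True when OnGalaxyChanged should fire for this poll result."""
        if not self.has_polled:
            # First poll: record the timestamp and fire unconditionally.
            self.has_polled = True
            self.last_deploy = time_of_last_deploy
            return True
        if time_of_last_deploy != self.last_deploy:
            self.last_deploy = time_of_last_deploy
            return True
        return False


async def poll_for_deploys(fetch_last_deploy: Callable[[], Awaitable[Optional[datetime]]],
                           tracker: DeployChangeTracker,
                           on_changed: Callable[[], Awaitable[None]],
                           interval_s: float,
                           stop: asyncio.Event) -> None:
    while not stop.is_set():
        if tracker.observe(await fetch_last_deploy()):
            await on_changed()
        try:
            # Sleep for the interval, but wake early if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

Comparing for inequality rather than "newer than" means any change to `time_of_last_deploy` triggers rediscovery, matching the "differs from the cached value" rule.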
## TestConnection
`TestConnectionAsync` runs `SELECT 1` against the configured database. This is used at Host startup to verify connectivity before attempting the full hierarchy query.
## Key source files
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/GalaxyRepositoryService.cs` — SQL queries and data access
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/PlatformScopeFilter.cs` — Platform-based hierarchy and attribute filtering
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/GalaxyRepository/ChangeDetectionService.cs` — Deploy timestamp polling loop
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Configuration/GalaxyRepositoryConfiguration.cs` — Connection, polling, and scope settings
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Domain/PlatformInfo.cs` — Platform-to-hostname DTO
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/Contracts/DiscoveryResponse.cs` — IPC DTO the Host uses to return hierarchy + attribute results across the pipe

@@ -0,0 +1,165 @@
# Galaxy test fixture
Coverage map + gap inventory for the Galaxy driver — out-of-process Host
(net48 x86 MXAccess COM) + Proxy (net10) + Shared protocol.
**TL;DR: Galaxy has the richest test harness in the fleet** — real Host
subprocess spawn, real ZB SQL queries, IPC parity checks against the v1
LmxProxy reference, + live-smoke tests when MXAccess runtime is actually
installed. Gaps are live-plant + failover-shaped: the E2E suite covers the
representative ~50-tag deployment but not large-site discovery stress, real
Rockwell/Siemens PLC enumeration through MXAccess, or ZB SQL Always-On
replica failover.
## What the fixture is
Multi-project test topology:
- **E2E parity** —
`tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs` spawns the
production `OtOpcUa.Driver.Galaxy.Host.exe` as a subprocess, opens the
named-pipe IPC, connects `GalaxyProxyDriver` + runs hierarchy / stability
parity tests against both.
- **Host.Tests** —
`tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/` — direct Host process
testing (18+ test classes covering alarm discovery, AVEVA prerequisite
checks, IPC dispatcher, alarm tracker, probe manager, historian
cluster/quality/wiring, history read, OPC UA attribute mapping,
subscription lifecycle, reconnect, multi-host proxy, ADS address routing,
expression evaluation) + `GalaxyRepositoryLiveSmokeTests` that hit real
ZB SQL.
- **Proxy.Tests** — `GalaxyProxyDriver` client contract tests.
- **Shared.Tests** — shared protocol + address model.
- **TestSupport** — test helpers reused across the above.
## How tests skip
- **E2E parity**: `ParityFixture.SkipIfUnavailable()` runs at class init and
checks Windows-only, ZB SQL reachable on `localhost:1433`, Host EXE built
in the expected `bin/` folder. Any miss → tests skip.
- **Live-smoke** (`GalaxyRepositoryLiveSmokeTests`): `Assert.Skip` when ZB
  is unreachable. Per the `project_galaxy_host_installed` memory, this repo's
  dev box has the MXAccess runtime installed. The pipe ACL allows the
  configured SID outright; elevation of the caller doesn't matter because
  the per-connection SID check in `PipeServer.VerifyCaller` only compares
  user SIDs (not group membership or integrity level).
- **Unit** tests (Shared, Proxy contract, most Host.Tests) have no skip —
they run anywhere.
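
The E2E gate's three checks can be sketched as a function that returns a skip reason or `None`. This Python version is hypothetical (`skip_reason` is not `ParityFixture.SkipIfUnavailable()`): it treats a successful TCP connect as "ZB SQL reachable" without logging in or querying ZB, and it takes the Host EXE path as a parameter because the expected `bin/` location is not spelled out here.

```python
import socket
import sys
from pathlib import Path
from typing import Optional


def skip_reason(host_exe: Path, sql_host: str = "localhost", sql_port: int = 1433,
                platform: str = sys.platform, timeout_s: float = 1.0) -> Optional[str]:
    """Return why the E2E parity tests should skip, or None to run them."""
    # 1. Windows-only: the net48 x86 Host and MXAccess COM need Windows.
    if not platform.startswith("win"):
        return "Windows-only"
    # 2. ZB SQL reachable: a plain TCP connect to the SQL Server port.
    try:
        with socket.create_connection((sql_host, sql_port), timeout=timeout_s):
            pass
    except OSError as exc:
        return f"ZB SQL unreachable at {sql_host}:{sql_port} ({exc})"
    # 3. Host EXE built where the fixture expects it.
    if not host_exe.is_file():
        return f"Host EXE not built: {host_exe}"
    return None
```

Returning a reason string rather than a bool keeps the "any miss, tests skip" behavior while telling the operator which check failed.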
## What it actually covers
### E2E parity suite
- `HierarchyParityTests` — Host address-space hierarchy vs v1 LmxProxy
reference (same ZB, same Galaxy, same shape)
- `StabilityFindingsRegressionTests` — probe subscription failure
handling + host-status mutation guard from the v1 stability findings
backlog
### Host.Tests (representative)
- Alarm discovery → subsystem setup
- AVEVA prerequisite checks (runtime installed, platform deployed, etc.)
- IPC dispatcher — request/response routing over the named pipe
- Alarm tracker state machine
- Probe manager — per-runtime probe subscription + reconnect
- Historian cluster / quality / wiring — AVEVA Historian integration
- OPC UA attribute mapping
- Subscription lifecycle + reconnect
- Multi-host proxy routing
- ADS address routing + expression evaluation (Galaxy's legacy expression
language)
### Live-smoke
- `GalaxyRepositoryLiveSmokeTests` — real SQL against ZB database, verifies
the ZB schema + `LocalPlatform` scope filter + change-detection query
shape match production.
### Capability surfaces hit
All of them: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`,
`ISubscribable`, `IHostConnectivityProbe`, `IPerCallHostResolver`,
`IAlarmSource`, `IHistoryProvider`. Galaxy is the only driver where every
interface sees both contract + real-integration coverage.
## What it does NOT cover
### 1. MXAccess COM by default
The E2E parity suite backs subscriptions via the DB-only path; MXAccess COM
integration opts in via a separate live-smoke. So "does the MXAccess STA
pump correctly handle real Wonderware runtime events" is exercised only
when the operator runs live smoke on a machine with MXAccess installed.
### 2. Real Rockwell / Siemens PLC enumeration
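Galaxy runtime talks to PLCs through Device Integration Objects; the driver
sees those values only through MXAccess, and PLC enumeration along that path
is not exercised against real Rockwell or Siemens hardware. The CI parity
suite uses a representative ~50-tag deployment; large sites (1000+ tag
hierarchies, multi-Galaxy replication, deeply-nested templates) are not
stressed.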
### 3. ZB SQL Always-On failover
Live-smoke hits a single SQL instance. Real production ZB often runs on
Always-On availability groups; replica failover behavior is not tested.
### 4. Galaxy replication / backup-restore
Galaxy supports backup + partial replication across platforms — these
rewrite the ZB schema in ways that change the `contained_name` vs `tag_name`
mapping. Not exercised.
### 5. Historian failover
AVEVA Historian can be clustered. `historian cluster / quality` tests
verify the cluster-config query; they don't exercise actual failover
(primary dies → secondary takes over mid-HistoryRead).
### 6. AVEVA runtime version matrix
MXAccess COM contract varies subtly across System Platform 2017 / 2020 /
2023. The live-smoke runs against whatever version is installed on the dev
box; CI has no AVEVA installed at all (licensing + footprint).
## When to trust the Galaxy suite, when to reach for a live plant
| Question | E2E parity | Live-smoke | Real plant |
| --- | --- | --- | --- |
| "Does Host spawn + IPC round-trip work?" | yes | yes | yes |
| "Does the ZB schema query match production shape?" | partial | yes | yes |
| "Does MXAccess COM handle runtime reconnect correctly?" | no | yes | yes |
| "Does the driver scale to 1000+ tags on one Galaxy?" | no | partial | yes (required) |
| "Does historian failover mid-read return a clean error?" | no | no | yes (required) |
| "Does System Platform 2023's MXAccess differ from 2020?" | no | partial | yes (required) |
| "Does ZB Always-On replica failover preserve generation?" | no | no | yes (required) |
## Follow-up candidates
1. **System Platform 2023 live-smoke matrix** — set up a second dev box
running SP2023; run the same live-smoke against both to catch COM-contract
drift early.
2. **Synthetic large-site fixture** — script a ZB populator that creates a
1000-Equipment / 20000-tag hierarchy, run the parity suite against it.
Catches O(N) → O(N²) discovery regressions.
3. **Historian failover scripted test** — with a two-node AVEVA Historian
cluster, tear down primary mid-HistoryRead + verify the driver's failover
behavior + error surface.
4. **ZB Always-On CI** — SQL Server 2022 on Linux supports Always-On;
could stand up a two-replica group for replica-failover coverage.
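
The data shape for candidate 2 could be prototyped in memory before any ZB populator exists. This hypothetical Python generator only builds rows resembling hierarchy output plus `full_tag_reference` strings and writes nothing to ZB; the `Area_`/`Equipment_`/`Tag` naming, the ten-area split, and the id numbering are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SyntheticObject:
    gobject_id: int
    tag_name: str
    area_gobject_id: int
    is_area: bool


def synthetic_site(equipment: int = 1000, tags_per_equipment: int = 20,
                   areas: int = 10, first_id: int = 1) -> Tuple[List[SyntheticObject], List[str]]:
    """Build areas, equipment instances, and full_tag_reference strings."""
    objects: List[SyntheticObject] = []
    references: List[str] = []
    area_ids = [first_id + i for i in range(areas)]
    for i, area_id in enumerate(area_ids):
        objects.append(SyntheticObject(area_id, f"Area_{i:03d}", 0, True))
    next_id = first_id + areas
    for n in range(equipment):
        tag_name = f"Equipment_{n:05d}"
        # Spread equipment round-robin across the areas.
        objects.append(SyntheticObject(next_id, tag_name, area_ids[n % areas], False))
        references.extend(f"{tag_name}.Tag{k:03d}" for k in range(tags_per_equipment))
        next_id += 1
    return objects, references
```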

This is already the best-tested driver; the remaining work is site-scale
+ production-topology coverage, not capability coverage.
## Key fixture / config files
- `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ParityFixture.cs` — E2E fixture
that spawns Host + connects Proxy
- `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/GalaxyRepositoryLiveSmokeTests.cs`
— live ZB smoke with `Assert.Skip` gate
- `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport/` — shared helpers
- `docs/drivers/Galaxy.md` — COM bridge + STA pump + IPC architecture
- `docs/v1/drivers/Galaxy-Repository.md` — ZB SQL reader + `LocalPlatform`
  scope filter + change detection
- `docs/v2/aveva-system-platform-io-research.md` — MXAccess + Wonderware
background