Closes the v2-mxgw migration's housekeeping debt now that PR 7.2 has
retired the legacy projects + service.
Repo docs:
- CLAUDE.md: rewrote the Galaxy section + reference-impl + MXAccess
documentation pointers; replaced .NET 4.8 x86 / COM apartment
constraints with .NET 10 AnyCPU + a pointer to the gateway. Dropped
the "Service hosting (Galaxy.Host)" library-preferences row.
- docs/ServiceHosting.md: rewrote (was 156 lines of Galaxy.Host pipe
IPC details). Now reflects the v2 process shape: OtOpcUa.Server +
OtOpcUa.Admin + optional OtOpcUaWonderwareHistorian, with Galaxy
access via the in-process driver → mxaccessgw.
- docs/v2/dev-environment.md: scrubbed four Galaxy.Host references
(TwinCAT/Galaxy.Host shared-host note; .NET 4.8 SDK row; install
step #2; risks table). The .NET 4.8 SDK is now correctly framed as
"optional, only needed when building the mxaccessgw worker".
- mxaccess_documentation.md: deleted from the repo root (obsolete; the
gateway repo is the canonical MxAccess API doc).
Memory housekeeping (under ~/.claude/projects/.../memory/):
- Retired: project_galaxy_host_service.md,
project_galaxy_host_installed.md, reference_impl.md (the LmxProxy
Host MXAccess reference is no longer the design pattern this repo
uses).
- Revised: project_overview.md (now describes the .NET 10 + mxaccessgw
shape), project_aveva_platform_installed.md (AVEVA still required
on the dev box but consumed by the gateway worker, not by anything
here), project_galaxy_via_mxgateway.md (post-7.2 state — flagged as
the only Galaxy backend), project_server_history_alarm_subsystems.md
(per-driver fallbacks retired in PR 7.2).
- MEMORY.md index updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four root-cause fixes that take an elevated dev-box shell from session open
through to real MXAccess reads:
1. PipeAcl — drop BUILTIN\Administrators deny ACE. UAC's filtered token
carries the Admins SID as deny-only, so the deny fired even from
non-elevated admin-account shells. The per-connection SID check in
PipeServer.VerifyCaller remains the real authorization boundary.
2. PipeServer — swap the Hello-read / VerifyCaller order. ImpersonateNamedPipeClient
returns ERROR_CANNOT_IMPERSONATE until at least one frame has been read
from the pipe; reading Hello first satisfies that rule. Previously the
ACL deny-first path masked this race — removing the deny ACE exposed it.
3. GalaxyIpcClient — add a background reader + single pending-response
slot. A RuntimeStatusChange event between OpenSessionRequest and
OpenSessionResponse used to satisfy the caller's single ReadFrameAsync
and fail CallAsync with "Expected OpenSessionResponse, got
RuntimeStatusChange". The reader now routes response kinds (and
ErrorResponse) to the pending TCS and everything else to a handler the
driver registers in InitializeAsync. The Proxy was already set up to
raise managed events from RaiseDataChange / RaiseAlarmEvent /
OnHostConnectivityUpdate — those helpers had no caller until now.
4. RedundancyPublisherHostedService — swallow BadServerHalted while
polling host.Server.CurrentInstance. StandardServer throws that code
during startup rather than returning null, so the first poll attempt
crashed the BackgroundService (and the host) before OnServerStarted
ran. This race was latent behind the Galaxy init failure above.
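The fix in item 4 treats BadServerHalted inside the poll loop as "not started yet" instead of a fatal error. The Python sketch below shows the pattern. It assumes a hypothetical `ServiceResultException` that carries a status-code string and a `get_instance` callable standing in for reading `host.Server.CurrentInstance`. The real service is a C# BackgroundService, and the names, poll interval, and timeout are illustrative only.

```python
import asyncio
from typing import Callable


class ServiceResultException(Exception):
    """Hypothetical stand-in for the OPC UA SDK's status-code exception."""

    def __init__(self, status_code: str) -> None:
        super().__init__(status_code)
        self.status_code = status_code


BAD_SERVER_HALTED = "BadServerHalted"


async def wait_for_server_instance(get_instance: Callable[[], object],
                                   poll_interval: float = 0.5,
                                   timeout: float = 30.0) -> object:
    """Poll until the server instance exists.

    BadServerHalted (thrown during startup instead of returning None) is
    treated as "not ready yet"; any other status code still propagates.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            instance = get_instance()
            if instance is not None:
                return instance
        except ServiceResultException as exc:
            if exc.status_code != BAD_SERVER_HALTED:
                raise  # only the startup code is swallowed
        if loop.time() >= deadline:
            raise TimeoutError("server instance did not become available")
        await asyncio.sleep(poll_interval)
```

Narrowing the catch to one status code means a real failure after startup still stops the service instead of being silently retried.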
Updates docs that described the Admins deny ACE + mandatory non-elevated
shells, and drops the admin-skip guards from every Galaxy integration +
E2E fixture that had them (IpcHandshakeIntegrationTests, EndToEndIpcTests,
ParityFixture, LiveStackFixture, HostSubprocessParityTests).
Adds GalaxyIpcClientRoutingTests covering the router's
request/response match, ErrorResponse, event-between-call, idle event,
and peer-close paths.
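The routing rule from item 3, which these tests exercise, can be sketched in Python with asyncio. One background reader owns the pipe. Response kinds, including ErrorResponse, complete a single pending future, and every other frame goes to the handler the driver registers. This is an illustrative sketch, not the GalaxyIpcClient code: the `Frame` type, the `RESPONSE_KINDS` set, the exception names, and the `read_frame`/`send` callables are assumptions. Where the C# client would use a TaskCompletionSource, this sketch uses an asyncio future.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Frame:
    kind: str              # e.g. "OpenSessionResponse", "RuntimeStatusChange"
    payload: object = None


# Hypothetical set of kinds that answer a request; all other kinds are events.
RESPONSE_KINDS = {"OpenSessionResponse", "ReadResponse", "ErrorResponse"}


class RemoteError(Exception):
    """Raised when the peer answers with ErrorResponse or an unexpected kind."""


class PeerClosedError(Exception):
    """Raised for a pending call when the pipe closes before its response."""


class FrameRouter:
    def __init__(self, read_frame: Callable[[], Awaitable[Optional[Frame]]],
                 on_event: Callable[[Frame], None]) -> None:
        self._read_frame = read_frame  # returns None when the peer closes
        self._on_event = on_event      # handler registered by the driver
        self._pending: Optional[asyncio.Future] = None
        self._closed = False

    async def run_reader(self) -> None:
        """Background reader: the only code that reads from the pipe."""
        try:
            while (frame := await self._read_frame()) is not None:
                if frame.kind in RESPONSE_KINDS:
                    self._complete(frame)
                else:
                    self._on_event(frame)  # events never satisfy a call
        finally:
            self._closed = True
            if self._pending is not None and not self._pending.done():
                self._pending.set_exception(PeerClosedError("pipe closed"))

    def _complete(self, frame: Frame) -> None:
        fut = self._pending
        if fut is None or fut.done():
            return  # unsolicited response: dropped in this sketch
        if frame.kind == "ErrorResponse":
            fut.set_exception(RemoteError(str(frame.payload)))
        else:
            fut.set_result(frame)

    async def call(self, send: Callable[[Frame], Awaitable[None]],
                   request: Frame, expected_kind: str) -> Frame:
        if self._closed:
            raise PeerClosedError("pipe closed")
        if self._pending is not None and not self._pending.done():
            raise RuntimeError("only one call may be in flight")
        # Arm the single pending slot before sending, so a fast reply has
        # somewhere to land.
        self._pending = asyncio.get_running_loop().create_future()
        await send(request)
        frame = await self._pending
        if frame.kind != expected_kind:
            raise RemoteError(f"Expected {expected_kind}, got {frame.kind}")
        return frame
```

A single slot is enough as long as callers serialize their requests. Concurrent calls would need correlation IDs on each frame instead.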
Verified live on the dev box against the p7-smoke cluster (gen 6):
driver registered=1 failedInit=0, Phase 7 bridge subscribed, OPC UA
server up on 4840, MXAccess read round-trip returns real data with
Status=0x00000000.
Task #112 — partial: Galaxy live stack is functional end-to-end. The
supplied test-galaxy.ps1 script still fails because the UNS walker
encodes TagConfig JSON as the tag's NodeId instead of the seeded TagId
(pre-existing; separate issue from this commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):
- docs/Configuration.md — replaced appsettings-only story with the two-layer model.
appsettings.json is bootstrap only (Node identity, Config DB connection string,
transport security, LDAP bind, logging). Authoritative config (clusters, namespaces,
UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in
the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI
draft/publish workflow. Added v1-to-v2 migration index so operators can locate where
each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.
- docs/Redundancy.md — Phase 6.3 rewrite. Named every class under
src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology,
ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager,
ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the
full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255)
from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole,
ServiceLevelBase, ApplicationUri). Covered metrics
(otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges
on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from
FleetStatusPoller to RedundancyTab.razor.
- docs/security.md — preserved the transport-security section (still accurate) and
added Phase 6.2 authorization. Four concerns now documented in one place:
(1) transport security profiles, (2) OPC UA auth via LdapUserAuthenticator
(note: task spec called this LdapAuthenticationProvider — actual class name is
LdapUserAuthenticator in Server/Security/), (3) data-plane authorization via
NodeAcl + PermissionTrie + AuthorizationGate — additive-only model per decision
#129, ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy,
NodePermissions bundle, PermissionProbeService in Admin for "probe this permission",
(4) control-plane authorization via LdapGroupRoleMapping + AdminRole
(ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies) —
deliberately independent of data-plane ACLs per decision #150. Documented the
OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer) as the compile-time
guard ensuring every driver-capability async call is wrapped by CapabilityInvoker.
- docs/ServiceHosting.md — three-process rewrite: OtOpcUa Server (net10 x64,
BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy
drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via
OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86,
NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Also captured the
Pipe ACL's deny-Admins ACE and the resulting non-elevated-shell requirement from feedback memory.
Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer
wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf — decision
#30 replaced it with the generic host's AddWindowsService wrapper (per the doc
comment on OpcUaServerService). Reflected the actual state + flagged this divergence
here so someone can update CLAUDE.md separately.
- docs/StatusDashboard.md — replaced the full v1 reference (dashboard endpoints,
health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI"
pointer that preserves git-blame continuity + avoids broken links from other docs
that reference it.
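The additive-only data-plane model in the security.md bullet can be illustrated with a small Python sketch. Grants hang off a trie keyed by the ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag path. A caller's effective permissions on a node are the bitwise OR of every grant, for any of the caller's groups, along that path, and there are no deny entries to subtract. The flag names, the group-keyed grant storage, and the method names are assumptions; the real PermissionTrie, NodePermissions, and TriePermissionEvaluator types may be shaped differently.

```python
from enum import IntFlag
from typing import Dict, Iterable, Sequence, Set


class NodePermissions(IntFlag):
    """Hypothetical flag set standing in for the real NodePermissions bundle."""
    NONE = 0
    BROWSE = 1
    READ = 2
    WRITE = 4
    ACK_ALARM = 8


class _TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "_TrieNode"] = {}
        self.grants: Dict[str, NodePermissions] = {}  # group -> perms at this scope


class PermissionTrie:
    """Grants keyed by scope path: ClusterId, Namespace, UnsArea, UnsLine, Equipment, Tag."""

    def __init__(self) -> None:
        self._root = _TrieNode()

    def grant(self, path: Sequence[str], group: str, perms: NodePermissions) -> None:
        node = self._root
        for segment in path:
            node = node.children.setdefault(segment, _TrieNode())
        # Additive: a second grant for the same group widens, never narrows.
        node.grants[group] = node.grants.get(group, NodePermissions.NONE) | perms

    def evaluate(self, path: Sequence[str], groups: Iterable[str]) -> NodePermissions:
        member_of: Set[str] = set(groups)
        node = self._root
        effective = self._collect(node, member_of)
        for segment in path:
            node = node.children.get(segment)
            if node is None:
                break  # no grants exist below this scope
            effective |= self._collect(node, member_of)  # OR in each ancestor's grants
        return effective

    @staticmethod
    def _collect(node: _TrieNode, member_of: Set[str]) -> NodePermissions:
        result = NodePermissions.NONE
        for group, perms in node.grants.items():
            if group in member_of:
                result |= perms
        return result
```

Because evaluation only ORs flags, a grant at a higher scope can never be revoked by a narrower entry lower down. That is the property "additive-only" describes.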
Class references verified by reading:
src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator,
ApplyLeaseRegistry, RedundancyStatePublisher}.cs
src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder,
PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs,
Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl,
LdapGroupRoleMapping}.cs
src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renames all 11 projects (5 src + 6 tests) from LmxOpcUa to OtOpcUa, along with the .slnx solution file, all source-file namespaces, all axaml namespace references, and all v1 documentation references in CLAUDE.md and docs/*.md (excluding docs/v2/, which is already in OtOpcUa form). Also updates the TopShelf service registration name from "LmxOpcUa" to "OtOpcUa" per Phase 0 Task 0.6.
Preserves runtime identifiers per Phase 0 Out-of-Scope rules to avoid breaking v1/v2 client trust during coexistence: OPC UA `ApplicationUri` defaults (`urn:{GalaxyName}:LmxOpcUa`), server `EndpointPath` (`/LmxOpcUa`), `ServerName` default (feeds cert subject CN), `MxAccessConfiguration.ClientName` default (defensive — stays "LmxOpcUa" for MxAccess audit-trail consistency), client OPC UA identifiers (`ApplicationName = "LmxOpcUaClient"`, `ApplicationUri = "urn:localhost:LmxOpcUaClient"`, cert directory `%LocalAppData%\LmxOpcUaClient\pki\`), and the `LmxOpcUaServer` class name (class rename out of Phase 0 scope per Task 0.5 sed pattern; happens in Phase 1 alongside `LmxNodeManager → GenericDriverNodeManager` Core extraction). 23 LmxOpcUa references retained, all enumerated and justified in `docs/v2/implementation/exit-gate-phase-0.md`.
Build clean: 0 errors, 30 warnings (down from a baseline of 167). Tests strictly improve on baseline: 821 passing / 1 failing vs. baseline 820 / 2 (one flaky pre-existing failure passed this run; the other still fails; both are pre-existing and unrelated to the rename). `Client.UI.Tests`, `Historian.Aveva.Tests`, `Client.Shared.Tests`, and `IntegrationTests` all match baseline exactly. Exit-gate compliance results are recorded in `docs/v2/implementation/exit-gate-phase-0.md`, with all 7 checks PASS or DEFERRED-to-PR-review (#7, service install verification, needs Windows service permissions on the reviewer's box).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The aahClientManaged SDK is now isolated in ZB.MOM.WW.LmxOpcUa.Historian.Aveva and loaded via HistorianPluginLoader from a Historian/ subfolder only when enabled, removing the SDK from Host's compile-time and deploy-time surface.
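The isolation works because the host only learns about the plugin at runtime: nothing is loaded unless the feature is enabled, and the plugin's files live in their own subfolder. The Python analogue below uses `importlib` in place of .NET assembly loading. The function name, default file name, and the "return None when absent" behavior are assumptions for illustration, not the HistorianPluginLoader API.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_historian_plugin(base_dir: Path, enabled: bool,
                          subfolder: str = "Historian",
                          module_file: str = "historian_plugin.py") -> Optional[ModuleType]:
    """Load the optional historian plugin only when enabled and present on disk.

    Hypothetical analogue of HistorianPluginLoader: the host never imports the
    plugin statically, so its SDK dependency stays out of the host's surface.
    """
    if not enabled:
        return None  # disabled: the plugin folder is never touched
    path = base_dir / subfolder / module_file
    if not path.is_file():
        return None  # assumption: a missing plugin degrades to "no history"
    spec = importlib.util.spec_from_file_location("historian_plugin", path)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin's top-level code
    return module
```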
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Separates ApplicationUri from namespace identity so each instance in a
redundant pair has a unique server URI while sharing the same Galaxy
namespace. Exposes RedundancySupport, ServerUriArray, and dynamic
ServiceLevel through the standard OPC UA server object. ServiceLevel
is computed from role (Primary/Secondary) and runtime health (MXAccess
and DB connectivity). Adds a CLI redundancy command, a second deployed
service instance, and 31 new tests, including paired-server integration tests.
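ServiceLevel is a single byte (0-255), and a higher value tells clients a server is better able to serve, so in a redundant pair a client can compare the two values to pick one. The commit does not give the actual numbers. The Python sketch below therefore uses hypothetical base levels and penalties only to show the shape of a pure role-plus-health function. `Role`, `BASE_LEVEL`, `compute_service_level`, and every constant are assumptions.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "Primary"
    SECONDARY = "Secondary"


# Hypothetical values, chosen so any degraded Primary ranks below a healthy
# Secondary and clients prefer the healthy peer.
BASE_LEVEL = {Role.PRIMARY: 200, Role.SECONDARY: 150}
MXACCESS_DOWN_PENALTY = 100  # no live data is the most serious fault
DB_DOWN_PENALTY = 60


def compute_service_level(role: Role, mxaccess_ok: bool, db_ok: bool) -> int:
    """Pure function: the same role and health inputs always yield the same byte."""
    level = BASE_LEVEL[role]
    if not mxaccess_ok:
        level -= MXACCESS_DOWN_PENALTY
    if not db_ok:
        level -= DB_DOWN_PENALTY
    return max(0, min(255, level))  # clamp to the Byte range
```

Keeping the calculation free of I/O makes it easy to unit-test every role and health combination.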
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>