Commit Graph

40 Commits

Author SHA1 Message Date
Joseph Doherty 9cf88e33bd docs(galaxy): rewrite stale PR-4.W/legacy-host forward-ref comments to shipped reality (#13) 2026-06-18 11:52:19 -04:00
Joseph Doherty e9da9c29d2 fix(galaxy): authoritative handle resolution + review cleanups
Make SubscriptionRegistry.TryResolveItemHandle confirm a live subscription
genuinely binds fullRef->handle (via the reverse index) rather than trusting
the forward-map hint + a bare liveness check. Fixes the cross-ref-same-handle
hazard (wrong-tag borrow) while preserving the legitimate
multiple-subscriptions-per-tag borrow. Adds cross-ref + same-ref-multi-sub
tests; drops a duplicate SubscriptionEntry <summary>; documents the writer's
supervisory-advise reconnect lifecycle.
2026-06-18 04:29:45 -04:00
Joseph Doherty 3ffe45db53 feat(galaxy): wire SubscriptionRegistry handle resolver into the production writer 2026-06-18 04:20:34 -04:00
Joseph Doherty 2e3f528afc feat(galaxy): writer borrows live subscription item handles (skip redundant AddItem)
GatewayGalaxyDataWriter now accepts an optional subscribedHandleSource
delegate; TryResolveCachedOrBorrowed checks _itemHandles first then the
source, so the first write to an already-subscribed tag skips the
AddItem round-trip. Borrowed handles are not cached (subscription
registry owns lifecycle). AddItemCallCount seam confirms gateway calls.
2026-06-18 04:18:35 -04:00
Joseph Doherty 1411950077 feat(galaxy): SubscriptionRegistry.TryResolveItemHandle forward lookup
Add _itemHandleByFullRef (OrdinalIgnoreCase ConcurrentDictionary) maintained
in lock-step with _subscribersByItemHandle across Register/Remove/Rebind.
TryResolveItemHandle cross-checks the authoritative reverse map so a stale
forward entry can never hand out a dead handle. Also wires the scaffolded
_addItemCallCount increment in EnsureItemHandleAsync (field was declared but
never assigned, causing a TreatWarningsAsErrors build failure on the branch).
8 new xUnit + Shouldly facts covering register/case-insensitive/remove/rebind/
failed-handle/liveness-guard paths.
2026-06-18 04:18:01 -04:00
Joseph Doherty bec37848d5 fix(galaxy): degrade all parent-cycle members to root (review) 2026-06-16 19:51:20 -04:00
Joseph Doherty 21c7645da8 feat(galaxy): nest gobject browse tree by parent_gobject_id (degrade-to-flat) 2026-06-16 19:41:26 -04:00
Joseph Doherty 1164d423b6 fix(probe): Galaxy gRPC ping — drop invalid Retry, treat MxGatewayAuth exceptions as reachable (live /run)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Two bugs caught by live verification against the mxaccessgw at 10.100.0.48:5120:
- MaxAttempts=1 produced an invalid Polly RetryStrategyOptions -> the probe failed
  on every real gateway. Removed the Retry override (matches GalaxyDriver); fail-fast
  is already guaranteed by the TCP preflight + the per-call deadline.
- A rejected key surfaces as a typed MxGatewayAuthenticationException, not a raw
  RpcException, so 'auth-rejection = reachable' was bypassed. Catch the typed auth/
  authorization exceptions -> Ok=true.
Adds DriverProbeHandshakeE2eTests: direct-probe, skip-gated cross-protocol green/red
discrimination (Modbus, OpcUaClient, Galaxy + a local real OPC UA server).
2026-06-16 07:32:59 -04:00
Joseph Doherty 2d688c2a6d feat(probe): Galaxy Test-Connect does a gRPC ping (auth-rejection counts as reachable) 2026-06-16 06:48:40 -04:00
Joseph Doherty ed941c51da feat(alarms): AlarmAcknowledgeRequest carries OperatorUser; Galaxy/ScriptedAlarmSource honor it [H6b] 2026-06-15 14:11:40 -04:00
Joseph Doherty b4af9e7f37 docs(comments): correct 7 stale 'later task/milestone' comments (stillpending §9) 2026-06-15 09:47:08 -04:00
Joseph Doherty 013882262a fix(galaxy): bound alarm-subscription handles to one (no reconnect leak)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
GalaxyDriver's StreamAlarms feed is session-less and survives an in-place
reconnect, so DriverInstanceActor re-subscribed on every Connected re-entry
(after dropping its own cached handle without an Unsubscribe — sync teardown).
The re-subscribe was additive: _alarmSubscriptions.Add grew the list by one
untracked handle per reconnect cycle — a slow unbounded leak. Functionally
harmless (the gate is Count>0 and OnAlarmFeedTransition only reads [0], firing
once regardless), but it accumulated forever.

Fix: SubscribeAlarmsAsync clears the set before adding, collapsing to a single
live handle (under the existing _alarmHandlersLock, atomic w.r.t. the fan-out
reader). There is exactly one consumer per driver instance (factory-per-actor
lifecycle), so replacing the set with the latest handle is faithful. Chosen
over making the actor's sync DetachAlarmSource call UnsubscribeAlarmsAsync
async/fire-and-forget — disproportionate for a minor leak.

Regression test Re_subscribe_collapses_to_a_single_handle_no_accumulation
(TDD-verified: FAILS without the Clear — releasing the latest handle leaves
the feed open because stale handles remain; PASSES with the fix). Galaxy tests
263 pass / 3 skip; Runtime native-alarm 24 pass. Code-reviewed (approved).
2026-06-15 05:49:07 -04:00
Joseph Doherty dcb0be650e feat(galaxy): debug-log native alarm feed delivery (subs + fanout)
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Native-alarm delivery through OnAlarmFeedTransition was a black box — there was no way
to answer 'is the gateway feed delivering / is a subscription un-gating it', which is
partly why the missing-SubscribeAlarmsAsync wiring shipped undetected. Add a single
per-transition Debug line (kind, ref, live subscription count, fanout flag). Debug so a
flapping galaxy doesn't flood prod, but available on demand.
2026-06-15 01:30:15 -04:00
Joseph Doherty f44d8d1e6b feat(alarms): carry transition Kind on AlarmEventArgs; Galaxy populates it (Phase B WS-1) 2026-06-14 03:04:44 -04:00
Joseph Doherty f77488eed9 fix(galaxy): invalidate writer handle caches on session reconnect
Add IGalaxyDataWriter.InvalidateHandleCaches() and call it in
GalaxyDriver.ReopenAsync after RecreateAsync succeeds. Prior to this
fix, GatewayGalaxyDataWriter's _itemHandles and _supervisedHandles
dictionaries survived across reconnects, causing the next write to
skip AddItem and AdviseSupervisory against already-dead handles.
2026-06-14 00:39:24 -04:00
Joseph Doherty f05b5d79c4 fix(galaxy): AdviseSupervisory before raw Write so writes commit
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
A plain MXAccess Write runs with no user login (WriteUserId is typically 0),
and MXAccess only COMMITS such a write when the item is advised in supervisory
mode. Without it the gateway's Write call doesn't throw (the reply looks OK) but
the value never reaches the galaxy. GatewayGalaxyDataWriter now issues
AdviseSupervisory (once per item handle) before each raw Write; SecuredWrite/
VerifiedWrite tags keep their own user-identity path. Live-verified end-to-end:
an authorized write to a Galaxy equipment tag commits and PERSISTS across a
fresh re-subscribe; an anonymous write is denied.

(The sister ScadaBridge driver commits writes the other way — a configured
non-zero WriteUserId + regular Advise; we have no galaxy login, so we use the
supervisory context.)
2026-06-13 23:23:46 -04:00
Joseph Doherty 57355405a6 chore(security): drop dead audit suppressions; patch OpenTelemetry + Tmds.DBus CVEs
All five suppressed advisories are now resolved at baseline/resolved versions,
so every NuGetAuditSuppress is removed repo-wide:
- System.Security.Cryptography.Xml (GHSA-37gx-xxp4-5rgx / GHSA-w3x6-4m5h-cxqf)
  -> fixed by the .NET 10 baseline (10.0.6)
- OPCFoundation Opc.Ua.Core (GHSA-h958-fxgg-g7w3) -> fixed at resolved 1.5.378.106

Two were still live and are now patched via direct security pins:
- OpenTelemetry.Api 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j) pinned in Cluster;
  Runtime/ControlPlane/AdminUI + tests inherit via project reference
- Tmds.DBus.Protocol 0.20.0 -> 0.21.3 (GHSA-xrw6-gwf8-vvr9) pinned in Client.UI

Also correct the Historian sidecar runtime comments (x86 -> x64, matching the
csproj PlatformTarget). Solution audit: 0 vulnerable packages; full build clean.
2026-06-12 09:03:42 -04:00
Joseph Doherty 565b77e6cf fix(galaxy): unify IsConnected with _connected guard; AttachForTests marks connected (review) 2026-06-11 09:16:51 -04:00
Joseph Doherty 43b96441a5 fix(galaxy): reconnect recreates a faulted session instead of no-op'ing 2026-06-11 09:10:52 -04:00
Joseph Doherty 32d7fd7cc9 fix(galaxy): complete PR 7.2 rename — use canonical GalaxyMxGateway driver type
v2-ci / build (push) Failing after 48s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
The driver/factory/seed use 'GalaxyMxGateway' (legacy 'Galaxy' was retired),
but the AdminUI editor router, GalaxyDriverPage, address picker, identity
dropdown, the Galaxy browser/probe, and DraftValidator still keyed on 'Galaxy'.
Result: the seeded GalaxyMxGateway driver couldn't be edited ('no editor
registered'), UI-created Galaxy drivers wrote a type with no factory, and a
SystemPlatform-bound GalaxyMxGateway driver failed publish validation.
Align all stragglers to GalaxyMxGateway (+ failing-test-first DraftValidator
coverage). ShouldStub's 'Galaxy' legacy safety-net left intact.
2026-05-29 12:31:55 -04:00
Joseph Doherty 560b327ee1 refactor(galaxy): migrate to ZB.MOM.WW.MxGateway.* nupkg packages
v2-ci / build (push) Failing after 33s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Imports the freshly-rebuilt ZB.MOM.WW.MxGateway.Client + ZB.MOM.WW.MxGateway.Contracts
nupkgs (0.1.0) from /tmp/mxgw-dist. Replaces the vendored libs/ DLLs and the
pre-restructure MxGateway.* namespaces across the runtime Galaxy driver,
Galaxy.Browser, and their tests.

Key changes:
- nuget-packages/ added as a local feed via NuGet.config; .gitignore exempts it
  from the *.nupkg rule so the packages are tracked
- Directory.Packages.props pins both packages at 0.1.0
- 4 csprojs swap <Reference HintPath="libs/...dll"/> for <PackageReference/>
- 36 .cs files renamed `using MxGateway.*` -> `using ZB.MOM.WW.MxGateway.*`
- libs/ removed (vendored DLLs + README.md)

GalaxyBrowseSession rewritten around the new lazy API:
- RootAsync calls GalaxyRepositoryClient.BrowseAsync (returns LazyBrowseNodes)
  and caches them by TagName instead of bulk-fetching the whole hierarchy
- ExpandAsync looks up the cached LazyBrowseNode and calls its ExpandAsync,
  giving true one-wire-call-per-click instead of in-memory parent/child scan
- _byGobjectId + _hasChildrenSet dropped (LazyBrowseNode carries HasChildrenHint)
- AttributesAsync unchanged (already uses DiscoverHierarchyAsync MaxDepth=0)

Tests: Galaxy.Tests 245/245, Galaxy.Browser.Tests 10/10, AdminUI.Tests 66/66.
Pre-existing 12 solution errors unchanged (test sinks + Cli XML comments).
2026-05-29 07:14:18 -04:00
Joseph Doherty 54f0dbddb9 fix(drivers): align probe DriverType strings with AdminUI keys
ModbusDriverProbe.DriverType was "Modbus" but the AdminUI's
ModbusDriverPage persists DriverInstance.DriverType = "ModbusTcp".
GalaxyDriverProbe used the runtime DriverTypeName constant
("GalaxyMxGateway") but the AdminUI saves "Galaxy". The probe DI
lookup is case-insensitive but not name-insensitive, so Test
Connect would fail to find a probe for these two drivers.
2026-05-28 10:55:15 -04:00
Joseph Doherty c19d124e89 feat(drivers): TCP-connect IDriverProbe for all 9 driver types
Cheap-and-fast probe: open TCP socket to the configured endpoint,
close immediately. Surfaces SocketError on failure, latency on
success, "timed out" on caller cancel. Sufficient for the AdminUI
Test Connect "can we reach the host?" question. Richer protocol-
level probes (OPC UA session open, FOCAS handshake, gRPC ping)
are a documented follow-up. Each probe registered as
AddSingleton<IDriverProbe, X> in DriverFactoryBootstrap so they
flow through DI into AdminOperationsActor.

Historian.Wonderware returns a clean "TCP probe not applicable"
result because it communicates over a Windows named pipe, not TCP.
Also adds OpcUaClient + Historian.Wonderware.Client project
references to Host.csproj (both were missing from the driver
ItemGroup).
2026-05-28 10:53:42 -04:00
Joseph Doherty 5ffbc42d8c refactor(driver-galaxy): extract GalaxyDriverOptions to .Contracts
Move GalaxyDriverOptions (and nested records GalaxyGatewayOptions,
GalaxyMxAccessOptions, GalaxyRepositoryOptions, GalaxyReconnectOptions)
from Config/GalaxyDriverOptions.cs into a new Driver.Galaxy.Contracts
sibling project at the contracts root (no Config/ subdirectory). The
existing namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config is preserved
unchanged — it is a runtime ABI concern and all consumers already import
it via the namespace qualifier.

No doc-comment substitutions required — the only cref in the file
(<see cref="ApiKeySecretRef"/>) is an intra-type parameter reference
that resolves within the contracts project itself.

The options file had no using directives and no NuGet type surface;
the contracts project is dependency-free. The runtime Driver.Galaxy
project gains a ProjectReference to .Contracts; the .slnx is updated
accordingly.
2026-05-28 09:15:57 -04:00
Joseph Doherty 64e3fbe035 docs: backfill XML documentation across 756 files
v2-ci / build (push) Failing after 1m43s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
2026-05-28 08:10:17 -04:00
Joseph Doherty 2b811477d1 chore(build): introduce central package management for v2
Adds Directory.Packages.props (ManagePackageVersionsCentrally) and
Directory.Build.props (net10.0/nullable/implicit usings/LangVersion latest).
Strips Version attributes from every csproj PackageReference and consolidates
versions into the central file.

Side fixes (necessary to keep the build green on .NET SDK 10.0.105 on macOS):

- Microsoft.CodeAnalysis.CSharp{,.Workspaces}: 5.3.0 -> 5.0.0. The 5.3.0
  analyzer DLL references compiler 5.3.0.0 and the local SDK ships compiler
  5.0.0.0, producing CS9057 on every project that loaded the Analyzers
  output. Master itself was broken on this machine pre-change.
- Server + Server.Tests pin OPCFoundation.NetStandard.Opc.Ua.{Configuration,
  Client} to 1.5.374.126 via VersionOverride, matching Opc.Ua.Server's
  pin. Mixing 1.5.378.106 Opc.Ua.Core transitively with 1.5.374.126
  Opc.Ua.Server breaks CustomNodeManager2 override signatures
  (CS0115 on LoadPredefinedNodes/Browse/HistoryRead*) and CS7069 in
  the tests. The pin disappears when the legacy Server project is
  deleted in Task 56.
- Client.UI + Client.UI.Tests: NuGetAuditSuppress for
  GHSA-xrw6-gwf8-vvr9 (Tmds.DBus.Protocol 0.20.0 reaches both projects
  transitively from Avalonia.Desktop on Linux/macOS only).

Deviation from the plan: TreatWarningsAsErrors=true is NOT set in
Directory.Build.props because the pre-v2 Admin/Server test projects carry
~240 xUnit1051 analyzer warnings that would fail the build. New v2 projects
opt in via their own csproj; the global flag can return once the legacy
projects are deleted in Task 56.
2026-05-26 03:40:24 -04:00
Joseph Doherty c2abbf45bd fix(driver-galaxy): align package versions + record vendored-DLL provenance
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).

Driver.Galaxy-016 (Medium, Perf/Resource):
  Reconciled the csproj PackageReferences with what the vendored
  MxGateway.Client.dll was actually built against, verified by
  reflecting Assembly.GetReferencedAssemblies() on the DLL:
    - Polly 8.5.2  →  Polly.Core 8.6.6
      (most consequential — Polly v7 fluent API vs Polly.Core v8
       resilience-pipeline API are DIFFERENT packages; the DLL was
       built against Polly.Core so the prior Polly reference would
       have failed at runtime with MissingMethodException the first
       time the gateway client's retry pipeline ran)
    - Grpc.Net.Client 2.71.0  →  2.76.0  (matches sibling Server/Worker)
    - Microsoft.Extensions.Logging.Abstractions 10.0.0  →  10.0.7
  Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
  left unchanged.

Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
  Original framing was a security concern about unknown-provenance
  binaries. User clarified the DLLs are their own code, built from
  their own mxaccessgw project, not third-party. Re-triaged to a
  documentation / audit-trail concern. Fix:
    - Added a Provenance section to libs/README.md recording the
      source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
      extracted from the AssemblyInformationalVersion baked into
      both DLLs by the original build) and SHA-256 checksums.
    - Documented the re-verification recipe (sha256sum + ilspycmd
      | grep AssemblyInformationalVersion).
  Recommendations about .gitattributes and CI hash-check deferred —
  the DLLs are frozen until an unwinding path is taken, so adding
  LFS or CI infrastructure now would need removal at unwinding.

Driver.Galaxy-018 (Low, Documentation):
  Most of the recommendation folded into the libs/README.md rewrite
  (pointed at sibling Server/Worker csproj as the live version source
  rather than the deleted MxGateway.Client.csproj; recorded source
  commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
  <Reference> items intentionally not added — MSBuild's default for
  HintPath references with bare-name Include attributes is already
  SpecificVersion=false, so explicitly setting it would be cosmetic
  without changing behaviour.

Driver.Galaxy-017 (Low, Design) — Deferred:
  Recommendation part (b) (record mxaccessgw source-commit SHA in
  libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
  Parts (a) and (c) — a GetVersion RPC at session-open and a parity
  test against the live gateway's proto descriptor — are substantial
  new RPC + plumbing work not in scope for this code-review sweep.
  The risk surface is bounded because either of the libs/README.md
  unwinding paths closes the vendoring + this concern naturally.
  Re-open if neither path is taken within the next quarter and the
  live gateway evolves its proto under the driver.

Verification:
  - Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
  - Driver.Galaxy.Tests: 245/245 pass against the corrected
    package set.
  - Solution-wide build remains clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:45:24 -04:00
Joseph Doherty 994997ba7b fix(driver-galaxy): vendor MxGateway.Client + MxGateway.Contracts as binary refs
The sibling mxaccessgw repo restructured: clients/dotnet/MxGateway.Client
no longer exists, and the proto contracts moved to a new namespace
(ZB.MOM.WW.MxGateway.Contracts.Proto, was MxGateway.Contracts.Proto). The
driver's source still expects the pre-restructure namespace, so the
broken ProjectReference produced 86 build errors in src/ + 1 in tests/
on master.

Resolution: vendor the last known-good build of MxGateway.Client.dll
(99 KB, May 22) and MxGateway.Contracts.dll (490 KB, May 23) under
src/Drivers/.../Driver.Galaxy/libs/, reference them via <Reference
HintPath=...> in both the driver and its test csproj, and declare the
NuGet packages the dropped ProjectReference was supplying transitively
(Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client,
Microsoft.Extensions.Logging.Abstractions, Polly) at versions matching
the sibling repo's ZB.MOM.WW.MxGateway.Contracts.csproj so binary
compatibility is preserved.

Why this over a source migration:
  Source migration would require namespace renames across ~19 driver
  files PLUS reimplementing MxGatewayClient / MxGatewaySession /
  GalaxyRepositoryClient (~2,200 LoC) — the sibling repo dropped the
  client library entirely, keeping only the proto contracts. Vendoring
  the last known-good binaries unblocks the build in minutes, freezes
  the gateway contract surface at a known-good version, and preserves
  the option to migrate properly once the sibling repo decides whether
  to restore a client library or hand the work back to us.

libs/README.md documents the unwinding plan (either path closes the
debt: sibling restores a client library, or driver migrates to the new
contracts namespace + reimplements the client wrapper).

Verification:
  - dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors (was 87).
  - Driver.Galaxy unit tests: 245/245 pass.
  - Integration tests not run here (require a live mxaccessgw gateway).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:32:56 -04:00
Joseph Doherty 9f7ae20995 fix(driver-galaxy): resolve Low code-review findings (Driver.Galaxy-005,010,012,013)
- Driver.Galaxy-005: rewrite the EventPump BoundedChannelOptions comment
  to honestly describe the Wait+TryWrite pattern.
- Driver.Galaxy-010: ResolveApiKey now warns when a literal API key is
  used in production wiring; added an explicit dev: prefix for known
  cleartext-in-dev cases and rewrote the GalaxyGatewayOptions doc.
- Driver.Galaxy-012: O(1) reverse-lookup for SubscriptionRegistry
  dispatch via per-entry FullRefByItemHandle map; immutable hash-set for
  the cross-binding reverse map; SubscribeAsync / ReadViaSubscribeOnce
  use BuildResultIndex for per-reference correlation.
- Driver.Galaxy-013: ReinitializeAsync now validates the incoming JSON
  against the running options; ReplayOnSessionLost honoured by the
  Replay path; class summary rewritten to describe the shipped surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:45:08 -04:00
Joseph Doherty ebfd5d7871 fix(driver-galaxy): fix XML doc comment cref in StatusCodeMap.ToQualityCategoryByte
StatusCode is not a .NET type reference in this assembly — replace the unresolvable
<see cref="StatusCode"/> with prose text so TreatWarningsAsErrors does not fail the
build on the CS1574 unresolved-cref warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:51:17 -04:00
Joseph Doherty ecc91b0e48 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-011)
GetMemoryFootprint() returned a constant 0 with a stale "PR 4.4 sets this" comment
even though PR 4.4 shipped the SubscriptionRegistry. Replace with a live estimate:
64 bytes × TrackedItemHandleCount + 256 bytes × TrackedSubscriptionCount. A 50k-tag
set now registers ~3 MB with the server's cache-flush heuristic instead of being
invisible. Returns 0 when no subscriptions are active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:48:37 -04:00
Joseph Doherty 0f3de4d510 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-009)
Fix two resource-management bugs in StartDeployWatcher / BuildDefaultHierarchySource:
(a) Replace the discarded `_ = StartAsync(...)` with an explicit task variable that
    surfaces any synchronous InvalidOperationException (called-twice guard) rather than
    silently swallowing it.
(b) Change both StartDeployWatcher and BuildDefaultHierarchySource to use ??= on
    _ownedRepositoryClient so the first client created (by whichever path runs first)
    is reused by the second path, preventing a second GalaxyRepositoryClient from being
    created and the first from leaking past the driver's lifetime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:47:52 -04:00
Joseph Doherty d572a011ef fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-007)
Implement IAsyncDisposable on GalaxyDriver so async sub-component disposals
(EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) are awaited rather
than blocked on GetAwaiter().GetResult(). DisposeAsync is now the primary path;
Dispose() delegates to it for using-statement compatibility. Each async component's
shutdown is awaited individually with a best-effort catch so a single slow shutdown
cannot prevent the rest of the cleanup sequence from running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:47:00 -04:00
Joseph Doherty d14564839e fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-006)
HashSet<T>.First() enumeration order is unspecified and unstable across mutations, so
the "owner" handle attached to alarm events was non-deterministic when multiple alarm
subscriptions were active. Change _alarmSubscriptions from HashSet to List (preserving
insertion order) and pick [0] — the earliest-registered handle — as the deterministic
owner. The server routes transitions by SourceNodeId, not by handle, so the choice of
handle does not affect delivery to active subscribers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:45:55 -04:00
Joseph Doherty 910a538b19 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-004)
Add StatusCodeMap.ToQualityCategoryByte(uint) so the StatusCode → quality-byte
mapping lives in one place next to its inverse (FromQualityByte). GalaxyDriver
OnPumpDataChange now delegates to the helper instead of duplicating the shift+switch
inline; a future edit to the OPC UA bit layout cannot silently desync the probe-health
decode. Unit tests in StatusCodeMapTests pin all three category buckets and the
round-trip invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:43:53 -04:00
Joseph Doherty 39a02f6794 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-003)
StatusCodeMap.FromMxStatus checked `success != 0` to determine success, but the
mxaccessgw proto contract explicitly documents that `success` is not a boolean and
that clients must branch on `category` (MX_STATUS_CATEGORY_OK), not on `success`
alone. Replace the raw field check with `status.IsSuccess()` from
MxStatusProxyExtensions, which requires both `success != 0` AND `category == Ok`.
A worker reporting success=1 with a non-OK category was previously misreported as
Good. Updated StatusCodeMapTests with a regression case covering the inverted scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:42:47 -04:00
Joseph Doherty 7f2e144f8d fix(driver-galaxy): resolve High code-review findings (Driver.Galaxy-002, Driver.Galaxy-008)
Driver.Galaxy-002 — DataTypeMap.Map had no Int64 arm though MxValueDecoder/
MxValueEncoder both fully support Int64. Galaxy attributes with the Int64
mx_data_type code fell through to the String default, creating a String
address-space node while runtime reads decoded a boxed long. Added
`6 => DriverDataType.Int64`, extending the contiguous 0..5 scheme so the type
map agrees with the decoder/encoder on all seven Galaxy data types.

Driver.Galaxy-008 — after a stream fault the EventPump's StreamEvents consumer
loop exited and its channel completed; EventPump.Start() is a no-op on a
completed-but-non-null loop, so a replayed subscription had no consumer and
ReplayAsync never re-registered the post-reconnect item handles. ReplayAsync
now recreates the EventPump (RestartEventPumpForReplay) and rebinds the
SubscriptionRegistry per subscription with the fresh item handles returned by
the post-reconnect SubscribeBulkAsync, via new SubscriptionRegistry.SnapshotEntries
and Rebind APIs.

Regression tests: DataTypeMapTests (every code incl. Int64), SubscriptionRegistry
Tests (Rebind/SnapshotEntries), EventPumpStreamFaultTests (faulted pump dead,
fresh pump resumes dispatch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:59:38 -04:00
Joseph Doherty 4df8737c86 fix(driver-galaxy): wire event-stream faults to the reconnect supervisor (Driver.Galaxy-001)
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.

EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.

This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.

Resolves code-review finding Driver.Galaxy-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:33 -04:00
Joseph Doherty 27a8d05b7c feat(driver-galaxy): consume the gateway's session-less alarm model
The mxaccessgw updated alarms to a session-less central monitor:
AcknowledgeAlarm dropped SessionId and alarm transitions now come from
the session-less StreamAlarms feed instead of the per-session worker
StreamEvents stream. The GalaxyDriver no longer compiled against the
updated client.

- GatewayGalaxyAlarmAcknowledger: session-less rewrite — no GalaxyMxSession;
  outcome read from ProtocolStatus (throw) and Hresult (warn).
- New IGalaxyAlarmFeed seam + GatewayGalaxyAlarmFeed: background consumer
  of StreamAlarms that decodes the active-alarm snapshot plus live
  transitions into GalaxyAlarmTransition and reopens the stream on
  transport faults.
- EventPump: drop the dead per-session OnAlarmTransition path; the
  per-session stream no longer carries alarms.
- GalaxyDriver: bridge the feed onto IAlarmSource.OnAlarmEvent; the feed
  starts on SubscribeAlarmsAsync, independent of data subscriptions.
- Tests: replace EventPumpAlarmTests with GatewayGalaxyAlarmFeedTests;
  move the driver alarm-source tests onto the IGalaxyAlarmFeed seam.

Browse needed no change — GatewayGalaxyHierarchySource consumes the
unchanged DiscoverHierarchy contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 03:59:36 -04:00
Joseph Doherty a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00