Default Branch

bc282b6788 · Add Galaxy platform scope filter so multi-node deployments can restrict the OPC UA address space to only objects hosted by the local platform, reducing memory footprint and MXAccess subscription count from the full Galaxy (49 objects / 4206 attributes) down to the local subtree (3 objects / 386 attributes on the dev Galaxy). · Updated 2026-04-16 00:39:11 -04:00

Branches

70a5d06b37 · Phase 2 PR 9 — thread IsAlarm discovery flag end-to-end. GalaxyRepository.GetAttributesAsync has always emitted is_alarm alongside is_historized (CASE WHEN EXISTS with the primitive_definition join on primitive_name='AlarmExtension' per v1's Extended Attributes SQL lifted byte-for-byte into the PR 5 repository port), and GalaxyAttributeRow.IsAlarm has been populated since the port, but the flag was silently dropped at the MapAttribute helper in both MxAccessGalaxyBackend and DbBackedGalaxyBackend because GalaxyAttributeInfo on the IPC side had no field to carry it — every deployed alarm attribute arrived at the Proxy with no signal that it was alarm-bearing. This PR wires the flag through the three translation boundaries: GalaxyAttributeInfo gains [Key(6)] public bool IsAlarm { get; set; } at the end of the message to preserve wire-compat with pre-PR9 payloads that omit the key (MessagePack treats missing keys as default, so a newer Proxy talking to an older Host simply gets IsAlarm=false for every attribute); both backend MapAttribute helpers copy row.IsAlarm into the IPC shape; DriverAttributeInfo in Core.Abstractions gains a new IsAlarm parameter with default value false so the positional record signature change doesn't force every non-Galaxy driver call site to flow a flag they don't produce (the existing generic node-manager and future Modbus/etc. drivers keep compiling without modification); GalaxyProxyDriver.DiscoverAsync passes attr.IsAlarm through to the DriverAttributeInfo positional constructor. 
This is the discovery-side foundation — the generic node-manager can now enrich alarm-bearing variables with OPC UA AlarmConditionState during address-space build (the existing v1 LmxNodeManager pattern that subscribes to <tag>.InAlarm + .Priority + .DescAttrName + .Acked and merges them into a ConditionState) but this PR deliberately stops at discovery: the full alarm subsystem (subscription management for the 4 alarm-status attributes, state-machine tracking for Active/Unacknowledged/Confirmed/Inactive transitions, OPC UA Part 9 alarm event emission, and the write-to-AckMsg ack path) is a follow-up PR 10+ because it touches the node-manager's address-space build path — orthogonal to the IPC flow this PR covers. Tests — AlarmDiscoveryTests (new, 3 cases): GalaxyAttributeInfo_IsAlarm_round_trips_true_through_MessagePack serializes an IsAlarm=true instance and asserts the decoded flag is true + IsHistorized is true + AttributeName survives unchanged; GalaxyAttributeInfo_IsAlarm_round_trips_false_through_MessagePack covers the default path; Pre_PR9_payload_without_IsAlarm_key_deserializes_with_default_false is the wire-compat regression guard — serializes a stand-in PrePR9Shape class with only keys 0..5 (identical layout to the pre-PR9 GalaxyAttributeInfo) and asserts the newer GalaxyAttributeInfo deserializer produces IsAlarm=false without throwing, so a rolling upgrade where the Proxy ships first can talk to an old Host during the window before the Host upgrades without a MessagePack "missing key" exception. Full solution build: 0 errors, 38 warnings (existing). Galaxy.Host.Tests Unit suite: 27 pass / 0 fail (3 new alarm-discovery + 9 PR5 historian + 15 pre-existing). This PR branches off phase-2-pr5-historian because GalaxyProxyDriver's constructor signature + GalaxyHierarchyRow's IsAlarm init-only property are both ancestor state that the simpler branch bases (phase-2-pr4-findings, master) don't yet include. · Updated 2026-04-18 06:28:01 -04:00    dohertj2

0
28
#8 Merged
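
The wire-compat argument in the PR 9 commit message above (a trailing key that older payloads omit falls back to its default) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's C# or the MessagePack library itself. Only `is_alarm` sitting at index 6, and the existence of `is_historized` and the attribute name among keys 0..5, come from the message; the other field names and their order are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class AttributeInfo:
    # Hypothetical stand-in for GalaxyAttributeInfo. Field order for keys 0..5
    # is an assumption; only "is_alarm is the last key (6)" is from the source.
    attribute_name: str = ""
    data_type: int = 0
    is_array: bool = False
    array_dimension: int = 0
    security_class: int = 0
    is_historized: bool = False
    is_alarm: bool = False  # key 6, appended at the end of the message


def decode(payload: list) -> AttributeInfo:
    """Fill fields positionally; keys missing from the payload keep defaults."""
    names = [f.name for f in fields(AttributeInfo)]
    # zip stops at the shorter sequence, so a 6-element payload never
    # touches is_alarm and the dataclass default (False) applies.
    return AttributeInfo(**dict(zip(names, payload)))


old_payload = ["TankLevel", 5, False, 0, 1, True]        # keys 0..5 only
new_payload = ["TankLevel", 5, False, 0, 1, True, True]  # keys 0..6

old = decode(old_payload)
new = decode(new_payload)
```

Appending the new key at the end is what keeps the older, shorter payload decodable: nothing before index 6 changes position.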

30ece6e22c · Phase 2 PR 8 — wire gateway-level host-status push from MxAccessGalaxyBackend. PR 4 built the IPC infrastructure for OnHostStatusChanged (MessageKind.RuntimeStatusChange frame + ConnectionSink forwarding through FrameWriter) but no backend actually raised the event; the #pragma warning disable CS0067 around MxAccessGalaxyBackend.OnHostStatusChanged declared the event for interface symmetry while acknowledging the wire-up was Phase 2 follow-up. This PR closes the gateway-level signal: MxAccessClient.ConnectionStateChanged (already raised on false→true Register and true→false Unregister transitions, including the reconnect path in MonitorLoopAsync) now drives OnHostStatusChanged with a synthetic HostConnectivityStatus tagged HostName=MxAccessClient.ClientName, RuntimeStatus="Running" on reconnect + "Stopped" on disconnect, LastObservedUtcUnixMs set to the transition moment. The Admin UI's existing IHostConnectivityProbe subscriber on GalaxyProxyDriver (HostStatusChangedEventArgs) already handles the full translation — OnHostConnectivityUpdate parses "Running"/"Stopped"/"Faulted" into the Core.Abstractions HostState enum and fires OnHostStatusChanged downstream, so this single backend-side event wire-up produces an end-to-end signal with no further Proxy changes required. Per-platform and per-AppEngine ScanState probing (the 472 LOC GalaxyRuntimeProbeManager state machine in v1 that advises <Host>.ScanState on every deployed $WinPlatform + $AppEngine gobject, tracks Unknown → Running → Stopped transitions, handles the on-change-only delivery quirk of ScanState, and surfaces IsHostStopped(gobjectId) for the node manager's Read path to short-circuit on-demand reads against known-stopped runtimes) remains deferred to a follow-up PR — the gateway-level signal gives operators the top-level transport-health rung of the status ladder, which is what matters when the Galaxy COM proxy itself goes down (vs a specific platform going down). 
MxAccessClient.ClientName property exposes the previously-private _clientName field so the backend can tag its pushes with a stable gateway identity — operators configure this via OTOPCUA_GALAXY_CLIENT_NAME env var (default "OtOpcUa-Galaxy.Host" per Program.cs). MxAccessGalaxyBackend constructor subscribes the new _onConnectionStateChanged field before returning + Dispose unsubscribes it via _mx.ConnectionStateChanged -= _onConnectionStateChanged to prevent the backend's own dispose from leaving a dangling handler on the MxAccessClient (same shape as MxAccessClient.SubscriptionReplayFailed PR 6 dispose discipline). #pragma warning disable CS0067 removed from around OnHostStatusChanged since the event is now raised; the directive is narrowed to cover only OnAlarmEvent which stays unraised pending the alarm subsystem port (PR 9 candidate). Tests — HostStatusPushTests (new, 2 cases): ConnectionStateChanged_raises_OnHostStatusChanged_with_gateway_name fires mx.ConnectAsync → mx.DisconnectAsync and asserts two notifications in order with HostName="GatewayClient" (the clientName passed to MxAccessClient ctor), RuntimeStatus="Running" then "Stopped", LastObservedUtcUnixMs > 0; Dispose_unsubscribes_so_post_dispose_state_changes_do_not_fire_events asserts that after backend.Dispose() a subsequent mx.DisconnectAsync does not bump the count on a registered OnHostStatusChanged handler — guards against the subscription-leak regression where a lingering backend instance would accumulate cross-reconnect notifications for a dead writer. Host.Tests csproj gains a Reference to lib/ArchestrA.MxAccess.dll (identical to the reference PR 6 adds — conflict-free cherry-pick/merge since both PRs stage the same <Reference> node; git will collapse to one when either lands first). Full Galaxy.Host.Tests Unit suite: 26 pass / 0 fail (2 new host-status + 9 PR5 historian + 15 pre-existing PostMortemMmf/RecyclePolicy/StaPump/MemoryWatchdog/EndToEndIpc/Handshake). 
Galaxy.Host builds clean (0 errors, 0 warnings). Branch base — PR 8 is on phase-2-pr5-historian rather than phase-2-pr4-findings because the constructor path on MxAccessGalaxyBackend gained a new historian parameter in PR 5 and the Dispose implementation needs to coordinate the two unsubscribes; targeting the earlier base would leave a trivial conflict on Dispose. · Updated 2026-04-18 06:03:16 -04:00    dohertj2

0
28
#7 Merged
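
As a rough illustration of the subscribe-in-constructor and unsubscribe-in-Dispose discipline described in the PR 8 commit message above, the following Python sketch models the two objects with plain handler lists. `FakeMxClient` and `Backend` are hypothetical stand-ins, not the project's `MxAccessClient` or `MxAccessGalaxyBackend`. Only the status field names and the "Running"/"Stopped" values are taken from the message.

```python
import time


class FakeMxClient:
    """Hypothetical stand-in for the MXAccess client: a name plus an event."""

    def __init__(self, client_name):
        self.client_name = client_name
        self.connection_state_changed = []  # handlers taking one bool

    def _raise(self, connected):
        for handler in list(self.connection_state_changed):
            handler(connected)

    def connect(self):
        self._raise(True)

    def disconnect(self):
        self._raise(False)


class Backend:
    def __init__(self, mx):
        self._mx = mx
        self.on_host_status_changed = []
        # Keep the handler in a field so dispose() can remove the same one.
        self._handler = self._on_connection_state_changed
        mx.connection_state_changed.append(self._handler)

    def _on_connection_state_changed(self, connected):
        status = {
            "HostName": self._mx.client_name,
            "RuntimeStatus": "Running" if connected else "Stopped",
            "LastObservedUtcUnixMs": int(time.time() * 1000),
        }
        for listener in list(self.on_host_status_changed):
            listener(status)

    def dispose(self):
        # Unsubscribe so a disposed backend stops receiving transitions.
        if self._handler in self._mx.connection_state_changed:
            self._mx.connection_state_changed.remove(self._handler)


mx = FakeMxClient("GatewayClient")
backend = Backend(mx)
seen = []
backend.on_host_status_changed.append(seen.append)

mx.connect()
mx.disconnect()
backend.dispose()
mx.disconnect()  # after dispose: must not add a third notification
```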

3717405aa6 · Phase 2 PR 7 — wire IHistoryProvider.ReadProcessedAsync end-to-end. PR 5 ported HistorianDataSource.ReadAggregateAsync into Galaxy.Host but left it internal — GalaxyProxyDriver.ReadProcessedAsync still threw NotSupportedException, so OPC UA clients issuing HistoryReadProcessed requests against the v2 topology got rejected at the driver boundary. This PR closes that gap by adding two new Shared.Contracts messages (HistoryReadProcessedRequest/Response, MessageKind 0x62/0x63), routing them through GalaxyFrameHandler, implementing HistoryReadProcessedAsync on all three IGalaxyBackend implementations (Stub/DbBacked return the canonical "pending" Success=false, MxAccessGalaxyBackend delegates to _historian.ReadAggregateAsync), mapping HistorianAggregateSample → GalaxyDataValue at the IPC boundary (null bucket Value → BadNoData 0x800E0000u, otherwise Good 0u), and flipping GalaxyProxyDriver.ReadProcessedAsync from the NotSupported throw to a real IPC call with OPC UA HistoryAggregateType enum mapped to Wonderware AnalogSummary column name on the Proxy side (Average → "Average", Minimum → "Minimum", Maximum → "Maximum", Count → "ValueCount", Total → NotSupported since there's no direct SDK column for sum). Decision #13 IPC data-shape stays intact — HistoryReadProcessedResponse carries GalaxyDataValue[] with the same MessagePack value + OPC UA StatusCode + timestamps shape as the other history responses, so the Proxy's existing ToSnapshot helper handles the conversion without a new code path. MxAccessGalaxyBackend.HistoryReadProcessedAsync guards: null historian → "Historian disabled" (symmetric with HistoryReadAsync); IntervalMs <= 0 → "HistoryReadProcessed requires IntervalMs > 0" (prevents division-by-zero inside the SDK's Resolution parameter); exception during SDK call → Success=false Values=[] with the message so the Proxy surfaces it as InvalidOperationException with a clean error chain. 
Tests — HistoryReadProcessedTests (new, 4 cases): disabled-error when historian null, rejects zero interval, maps Good sample with Value=12.34 and the Proxy-supplied AggregateColumn + IntervalMs flow unchanged through to the fake IHistorianDataSource, maps null Value bucket to 0x800E0000u BadNoData with null ValueBytes. AggregateColumnMappingTests (new, 5 cases in Proxy.Tests): theory covers all 4 supported HistoryAggregateType enum values → correct column string, and asserts Total throws NotSupportedException with a message that steers callers to Average/Minimum/Maximum/Count (the SDK's AnalogSummaryQueryResult doesn't expose a sum column — the closest is Average × ValueCount which is the responsibility of a caller-side aggregation rather than an extra IPC round-trip). InternalsVisibleTo added to Galaxy.Proxy csproj so Proxy.Tests can reach the internal MapAggregateToColumn static. Builds — Galaxy.Host (net48 x86) + Galaxy.Proxy (net10) both 0 errors, full solution 201 warnings (pre-existing) / 0 errors. Test counts — Host.Tests Unit suite: 28 pass (4 new processed + 9 PR5 historian + 15 pre-existing); Proxy.Tests Unit suite: 14 pass (5 new column-mapping + 9 pre-existing). 
Deferred to a later PR — ReadAtTime + ReadEvents + Health IPC surfaces (HistorianDataSource has them ported in PR 5 but they need additional contract messages and would push this PR past a comfortable review size); the alarm subsystem wire-up (OnAlarmEvent raising from MxAccessGalaxyBackend) which overlaps the ReadEventsAsync IPC work since both pull from HistorianAccess.CreateEventQuery on the SDK side; the Proxy-side quality-byte refinement where HistorianDataSource's per-sample raw quality byte gets decoded through the existing QualityMapper instead of the category-only mapping in ToWire(HistorianSample) — doesn't change correctness today since Good/Uncertain/Bad categories are all the Admin UI and OPC UA clients surface, but richer OPC DA status codes (BadNotConnected, UncertainSubNormal, etc.) are available on the wire and the Proxy could promote them before handing DataValueSnapshot to ISubscribable consumers. This PR branches off phase-2-pr5-historian because it directly extends the Historian IPC surface added there; if PR 5 merges first PR 7 fast-forwards, otherwise it needs a rebase after PR 5 lands. · Updated 2026-04-18 05:53:01 -04:00    dohertj2

0
28
#6 Merged
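
The aggregate-to-column mapping and the guard clauses listed in the PR 7 commit message above can be sketched as follows. This is a Python approximation under stated assumptions, not the project's C#: `NotImplementedError` stands in for .NET's `NotSupportedException`, the historian is modeled as a plain callable, and the response is a dict rather than the real contract type. The status constant is the value quoted in the commit message.

```python
AGGREGATE_COLUMNS = {
    "Average": "Average",
    "Minimum": "Minimum",
    "Maximum": "Maximum",
    "Count": "ValueCount",
}

GOOD = 0x00000000
BAD_NO_DATA = 0x800E0000  # value as quoted in the commit message


def map_aggregate_to_column(aggregate: str) -> str:
    """Map an aggregate name to its summary column; Total has no column."""
    if aggregate not in AGGREGATE_COLUMNS:
        raise NotImplementedError(
            f"{aggregate} is not supported; use Average, Minimum, Maximum or Count"
        )
    return AGGREGATE_COLUMNS[aggregate]


def to_wire(value):
    """A null bucket value becomes BadNoData, anything else is Good."""
    return (value, GOOD if value is not None else BAD_NO_DATA)


def read_processed(historian, column, interval_ms):
    """Hypothetical backend entry point; historian is a callable or None."""
    if historian is None:
        return {"Success": False, "Error": "Historian disabled", "Values": []}
    if interval_ms <= 0:
        return {
            "Success": False,
            "Error": "HistoryReadProcessed requires IntervalMs > 0",
            "Values": [],
        }
    try:
        samples = historian(column, interval_ms)
    except Exception as exc:  # surface SDK failures as a clean error response
        return {"Success": False, "Error": str(exc), "Values": []}
    return {"Success": True, "Error": None, "Values": [to_wire(v) for v in samples]}


# Fake historian returning one populated bucket and one empty bucket.
fake_historian = lambda column, interval_ms: [12.34, None]
result = read_processed(fake_historian, map_aggregate_to_column("Average"), 60000)
```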

1c2bf74d38 · Phase 2 PR 6 — close the 2 low findings carried forward from PR 4. Low finding #1 ($Heartbeat probe handle leak in MonitorLoopAsync): the probe calls _proxy.AddItem(connectionHandle, "$Heartbeat") on every monitor tick that observes the connection is past StaleThreshold, but previously discarded the returned item handle — so every probe (one per MonitorInterval, default 5s) leaked one item handle into the MXAccess proxy's internal handle table. Fix: capture the item handle, call RemoveItem(connectionHandle, probeHandle) in the InvokeAsync's finally block so it runs on the same pump turn as the AddItem, best-effort RemoveItem swallow so a dying proxy doesn't throw secondary exceptions out of the probe path. Probe ok becomes probeHandle > 0 so any AddItem that returns 0 (MXAccess's "could not create") counts as a failed probe, matching v1 behavior. Low finding #2 (subscription replay silently swallowed per-tag failures): after a reconnect, the replay loop iterates the pre-reconnect subscription snapshot and calls SubscribeOnPumpAsync for each; previously those failures went into a bare catch { /* skip */ } so an operator had no signal when specific tags failed to re-subscribe — the first indication downstream was a quality drop on OPC UA clients. Fix: new SubscriptionReplayFailedEventArgs (TagReference + Exception) + SubscriptionReplayFailed event on MxAccessClient that fires once per tag that fails to re-subscribe, Log.Warning per failure with the reconnect counter + tag reference, and a summary log line at the end of the replay loop ("{failed} of {total} failed" or "{total} re-subscribed cleanly"). Serilog using + ILogger Log = Serilog.Log.ForContext<MxAccessClient>() added. 
Tests — MxAccessClientMonitorLoopTests (new file, 2 cases): Heartbeat_probe_calls_RemoveItem_for_every_AddItem constructs a CountingProxy IMxProxy that tracks AddItem/RemoveItem pair counts scoped to the "$Heartbeat" address, runs the client with MonitorInterval=150ms + StaleThreshold=50ms for 700ms, asserts HeartbeatAddCount > 1, HeartbeatAddCount == HeartbeatRemoveCount, OutstandingHeartbeatHandles == 0 after dispose; SubscriptionReplayFailed_fires_for_each_tag_that_fails_to_replay uses a ReplayFailingProxy that throws on the next $Heartbeat probe (to trigger the reconnect path) and throws on the replay-time AddItem for specified tag names ("BadTag.A", "BadTag.B"), subscribes GoodTag.X + BadTag.A + BadTag.B before triggering probe failure, collects SubscriptionReplayFailed args into a ConcurrentBag, asserts exactly 2 events fired and both bad tags are represented — GoodTag.X replays cleanly so it does not fire. Host.Tests csproj gains a Reference to lib/ArchestrA.MxAccess.dll because IMxProxy's MxDataChangeHandler delegate signature mentions MXSTATUS_PROXY and the compiler resolves all delegate parameter types when a test class implements the interface, even if the test code never names the type. No regressions: full Galaxy.Host.Tests Unit suite 26 pass / 0 fail (2 new monitor-loop tests + 9 PR5 historian + 15 pre-existing PostMortemMmf/RecyclePolicy/StaPump/MemoryWatchdog/EndToEndIpc/Handshake). Galaxy.Host builds clean (0 errors, 0 warnings) — the new Serilog.Log.ForContext usage picks up the existing Serilog package ref that PR 4 pulled in for the monitor-loop infrastructure. Both findings were flagged as non-blocking for PR 4 merge and are now resolved alongside whichever merge order the reviewer picks; this PR branches off phase-2-pr4-findings so it can rebase cleanly if PR 4 lands first or be re-based onto master after PR 4 merges. · Updated 2026-04-18 02:06:15 -04:00    dohertj2

0
28
#5 Merged
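
Both fixes in the PR 6 commit message above follow small, recognizable patterns: pair every probe `AddItem` with a `RemoveItem` in a `finally` block, and report per-tag replay failures instead of swallowing them. The Python sketch below shows the shape of each under stated assumptions. `CountingProxy`, `probe`, and `replay` are hypothetical names; treating a throwing probe as a failed probe is taken from the PR 4 description rather than from this commit.

```python
class CountingProxy:
    """Hypothetical fake proxy that tracks outstanding item handles."""

    def __init__(self, fail_tags=()):
        self._next_handle = 1
        self.outstanding = set()
        self._fail_tags = set(fail_tags)

    def add_item(self, connection_handle, address):
        if address in self._fail_tags:
            raise RuntimeError(f"cannot add {address}")
        handle = self._next_handle
        self._next_handle += 1
        self.outstanding.add(handle)
        return handle

    def remove_item(self, connection_handle, item_handle):
        self.outstanding.discard(item_handle)


def probe(proxy, connection_handle):
    """Heartbeat probe: a handle > 0 is ok, and the handle is always released."""
    probe_handle = 0
    try:
        probe_handle = proxy.add_item(connection_handle, "$Heartbeat")
        return probe_handle > 0
    except Exception:
        return False
    finally:
        if probe_handle > 0:
            try:
                proxy.remove_item(connection_handle, probe_handle)
            except Exception:
                pass  # best-effort: a dying proxy must not raise from here


def replay(proxy, connection_handle, tags, on_failed):
    """Re-subscribe every tag, reporting each failure instead of hiding it."""
    failed = 0
    for tag in tags:
        try:
            proxy.add_item(connection_handle, tag)
        except Exception as exc:
            failed += 1
            on_failed(tag, exc)
    total = len(tags)
    return f"{failed} of {total} failed" if failed else f"{total} re-subscribed cleanly"


proxy = CountingProxy(fail_tags={"BadTag.A", "BadTag.B"})
probe_results = [probe(proxy, 1) for _ in range(3)]
outstanding_after_probes = len(proxy.outstanding)

failures = []
summary = replay(
    proxy, 1, ["GoodTag.X", "BadTag.A", "BadTag.B"],
    lambda tag, exc: failures.append(tag),
)
```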

6df1a79d35 · Phase 2 PR 5 — port Wonderware Historian SDK into Driver.Galaxy.Host/Backend/Historian/. The full v1 Historian.Aveva code path (HistorianDataSource + HistorianClusterEndpointPicker + IHistorianConnectionFactory + SdkHistorianConnectionFactory) now lives inside Galaxy.Host instead of the previously-required out-of-tree plugin + HistorianPluginLoader AssemblyResolve hack, and MxAccessGalaxyBackend.HistoryReadAsync — which previously returned a Phase 2 Task B.1.h follow-up placeholder — now delegates to the ported HistorianDataSource.ReadRawAsync, maps HistorianSample to GalaxyDataValue via the IPC wire shape, and reports Success=true with per-tag HistoryTagValues arrays. OPC-UA-free surface inside Galaxy.Host: the v1 code returned Opc.Ua.DataValue on the hot path, which would have required dragging OPCFoundation.NetStandard.Opc.Ua.Server into net48 x86 Galaxy.Host and bleeding OPC types across the IPC boundary — instead, the port introduces HistorianSample (Value, Quality byte, TimestampUtc) + HistorianAggregateSample (Value, TimestampUtc) POCOs that carry the raw MX quality byte through the pipe unchanged, and the OPC translation happens on the Proxy side via the existing QualityMapper that the live-read path already uses. Decision #13's IPC data-shape contract survives intact — GalaxyDataValue (TagReference + ValueBytes MessagePack + ValueMessagePackType + StatusCode + SourceTimestampUtcUnixMs + ServerTimestampUtcUnixMs) — so no Shared.Contracts wire break vs PR 4. 
Cluster failover preserved verbatim: HistorianClusterEndpointPicker is the thread-safe pure-logic picker ported verbatim with no SDK dependency (injected DateTime clock, per-node cooldown state, unknown-node-name tolerance, case-insensitive de-dup on configuration-order list), ConnectToAnyHealthyNode iterates the picker's healthy candidates, clones config per attempt, calls the factory, marks healthy on success / failed on exception with the failure message stored for dashboard surfacing, throws "All N healthy historian candidate(s) failed" with the last exception chained when every node exhausts. Process path + Event path use separate HistorianAccess connections (CreateHistoryQuery vs CreateEventQuery vs CreateAnalogSummaryQuery on the SDK surface) guarded by independent _connection/_eventConnection locks — a mid-query failure on one silo resets only that connection, the other stays open. Four SDK paths ported: ReadRawAsync (RetrievalMode.Full, BatchSize from config.MaxValuesPerRead, MoveNext pump, per-sample quality + value decode with the StringValue/Value fallback the v1 code did, limit-based early exit), ReadAggregateAsync (AnalogSummaryQuery + Resolution in ms, ExtractAggregateValue maps Average/Minimum/Maximum/ValueCount/First/Last/StdDev column names — the NodeId to column mapping is moved to the Proxy side since the IPC request carries a string column), ReadAtTimeAsync (per-timestamp HistoryQuery with RetrievalMode.Interpolated + BatchSize=1, returns Quality=0 / Value=null for missing samples), ReadEventsAsync (EventQuery + AddEventFilter("Source",Equal,sourceName) when sourceName is non-null, EventOrder.Ascending, EventCount = maxEvents or config.MaxValuesPerRead); GetHealthSnapshot returns the full runtime-health snapshot (TotalQueries/Successes/Failures + ConsecutiveFailures + LastSuccess/FailureTime + LastError + ProcessConnectionOpen/EventConnectionOpen + ActiveProcessNode/ActiveEventNode + per-node state list). 
ReadRaw is the only path wired through IPC in PR 5 (HistoryReadRequest/HistoryTagValues/HistoryReadResponse already existed in Shared.Contracts); Aggregate/AtTime/Events/Health are ported-but-not-yet-IPC-exposed — they stay internal to Galaxy.Host for PR 6+ to surface via new contract message kinds (aggregate = OPC UA HistoryReadProcessed, at-time = HistoryReadAtTime, events = HistoryReadEvents, health = admin dashboard IPC query). Galaxy.Host csproj gains aahClientManaged + aahClientCommon references with Private=false (managed wrappers) + None items for aahClient.dll + Historian.CBE.dll + Historian.DPAPI.dll + ArchestrA.CloudHistorian.Contract.dll native satellites staged alongside the host exe via CopyToOutputDirectory=PreserveNewest so aahClientManaged can P/Invoke into them at runtime without an AssemblyResolve hook (cleaner than the v1 HistorianPluginLoader.cs 180-LOC AssemblyResolve + Assembly.LoadFrom dance that existed solely because the plugin was loaded late from Host/bin/Debug/net48/Historian/). Program.cs adds BuildHistorianIfEnabled() that reads OTOPCUA_HISTORIAN_ENABLED (true or 1) + OTOPCUA_HISTORIAN_SERVER + OTOPCUA_HISTORIAN_SERVERS (comma-separated cluster list overrides single-server) + OTOPCUA_HISTORIAN_PORT (default 32568) + OTOPCUA_HISTORIAN_INTEGRATED (default true) + OTOPCUA_HISTORIAN_USER/OTOPCUA_HISTORIAN_PASS + OTOPCUA_HISTORIAN_TIMEOUT_SEC (30) + OTOPCUA_HISTORIAN_MAX_VALUES (10000) + OTOPCUA_HISTORIAN_COOLDOWN_SEC (60), returns null when disabled so MxAccessGalaxyBackend.HistoryReadAsync surfaces a clean "Historian disabled" Success=false instead of a localhost-SDK hang; server.RunAsync finally block now also casts backend to IDisposable.Dispose() so the historian SDK connections get cleanly closed on Ctrl+C. 
MxAccessGalaxyBackend gains an IHistorianDataSource? historian constructor parameter (defaults null to preserve existing Host.Tests call sites that don't exercise HistoryRead), implements IDisposable that forwards to _historian.Dispose(), and the pragma warning disable CS0618 is locally scoped to the ToDto(HistorianEvent) mapper since the SDK marks Id/Source/DisplayText/Severity obsolete but the replacement surface isn't available in the aahClientManaged version we bind against — every other deprecated-SDK use still surfaces as an error under TreatWarningsAsErrors. Ported from v1 Historian.Aveva unchanged: the CloneConfigWithServerName helper that preserves every config field except ServerName per attempt; the double-checked locking in EnsureConnected/EnsureEventConnected (fast path = Volatile.Read outside lock, slow path acquires lock + re-checks + disposes any raced-in-parallel connection); HandleConnectionError/HandleEventConnectionError that close the dead connection, clear the active-node tracker, MarkFailed the picker entry with the exception message so the node enters cooldown, and log the reset with node= for operator correlation; RecordSuccess/RecordFailure that bump counters under _healthLock. Tests: HistorianClusterEndpointPickerTests (7 cases) — single-node ServerName fallback when ServerNames empty, MarkFailed enters cooldown and skips, cooldown expires after window, MarkHealthy immediately clears, all-in-cooldown returns empty healthy list, Snapshot reports failure count + last error + IsHealthy, case-insensitive de-dup on duplicate hostnames. HistorianWiringTests (2 cases) — HistoryReadAsync returns "Historian disabled" Success=false when historian:null passed; HistoryReadAsync with a fake IHistorianDataSource maps the returned HistorianSample (Value=42.5, Quality=192 Good, Timestamp) to a GalaxyDataValue with StatusCode=0u + SourceTimestampUtcUnixMs matching the sample + MessagePack-encoded value bytes. 
InternalsVisibleTo("...Host.Tests") added to Galaxy.Host.csproj so tests can reach the internal HistorianClusterEndpointPicker. Full Galaxy.Host.Tests suite: 24 pass / 0 fail (9 new historian + 15 pre-existing MemoryWatchdog/PostMortemMmf/RecyclePolicy/StaPump/EndToEndIpc/Handshake). Full solution build: 0 errors (202 pre-existing warnings). The v1 Historian.Aveva project + Historian.Aveva.Tests still build intact because the archive PR (Stream D.1 destructive delete) is still ahead of us — PR 5 intentionally does not delete either; once PR 2+3 merge and the archive-delete PR lands, a follow-up cleanup can remove Historian.Aveva + its 4 source files + 18 test cases. Alarm subsystem wire-up (OnAlarmEvent raising from MxAccessGalaxyBackend via AlarmExtension primitives) + host-status push (OnHostStatusChanged via a ported GalaxyRuntimeProbeManager) remain PR 6 candidates; they were on the same "Task B.1.h follow-up" list and share the IPC connection-sink wiring with the historian events path — it made PR 5 scope-manageable to do Historian first since that's what has the biggest surface area (981 LOC v1 plus SDK binding) and alarms/host-status have more bespoke integration with the existing MxAccess subscription fan-out. · Updated 2026-04-18 01:44:04 -04:00    dohertj2

0
27
#4 Closed
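
The cluster failover logic described in the PR 5 commit message above (injected clock, per-node cooldown, case-insensitive de-dup, tolerance for unknown node names, and a connect loop that chains the last exception) can be approximated in a short Python sketch. This is an illustration under stated assumptions, not the ported `HistorianClusterEndpointPicker`. The class and function names, the seconds-based clock, and the behaviour when no candidate is healthy are all hypothetical choices.

```python
import threading


class EndpointPicker:
    def __init__(self, server_names, fallback_server, cooldown_seconds, clock):
        self._lock = threading.Lock()
        self._clock = clock  # injected callable returning seconds
        self._cooldown = cooldown_seconds
        self._nodes = []
        seen = set()
        for name in (server_names or [fallback_server]):
            key = name.lower()
            if key not in seen:  # case-insensitive de-dup, order preserved
                seen.add(key)
                self._nodes.append(name)
        self._failed_at = {n.lower(): None for n in self._nodes}
        self.last_error = {n.lower(): None for n in self._nodes}

    def healthy_nodes(self):
        with self._lock:
            now = self._clock()
            return [
                n for n in self._nodes
                if self._failed_at[n.lower()] is None
                or now - self._failed_at[n.lower()] >= self._cooldown
            ]

    def mark_failed(self, name, error):
        with self._lock:
            key = name.lower()
            if key in self._failed_at:  # unknown node names are ignored
                self._failed_at[key] = self._clock()
                self.last_error[key] = error

    def mark_healthy(self, name):
        with self._lock:
            key = name.lower()
            if key in self._failed_at:
                self._failed_at[key] = None


def connect_to_any_healthy_node(picker, factory):
    """Try each healthy candidate in order; chain the last failure if all fail."""
    candidates = picker.healthy_nodes()
    last_error = None
    for node in candidates:
        try:
            connection = factory(node)
        except Exception as exc:
            picker.mark_failed(node, str(exc))
            last_error = exc
            continue
        picker.mark_healthy(node)
        return node, connection
    raise RuntimeError(
        f"All {len(candidates)} healthy historian candidate(s) failed"
    ) from last_error


now = [0.0]
picker = EndpointPicker(["HistA", "hista", "HistB"], "unused", 60, lambda: now[0])


def flaky_factory(node):
    if node == "HistA":
        raise ConnectionError("HistA refused")
    return f"connection-to-{node}"


chosen, connection = connect_to_any_healthy_node(picker, flaky_factory)
healthy_during_cooldown = picker.healthy_nodes()
now[0] = 60.0
healthy_after_cooldown = picker.healthy_nodes()
```

Injecting the clock is what makes the cooldown testable without sleeping: the example advances time by assigning to `now[0]`.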

caa9cb86f6 · Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from exit-gate-phase-2-final.md. High 1 (ReadAsync subscription-leak on cancel): the one-shot read now wraps subscribe→first-OnDataChange→unsubscribe in try/finally so the per-tag callback is always detached, and if the read installed the underlying MXAccess subscription itself (the prior _addressToHandle key was absent) it tears it down on the way out — no leaked probe item handles when the caller cancels or times out. High 2 (no reconnect loop): MxAccessClient gets a MxAccessClientOptions {AutoReconnect, MonitorInterval=5s, StaleThreshold=60s} + a background MonitorLoopAsync started at first ConnectAsync. The loop wakes every MonitorInterval, checks _lastObservedActivityUtc (bumped by every OnDataChange callback), and if stale probes the proxy with a no-op COM AddItem("$Heartbeat") on the StaPump; if the probe throws or returns false, the loop reconnects-with-replay — Unregister (best-effort), Register, snapshot _addressToHandle.Keys + clear, re-AddItem every previously-active subscription, ConnectionStateChanged events fire for the false→true transition, ReconnectCount bumps. Medium 3 (subscriptions don't push frames back to Proxy): IGalaxyBackend gains OnDataChange/OnAlarmEvent/OnHostStatusChanged events; new IFrameHandler.AttachConnection(FrameWriter) is called per-connection by PipeServer after Hello + the returned IDisposable disposes at connection close; GalaxyFrameHandler.ConnectionSink subscribes the events for the connection lifetime, fire-and-forget pushes them as MessageKind.OnDataChangeNotification / AlarmEvent / RuntimeStatusChange frames through the writer, swallows ObjectDisposedException for the dispose race, and unsubscribes in Dispose to prevent leaked invocation list refs across reconnects. 
MxAccessGalaxyBackend's existing SubscribeAsync (which previously discarded values via a (_, __) => {} callback) now wires OnTagValueChanged that fans out per-tag value changes to every subscription ID listening (one MXAccess subscription, multi-fan-out — _refToSubs reverse map). UnsubscribeAsync also reverse-walks the map to only call mx.UnsubscribeAsync when the LAST sub for a tag drops. Stub + DbBacked backends declare the events with #pragma warning disable CS0067 because they never raise them but must satisfy the interface (treat-warnings-as-errors would otherwise fail). Medium 4 (WriteValuesAsync doesn't await OnWriteComplete): MxAccessClient.WriteAsync rewritten to return Task<bool> via the v1-style TaskCompletionSource-keyed-by-item-handle pattern in _pendingWrites — adds the TCS before the Write call, awaits it with a configurable timeout (default 5s), removes the TCS in finally, returns true only when OnWriteComplete reported success. MxAccessGalaxyBackend.WriteValuesAsync now reports per-tag Bad_InternalError ("MXAccess runtime reported write failure") when the bool returns false, instead of false-positive Good. PipeServer's IFrameHandler interface adds the AttachConnection(FrameWriter):IDisposable method + a public NoopAttachment nested class (net48 doesn't support default interface methods so the empty-attach is exposed for stub implementations). StubFrameHandler returns IFrameHandler.NoopAttachment.Instance. RunOneConnectionAsync calls AttachConnection after HelloAck and usings the returned disposable so it disposes at the connection scope's finally. ConnectionStateChanged event added on MxAccessClient (caller-facing diagnostics for false→true reconnect transitions). docs/v2/implementation/pr-4-body.md is the Gitea web-UI paste-in for opening PR 4 once pushed; includes 2 new low-priority adversarial findings (probe item-handle leak; replay-loop silently swallows per-subscription failures) flagged as follow-ups not PR 4 blockers. 
Full solution 460 pass / 7 skip (E2E on admin shell) / 1 pre-existing Phase 0 baseline. No regressions vs PR 2's baseline. · Updated 2026-04-18 01:12:09 -04:00    dohertj2

0
26
#3 Merged
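
The write-completion pattern in the PR 4 commit message above (register a completion source keyed by item handle before issuing the write, await it with a timeout, remove it in `finally`) maps fairly directly onto `asyncio` futures. The sketch below is a Python analogue under stated assumptions, not the project's `MxAccessClient.WriteAsync`. `WriteClient` and the `send` callback are hypothetical, and returning `False` on timeout is an assumption, since the message only says that true is returned when the completion callback reports success.

```python
import asyncio


class WriteClient:
    def __init__(self, write_timeout=5.0):
        self._pending = {}  # item handle -> Future resolving to bool
        self._write_timeout = write_timeout

    def on_write_complete(self, item_handle, success):
        """Completion callback: resolve the waiter for this handle, if any."""
        future = self._pending.get(item_handle)
        if future is not None and not future.done():
            future.set_result(success)

    async def write(self, item_handle, value, send):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        # Register before the write so a fast completion cannot be missed.
        self._pending[item_handle] = future
        try:
            send(item_handle, value)
            return await asyncio.wait_for(future, self._write_timeout)
        except asyncio.TimeoutError:
            return False  # assumption: a timeout counts as a failed write
        finally:
            self._pending.pop(item_handle, None)


async def demo():
    client = WriteClient(write_timeout=0.05)
    loop = asyncio.get_running_loop()

    def ack(handle, value):
        loop.call_soon(client.on_write_complete, handle, True)

    def nack(handle, value):
        loop.call_soon(client.on_write_complete, handle, False)

    def silent(handle, value):
        pass  # the runtime never reports completion

    results = [
        await client.write(1, 42, ack),
        await client.write(2, 42, nack),
        await client.write(3, 42, silent),
    ]
    return results, len(client._pending)


results, pending_left = asyncio.run(demo())
```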

a3d16a28f1 · Phase 2 Stream D Option B — archive v1 surface + new Driver.Galaxy.E2E parity suite. Non-destructive intermediate state: the v1 OtOpcUa.Host + Historian.Aveva + Tests + IntegrationTests projects all still build (494 v1 unit + 6 v1 integration tests still pass when run explicitly), but solution-level dotnet test ZB.MOM.WW.OtOpcUa.slnx now skips them via IsTestProject=false on the test projects + archive-status PropertyGroup comments on the src projects. The destructive deletion is reserved for Phase 2 PR 3 with explicit operator review per CLAUDE.md "only use destructive operations when truly the best approach". tests/ZB.MOM.WW.OtOpcUa.Tests/ renamed via git mv to tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/; csproj <AssemblyName> kept as the original ZB.MOM.WW.OtOpcUa.Tests so v1 OtOpcUa.Host's [InternalsVisibleTo("ZB.MOM.WW.OtOpcUa.Tests")] still matches and the project rebuilds clean. tests/ZB.MOM.WW.OtOpcUa.IntegrationTests gets <IsTestProject>false</IsTestProject>. src/ZB.MOM.WW.OtOpcUa.Host + src/ZB.MOM.WW.OtOpcUa.Historian.Aveva get PropertyGroup archive-status comments documenting they're functionally superseded but kept in-build because cascading dependencies (Historian.Aveva → Host; IntegrationTests → Host) make a single-PR deletion high blast-radius. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ project (.NET 10) with ParityFixture that spawns OtOpcUa.Driver.Galaxy.Host.exe (net48 x86) as a Process.Start subprocess with OTOPCUA_GALAXY_BACKEND=db env vars, awaits 2s for the PipeServer to bind, then exposes a connected GalaxyProxyDriver; skips on non-Windows / Administrator shells (PipeAcl denies admins per decision #76) / ZB unreachable / Host EXE not built — each skip carries a SkipReason string the test method reads via Assert.Skip(SkipReason). RecordingAddressSpaceBuilder captures every Folder/Variable/AddProperty registration so parity tests can assert on the same shape v1 LmxNodeManager produced. 
HierarchyParityTests (3) — Discover returns gobjects with attributes; attribute full references match the tag.attribute Galaxy reference grammar; HistoryExtension flag flows through correctly. StabilityFindingsRegressionTests (4) — one test per 2026-04-13 stability finding from commits c76ab8f and 7310925: phantom probe subscription doesn't corrupt unrelated host status; HostStatusChangedEventArgs structurally carries a specific HostName + OldState + NewState (event signature mathematically prevents the v1 cross-host quality-clear bug); all GalaxyProxyDriver capability methods return Task or Task<T> (sync-over-async would deadlock OPC UA stack thread); AcknowledgeAsync completes before returning (no fire-and-forget background work that could race shutdown). Solution test count: 470 pass / 7 skip (E2E on admin shell) / 1 pre-existing Phase 0 baseline. Run archived suites explicitly: dotnet test tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive (494 pass) + dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests (6 pass). docs/v2/V1_ARCHIVE_STATUS.md inventories every archived surface with run-it-explicitly instructions + a 10-step deletion plan for PR 3 + rollback procedure (git revert restores all four projects). docs/v2/implementation/exit-gate-phase-2-final.md supersedes the two partial-exit docs with the per-stream status table (A/B/C/D/E all addressed, D split across PR 2/3 per safety protocol), the test count breakdown, fresh adversarial review of PR 2 deltas (4 new findings: medium IsTestProject=false safety net loss, medium structural-vs-behavioral stability tests, low backend=db default, low Process.Start env inheritance), the 8 carried-forward findings from exit-gate-phase-2.md, the recommended PR order (1 → 2 → 3 → 4). docs/v2/implementation/pr-2-body.md is the Gitea web-UI paste-in for opening PR 2 once pushed. · Updated 2026-04-18 00:56:21 -04:00    dohertj2

0
25
#2 Closed
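
The skip conditions that the commit message above lists for `ParityFixture` (non-Windows, Administrator shell, database unreachable, Host EXE not built) amount to a first-match check that yields a reason string for the test to report. A minimal Python sketch of that decision follows. The function name, its parameters, and the wording of the reason strings are hypothetical; only the four conditions come from the message.

```python
from typing import Optional


def skip_reason(
    platform: str,
    is_admin: bool,
    db_reachable: bool,
    host_exe_exists: bool,
) -> Optional[str]:
    """Return the first applicable skip reason, or None when the fixture can run."""
    if platform != "win32":
        return "E2E parity tests require Windows"
    if is_admin:
        return "pipe ACL denies Administrator shells"
    if not db_reachable:
        return "ZB database unreachable"
    if not host_exe_exists:
        return "Galaxy.Host EXE not built"
    return None


runnable = skip_reason("win32", False, True, True)
admin_shell = skip_reason("win32", True, True, True)
```

A test method would read the returned string and skip with it when it is not `None`, so each skipped run states why it did not execute.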

50f81a156d · Doc — PR 1 body for Gitea web UI paste-in. PR title + summary + test matrix + reviewer test plan + follow-up tracking. Source phase-1-configuration → target v2; URL https://gitea.dohertylan.com/dohertj2/lmxopcua/pulls/new/phase-1-configuration. No gh/tea CLI on this box, so the body is staged here for the operator to paste into the Gitea web UI rather than auto-created via API. · Updated 2026-04-18 00:46:23 -04:00    dohertj2

0
24
#1 Closed

3b2defd94f · Phase 0 — mechanical rename ZB.MOM.WW.LmxOpcUa.* → ZB.MOM.WW.OtOpcUa.* · Updated 2026-04-17 13:57:47 -04:00    dohertj2

0
11