Files
lmxopcua/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests
Joseph Doherty c14624f012 Phase 2 PR 14 — alarm subsystem wire-up. Per IsAlarm=true attribute (PR 9 added the discovery flag), GalaxyAlarmTracker in Backend/Alarms/ advises the four Galaxy alarm-state attributes: .InAlarm (boolean alarm active), .Priority (int severity), .DescAttrName (human-readable description), .Acked (boolean acknowledged). Runs the OPC UA Part 9 alarm lifecycle state machine simplified for the Galaxy AlarmExtension model and raises AlarmTransition events on transitions operators must react to — Active (InAlarm false→true, default Unacknowledged), Acknowledged (Acked false→true while InAlarm still true), Inactive (InAlarm true→false). MxAccessGalaxyBackend instantiates the tracker in its constructor with delegate-based subscribe/unsubscribe/write pointers to MxAccessClient, hooks TransitionRaised to forward each transition through the existing OnAlarmEvent IPC event that PR 4 ConnectionSink wires into MessageKind.AlarmEvent frames — no new contract messages required since GalaxyAlarmEvent already exists in Shared.Contracts. Field mapping: EventId = fresh Guid.ToString('N') per transition, ObjectTagName = alarm attribute full reference, AlarmName = alarm attribute full reference, Severity = tracked Priority, StateTransition = 'Active'|'Acknowledged'|'Inactive', Message = DescAttrName or tag fallback, UtcUnixMs = transition time. DiscoverAsync caches every IsAlarm=true attribute's full reference (tag.attribute) into _discoveredAlarmTags (ConcurrentBag cleared-then-filled on every re-Discover to track Galaxy redeploys). SubscribeAlarmsAsync iterates the cache and advises each via GalaxyAlarmTracker.TrackAsync; best-effort per-alarm — a subscribe failure on one alarm doesn't abort the whole call since operators prefer partial alarm coverage to none. Tracker is internally idempotent on repeat Track calls (second invocation for same alarm tag is a no-op; already-subscribed check short-circuits before the 4 MXAccess sub calls). Subscribe-failure rollback inside TrackAsync removes the alarm state + unadvises any of the 4 that did succeed so a partial advise can't leak a phantom tracking entry. AcknowledgeAlarmAsync routes to tracker.AcknowledgeAsync which writes the operator comment to <tag>.AckMsg via MxAccessClient.WriteAsync — writes use the existing MXAccess OnWriteComplete TCS-by-handle path (PR 4 Medium 4) so a runtime-refused ack bubbles up as Success=false rather than false-positive. State-machine quirks preserved from v1: (1) initial Acked=true on subscribe does NOT fire Acknowledged (alarm at rest, pre-acknowledged — default state is Acked=true so the first subscribe callback is a no-op transition), (2) Acked false→true only fires Acknowledged when InAlarm is currently true (acking a latched-inactive alarm is not a user-visible transition), (3) Active transition clears the Acked flag in-state so the next Acked callback correctly fires Acknowledged (v1 had this buried in the ConditionState logic; we track it on the AlarmState struct directly). Priority value handled as int/short/long via type pattern match with int.MaxValue guard — Galaxy attribute category returns varying CLR types (Int32 is canonical but some older templates use Int16), and a long overflow cast to int would silently corrupt the severity. Dispose cascade in MxAccessGalaxyBackend.Dispose: alarm-tracker unsubscribe→dispose, probe-manager unsubscribe→dispose, mx.ConnectionStateChanged detach, historian dispose — same discipline PR 6 / PR 8 / PR 13 established so dangling invocation-list refs don't survive a backend recycle. #pragma warning disable CS0067 around OnAlarmEvent removed since the event is now raised. Tests (9 new, GalaxyAlarmTrackerTests): four-attribute subscribe per alarm, idempotent repeat-track, InAlarm false→true fires Active with Priority + Desc, InAlarm true→false fires Inactive, Acked false→true while InAlarm fires Acknowledged, Acked transition while InAlarm=false does not fire, AckMsg write path carries the comment, snapshot reports latest four fields, foreign probe callback for a non-tracked tag is silently dropped. Full Galaxy.Host.Tests Unit suite 84 pass / 0 fail (9 new alarm + 12 PR 13 probe + 21 PR 12 quality + 42 pre-existing). Galaxy.Host builds clean (0/0). Branches off phase-2-pr13-runtime-probe so the MxAccessGalaxyBackend constructor/Dispose chain gets the probe-manager + alarm-tracker wire-up in a coherent order; fast-forwards if PR 13 merges first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 07:34:13 -04:00
..
Phase 2 PR 9 — thread IsAlarm discovery flag end-to-end. GalaxyRepository.GetAttributesAsync has always emitted is_alarm alongside is_historized (CASE WHEN EXISTS with the primitive_definition join on primitive_name='AlarmExtension' per v1's Extended Attributes SQL lifted byte-for-byte into the PR 5 repository port), and GalaxyAttributeRow.IsAlarm has been populated since the port, but the flag was silently dropped at the MapAttribute helper in both MxAccessGalaxyBackend and DbBackedGalaxyBackend because GalaxyAttributeInfo on the IPC side had no field to carry it — every deployed alarm attribute arrived at the Proxy with no signal that it was alarm-bearing. This PR wires the flag through the three translation boundaries: GalaxyAttributeInfo gains [Key(6)] public bool IsAlarm { get; set; } at the end of the message to preserve wire-compat with pre-PR9 payloads that omit the key (MessagePack treats missing keys as default, so a newer Proxy talking to an older Host simply gets IsAlarm=false for every attribute); both backend MapAttribute helpers copy row.IsAlarm into the IPC shape; DriverAttributeInfo in Core.Abstractions gains a new IsAlarm parameter with default value false so the positional record signature change doesn't force every non-Galaxy driver call site to flow a flag they don't produce (the existing generic node-manager and future Modbus/etc. drivers keep compiling without modification); GalaxyProxyDriver.DiscoverAsync passes attr.IsAlarm through to the DriverAttributeInfo positional constructor. This is the discovery-side foundation — the generic node-manager can now enrich alarm-bearing variables with OPC UA AlarmConditionState during address-space build (the existing v1 LmxNodeManager pattern that subscribes to <tag>.InAlarm + .Priority + .DescAttrName + .Acked and merges them into a ConditionState) but this PR deliberately stops at discovery: the full alarm subsystem (subscription management for the 4 alarm-status attributes, state-machine tracking for Active/Unacknowledged/Confirmed/Inactive transitions, OPC UA Part 9 alarm event emission, and the write-to-AckMsg ack path) is a follow-up PR 10+ because it touches the node-manager's address-space build path — orthogonal to the IPC flow this PR covers. Tests — AlarmDiscoveryTests (new, 3 cases): GalaxyAttributeInfo_IsAlarm_round_trips_true_through_MessagePack serializes an IsAlarm=true instance and asserts the decoded flag is true + IsHistorized is true + AttributeName survives unchanged; GalaxyAttributeInfo_IsAlarm_round_trips_false_through_MessagePack covers the default path; Pre_PR9_payload_without_IsAlarm_key_deserializes_with_default_false is the wire-compat regression guard — serializes a stand-in PrePR9Shape class with only keys 0..5 (identical layout to the pre-PR9 GalaxyAttributeInfo) and asserts the newer GalaxyAttributeInfo deserializer produces IsAlarm=false without throwing, so a rolling upgrade where the Proxy ships first can talk to an old Host during the window before the Host upgrades without a MessagePack "missing key" exception. Full solution build: 0 errors, 38 warnings (existing). Galaxy.Host.Tests Unit suite: 27 pass / 0 fail (3 new alarm-discovery + 9 PR5 historian + 15 pre-existing). This PR branches off phase-2-pr5-historian because GalaxyProxyDriver's constructor signature + GalaxyHierarchyRow's IsAlarm init-only property are both ancestor state that the simpler branch bases (phase-2-pr4-findings, master) don't yet include.
2026-04-18 06:28:01 -04:00
Phase 2 — port MXAccess COM client to Galaxy.Host + MxAccessGalaxyBackend (3rd IGalaxyBackend) + live MXAccess smoke + Phase 2 exit-gate doc + adversarial review. The full Galaxy data-plane now flows through the v2 IPC topology end-to-end against live ArchestrA.MxAccess.dll, on this dev box, with 30/30 Host tests + 9/9 Proxy tests + 963/963 solution tests passing alongside the unchanged 494 v1 IntegrationTests baseline. Backend/MxAccess/Vtq is a focused port of v1's Vtq value-timestamp-quality DTO. Backend/MxAccess/IMxProxy abstracts LMXProxyServer (port of v1's IMxProxy with the same Register/Unregister/AddItem/RemoveItem/AdviseSupervisory/UnAdviseSupervisory/Write surface + OnDataChange + OnWriteComplete events); MxProxyAdapter is the concrete COM-backed implementation that does Marshal.ReleaseComObject-loop on Unregister, must be constructed on an STA thread. Backend/MxAccess/MxAccessClient is the focused port of v1's MxAccessClient partials — Connect/Disconnect/Read/Write/Subscribe/Unsubscribe through the new Sta/StaPump (the real Win32 GetMessage pump from the previous commit), ConcurrentDictionary handle tracking, OnDataChange event marshalling to per-tag callbacks, ReadAsync implemented as the canonical subscribe → first-OnDataChange → unsubscribe one-shot pattern. Galaxy.Host csproj flipped to x86 PlatformTarget + Prefer32Bit=true with the ArchestrA.MxAccess HintPath ..\..\lib\ArchestrA.MxAccess.dll reference (lib/ already contains the production DLL). Backend/MxAccessGalaxyBackend is the third IGalaxyBackend implementation (alongside StubGalaxyBackend and DbBackedGalaxyBackend): combines GalaxyRepository (Discover) with MxAccessClient (Read/Write/Subscribe), MessagePack-deserializes inbound write values, MessagePack-serializes outbound read values into ValueBytes, decodes ArrayDimension/SecurityClassification/category_id with the same v1 mapping. Program.cs selects between stub|db|mxaccess via OTOPCUA_GALAXY_BACKEND env var (default = mxaccess); OTOPCUA_GALAXY_ZB_CONN overrides the ZB connection string; OTOPCUA_GALAXY_CLIENT_NAME sets the Wonderware client identity; the StaPump and MxAccessClient lifecycles are tied to the server.RunAsync try/finally so a clean Ctrl+C tears down the COM proxy via Marshal.ReleaseComObject before the pump's WM_QUIT. Live MXAccess smoke tests (MxAccessLiveSmokeTests, net48 x86) — skipped when ZB unreachable or aaBootstrap not running, otherwise verify (1) MxAccessClient.ConnectAsync returns a positive LMXProxyServer handle on the StaPump, (2) MxAccessGalaxyBackend.OpenSession + Discover returns at least one gobject with attributes, (3) MxAccessGalaxyBackend.ReadValues against the first discovered attribute returns a response with the correct TagReference shape (value + quality vary by what's running, so we don't assert specific values). All 3 pass on this dev box. EndToEndIpcTests + IpcHandshakeIntegrationTests moved from Galaxy.Proxy.Tests (net10) to Galaxy.Host.Tests (net48 x86) — the previous test placement silently dropped them at xUnit discovery because Host became net48 x86 and net10 process can't load it. Rewritten to use Shared's FrameReader/FrameWriter directly instead of going through Proxy's GalaxyIpcClient (functionally equivalent — same wire protocol, framing primitives + dispatcher are the production code path verbatim). 7 IPC tests now run cleanly: Hello+heartbeat round-trip, wrong-secret rejection, OpenSession session-id assignment, Discover error-response surfacing, WriteValues per-tag bad status, Subscribe id assignment, Recycle grace window. Phase 2 exit-gate doc (docs/v2/implementation/exit-gate-phase-2.md) supersedes the partial-exit doc with the as-built state — Streams A/B/C complete; D/E gated only on the legacy-Host removal + parity-test rewrite cycle that fundamentally requires multi-day debug iteration; full adversarial-review section ranking 8 findings (2 high, 3 medium, 3 low) all explicitly deferred to Stream D/E or v2.1 with rationale; Stream-D removal checklist gives the next-session entry point with two policy options for the 494 v1 tests (rewrite-to-use-Proxy vs archive-and-write-smaller-v2-parity-suite). Cannot one-shot Stream D.1 in any single session because deleting OtOpcUa.Host requires the v1 IntegrationTests cycle to be retargeted first; that's the structural blocker, not "needs more code" — and the plan itself budgets 3-4 weeks for it.
2026-04-18 00:23:24 -04:00
Phase 2 PR 14 — alarm subsystem wire-up. Per IsAlarm=true attribute (PR 9 added the discovery flag), GalaxyAlarmTracker in Backend/Alarms/ advises the four Galaxy alarm-state attributes: .InAlarm (boolean alarm active), .Priority (int severity), .DescAttrName (human-readable description), .Acked (boolean acknowledged). Runs the OPC UA Part 9 alarm lifecycle state machine simplified for the Galaxy AlarmExtension model and raises AlarmTransition events on transitions operators must react to — Active (InAlarm false→true, default Unacknowledged), Acknowledged (Acked false→true while InAlarm still true), Inactive (InAlarm true→false). MxAccessGalaxyBackend instantiates the tracker in its constructor with delegate-based subscribe/unsubscribe/write pointers to MxAccessClient, hooks TransitionRaised to forward each transition through the existing OnAlarmEvent IPC event that PR 4 ConnectionSink wires into MessageKind.AlarmEvent frames — no new contract messages required since GalaxyAlarmEvent already exists in Shared.Contracts. Field mapping: EventId = fresh Guid.ToString('N') per transition, ObjectTagName = alarm attribute full reference, AlarmName = alarm attribute full reference, Severity = tracked Priority, StateTransition = 'Active'|'Acknowledged'|'Inactive', Message = DescAttrName or tag fallback, UtcUnixMs = transition time. DiscoverAsync caches every IsAlarm=true attribute's full reference (tag.attribute) into _discoveredAlarmTags (ConcurrentBag cleared-then-filled on every re-Discover to track Galaxy redeploys). SubscribeAlarmsAsync iterates the cache and advises each via GalaxyAlarmTracker.TrackAsync; best-effort per-alarm — a subscribe failure on one alarm doesn't abort the whole call since operators prefer partial alarm coverage to none. Tracker is internally idempotent on repeat Track calls (second invocation for same alarm tag is a no-op; already-subscribed check short-circuits before the 4 MXAccess sub calls). Subscribe-failure rollback inside TrackAsync removes the alarm state + unadvises any of the 4 that did succeed so a partial advise can't leak a phantom tracking entry. AcknowledgeAlarmAsync routes to tracker.AcknowledgeAsync which writes the operator comment to <tag>.AckMsg via MxAccessClient.WriteAsync — writes use the existing MXAccess OnWriteComplete TCS-by-handle path (PR 4 Medium 4) so a runtime-refused ack bubbles up as Success=false rather than false-positive. State-machine quirks preserved from v1: (1) initial Acked=true on subscribe does NOT fire Acknowledged (alarm at rest, pre-acknowledged — default state is Acked=true so the first subscribe callback is a no-op transition), (2) Acked false→true only fires Acknowledged when InAlarm is currently true (acking a latched-inactive alarm is not a user-visible transition), (3) Active transition clears the Acked flag in-state so the next Acked callback correctly fires Acknowledged (v1 had this buried in the ConditionState logic; we track it on the AlarmState struct directly). Priority value handled as int/short/long via type pattern match with int.MaxValue guard — Galaxy attribute category returns varying CLR types (Int32 is canonical but some older templates use Int16), and a long overflow cast to int would silently corrupt the severity. Dispose cascade in MxAccessGalaxyBackend.Dispose: alarm-tracker unsubscribe→dispose, probe-manager unsubscribe→dispose, mx.ConnectionStateChanged detach, historian dispose — same discipline PR 6 / PR 8 / PR 13 established so dangling invocation-list refs don't survive a backend recycle. #pragma warning disable CS0067 around OnAlarmEvent removed since the event is now raised. Tests (9 new, GalaxyAlarmTrackerTests): four-attribute subscribe per alarm, idempotent repeat-track, InAlarm false→true fires Active with Priority + Desc, InAlarm true→false fires Inactive, Acked false→true while InAlarm fires Acknowledged, Acked transition while InAlarm=false does not fire, AckMsg write path carries the comment, snapshot reports latest four fields, foreign probe callback for a non-tracked tag is silently dropped. Full Galaxy.Host.Tests Unit suite 84 pass / 0 fail (9 new alarm + 12 PR 13 probe + 21 PR 12 quality + 42 pre-existing). Galaxy.Host builds clean (0/0). Branches off phase-2-pr13-runtime-probe so the MxAccessGalaxyBackend constructor/Dispose chain gets the probe-manager + alarm-tracker wire-up in a coherent order; fast-forwards if PR 13 merges first.
2026-04-18 07:34:13 -04:00
Phase 2 — port GalaxyRepository to Galaxy.Host + DbBackedGalaxyBackend, smoke-tested against live ZB. Real Galaxy gobject hierarchy + dynamic attributes now flow through the IPC contract end-to-end without any MXAccess code involvement, so the OPC UA address-space build (Stream C.4 acceptance) becomes parity-testable today even before the COM client port lands. Backend/Galaxy/GalaxyRepository.cs is a byte-for-byte port of v1 GalaxyRepositoryService's HierarchySql + AttributesSql (the two SQL bodies, both ~50 lines of recursive CTE template-chain + deployed_package_chain logic, are identical to v1 so the row set is verifiably the same — extended-attributes + scope-filter queries from v1 are intentionally not ported yet, they're refinements not on the Phase 2 critical path); plus TestConnectionAsync (SELECT 1) and GetLastDeployTimeAsync (SELECT time_of_last_deploy FROM galaxy) for the ChangeDetection deploy-watermark path. Backend/Galaxy/GalaxyRepositoryOptions defaults to localhost ZB Integrated Security; runtime override comes from DriverConfig.Database section per plan.md §"Galaxy DriverConfig". Backend/Galaxy/GalaxyHierarchyRow + GalaxyAttributeRow are the row-shape DTOs (no required modifier — net48 lacks RequiredMemberAttribute and we'd need a polyfill shim like the existing IsExternalInit one; default-string init is simpler). System.Data.SqlClient 4.9.0 added (the same package the v1 Host uses; net48-compatible). Backend/DbBackedGalaxyBackend wraps the repository: DiscoverAsync builds a real DiscoverHierarchyResponse (groups attributes by gobject, resolves parent-by-tagname, maps category_id → human-readable template-category name mirroring v1 AlarmObjectFilter); ReadValuesAsync/WriteValuesAsync/HistoryReadAsync still surface "MXAccess code lift pending (Phase 2 Task B.1)" because runtime data values genuinely need the COM client; OpenSession/CloseSession/Subscribe/Unsubscribe/AlarmSubscribe/AlarmAck/Recycle return success without backend work (subscription ID is a synthetic counter for now). Live smoke tests (GalaxyRepositoryLiveSmokeTests) skip when localhost ZB is unreachable; when present they verify (1) TestConnection returns true, (2) GetHierarchy returns at least one deployed gobject with a non-empty TagName, (3) GetAttributes returns rows with FullTagReference matching the "tag.attribute" shape, (4) GetLastDeployTime returns a value, (5) DbBackedBackend.DiscoverAsync returns at least one gobject with attributes and a populated TemplateCategory. All 5 pass against the local Galaxy. Full solution 957 pass / 1 pre-existing Phase 0 baseline; the 494 v1 IntegrationTests + 6 v1 IntegrationTests-net48 tests still pass — legacy OtOpcUa.Host untouched. Remaining for the Phase 2 exit gate is the MXAccess COM client port itself (the v1 MxAccessClient partials + IMxProxy abstraction + StaPump-based Connect/Subscribe/Read/Write semantics) — Discover is now solved in DB-backed form, so the lift can focus exclusively on the runtime data-plane.
2026-04-17 23:14:09 -04:00
Phase 2 PR 13 — port GalaxyRuntimeProbeManager state machine + wire per-platform ScanState probing. PR 8 gave operators the gateway-level transport signal (MxAccessClient.ConnectionStateChanged → OnHostStatusChanged tagged with the Wonderware client identity) — enough to detect when the entire MXAccess COM proxy dies, but silent when a specific or goes down while the gateway stays alive. This PR closes that gap: a pure-logic GalaxyRuntimeProbeManager (ported from v1's 472-LOC GalaxyRuntimeProbeManager.cs, distilled to ~240 LOC of state machine without the OPC UA node-manager entanglement) lives in Backend/Stability/, advises <TagName>.ScanState per WinPlatform + AppEngine gobject discovered during DiscoverAsync, runs the Unknown → Running → Stopped state machine with v1's documented semantics preserved verbatim, and raises a StateChanged event on transitions operators should react to. MxAccessGalaxyBackend instantiates the probe manager in the constructor with SubscribeAsync/UnsubscribeAsync delegate pointers into MxAccessClient, hooks StateChanged to forward each transition through the same OnHostStatusChanged IPC event the gateway signal uses (HostName = platform/engine TagName, RuntimeStatus = 'Running'|'Stopped'|'Unknown', LastObservedUtcUnixMs from the state-change timestamp), so Admin UI gets per-host signals flowing through the existing PR 8 wire with no additional IPC plumbing. State machine rules ported from v1 runtimestatus.md: (a) ScanState is on-change-only — a stably-Running host may go hours without a callback, so Running → Stopped is driven only by explicit ScanState=false, never by starvation; (b) Unknown → Running is a startup transition and does NOT fire StateChanged (would paint every host as 'just recovered' at startup, which is noise and can clear Bad quality set by a concurrently-stopping sibling); (c) Stopped → Running fires StateChanged for the real recovery case; (d) Running → Stopped fires StateChanged; (e) Unknown → Stopped fires StateChanged because that's the first-known-bad signal operators need when a host is down at our startup time. MxAccessGalaxyBackend.DiscoverAsync calls _probeManager.SyncAsync with the runtime-host subset of the hierarchy (CategoryId == 1 WinPlatform or 3 AppEngine) as a best-effort step after building the Discover response — probe failures are swallowed so Discover still returns the hierarchy even if a per-host advise fails; the gateway signal covers the critical rung. SyncAsync is idempotent (second call with the same set is a no-op) and handles the diff on re-Discover for tag rename / host add / host remove. Subscribe failure rolls back the host's state entry under the lock so a later probe callback for a never-advised tag can't transition a phantom state from Unknown to Stopped and fan out a false host-down signal (the same protection v1's GalaxyRuntimeProbeManager had at line 237-243 of v1 with a captured-probe-string comparison under the lock). MxAccessGalaxyBackend.Dispose unsubscribes the StateChanged handler before disposing the probe manager to prevent dangling invocation-list references across reconnects, same discipline as PR 8's ConnectionStateChanged and PR 6's SubscriptionReplayFailed. Tests (12 new GalaxyRuntimeProbeManagerTests): Sync_subscribes_to_ScanState_per_host verifies tag.ScanState subscriptions are advised per Platform/Engine; Sync_is_idempotent_on_repeat_call_with_same_set verifies no duplicate subscribes; Sync_unadvises_removed_hosts verifies the diff unadvises gone hosts; Subscribe_failure_rolls_back_host_entry_so_later_transitions_do_not_fire_stale_events covers the rollback-on-subscribe-fail guard; Unknown_to_Running_does_not_fire_StateChanged preserves the startup-noise rule; Running_to_Stopped_fires_StateChanged_with_both_states asserts OldState and NewState are both captured in the transition record; Stopped_to_Running_fires_StateChanged_for_recovery verifies the recovery case; Unknown_to_Stopped_fires_StateChanged_for_first_known_bad_signal preserves the first-bad rule; Repeated_Good_Running_callbacks_do_not_fire_duplicate_events verifies the state-tracking de-dup; Unknown_callback_for_non_tracked_probe_is_dropped asserts a foreign callback is silently ignored; Snapshot_reports_current_state_for_every_tracked_host covers the dashboard query hook; IsRuntimeHost_recognizes_WinPlatform_and_AppEngine_category_ids asserts the CategoryId filter. Galaxy.Host.Tests Unit suite 75 pass / 0 fail (12 new probe + 63 pre-existing). Galaxy.Host builds clean (0 errors / 0 warnings). Branches off v2.
2026-04-18 07:27:56 -04:00
Phase 2 PR 5 — port Wonderware Historian SDK into Driver.Galaxy.Host/Backend/Historian/. The full v1 Historian.Aveva code path (HistorianDataSource + HistorianClusterEndpointPicker + IHistorianConnectionFactory + SdkHistorianConnectionFactory) now lives inside Galaxy.Host instead of the previously-required out-of-tree plugin + HistorianPluginLoader AssemblyResolve hack, and MxAccessGalaxyBackend.HistoryReadAsync — which previously returned a Phase 2 Task B.1.h follow-up placeholder — now delegates to the ported HistorianDataSource.ReadRawAsync, maps HistorianSample to GalaxyDataValue via the IPC wire shape, and reports Success=true with per-tag HistoryTagValues arrays. OPC-UA-free surface inside Galaxy.Host: the v1 code returned Opc.Ua.DataValue on the hot path, which would have required dragging OPCFoundation.NetStandard.Opc.Ua.Server into net48 x86 Galaxy.Host and bleeding OPC types across the IPC boundary — instead, the port introduces HistorianSample (Value, Quality byte, TimestampUtc) + HistorianAggregateSample (Value, TimestampUtc) POCOs that carry the raw MX quality byte through the pipe unchanged, and the OPC translation happens on the Proxy side via the existing QualityMapper that the live-read path already uses. Decision #13's IPC data-shape contract survives intact — GalaxyDataValue (TagReference + ValueBytes MessagePack + ValueMessagePackType + StatusCode + SourceTimestampUtcUnixMs + ServerTimestampUtcUnixMs) — so no Shared.Contracts wire break vs PR 4. Cluster failover preserved verbatim: HistorianClusterEndpointPicker is the thread-safe pure-logic picker ported verbatim with no SDK dependency (injected DateTime clock, per-node cooldown state, unknown-node-name tolerance, case-insensitive de-dup on configuration-order list), ConnectToAnyHealthyNode iterates the picker's healthy candidates, clones config per attempt, calls the factory, marks healthy on success / failed on exception with the failure message stored for dashboard surfacing, throws "All N healthy historian candidate(s) failed" with the last exception chained when every node exhausts. Process path + Event path use separate HistorianAccess connections (CreateHistoryQuery vs CreateEventQuery vs CreateAnalogSummaryQuery on the SDK surface) guarded by independent _connection/_eventConnection locks — a mid-query failure on one silo resets only that connection, the other stays open. Four SDK paths ported: ReadRawAsync (RetrievalMode.Full, BatchSize from config.MaxValuesPerRead, MoveNext pump, per-sample quality + value decode with the StringValue/Value fallback the v1 code did, limit-based early exit), ReadAggregateAsync (AnalogSummaryQuery + Resolution in ms, ExtractAggregateValue maps Average/Minimum/Maximum/ValueCount/First/Last/StdDev column names — the NodeId to column mapping is moved to the Proxy side since the IPC request carries a string column), ReadAtTimeAsync (per-timestamp HistoryQuery with RetrievalMode.Interpolated + BatchSize=1, returns Quality=0 / Value=null for missing samples), ReadEventsAsync (EventQuery + AddEventFilter("Source",Equal,sourceName) when sourceName is non-null, EventOrder.Ascending, EventCount = maxEvents or config.MaxValuesPerRead); GetHealthSnapshot returns the full runtime-health snapshot (TotalQueries/Successes/Failures + ConsecutiveFailures + LastSuccess/FailureTime + LastError + ProcessConnectionOpen/EventConnectionOpen + ActiveProcessNode/ActiveEventNode + per-node state list). ReadRaw is the only path wired through IPC in PR 5 (HistoryReadRequest/HistoryTagValues/HistoryReadResponse already existed in Shared.Contracts); Aggregate/AtTime/Events/Health are ported-but-not-yet-IPC-exposed — they stay internal to Galaxy.Host for PR 6+ to surface via new contract message kinds (aggregate = OPC UA HistoryReadProcessed, at-time = HistoryReadAtTime, events = HistoryReadEvents, health = admin dashboard IPC query). Galaxy.Host csproj gains aahClientManaged + aahClientCommon references with Private=false (managed wrappers) + None items for aahClient.dll + Historian.CBE.dll + Historian.DPAPI.dll + ArchestrA.CloudHistorian.Contract.dll native satellites staged alongside the host exe via CopyToOutputDirectory=PreserveNewest so aahClientManaged can P/Invoke into them at runtime without an AssemblyResolve hook (cleaner than the v1 HistorianPluginLoader.cs 180-LOC AssemblyResolve + Assembly.LoadFrom dance that existed solely because the plugin was loaded late from Host/bin/Debug/net48/Historian/). Program.cs adds BuildHistorianIfEnabled() that reads OTOPCUA_HISTORIAN_ENABLED (true or 1) + OTOPCUA_HISTORIAN_SERVER + OTOPCUA_HISTORIAN_SERVERS (comma-separated cluster list overrides single-server) + OTOPCUA_HISTORIAN_PORT (default 32568) + OTOPCUA_HISTORIAN_INTEGRATED (default true) + OTOPCUA_HISTORIAN_USER/OTOPCUA_HISTORIAN_PASS + OTOPCUA_HISTORIAN_TIMEOUT_SEC (30) + OTOPCUA_HISTORIAN_MAX_VALUES (10000) + OTOPCUA_HISTORIAN_COOLDOWN_SEC (60), returns null when disabled so MxAccessGalaxyBackend.HistoryReadAsync surfaces a clean "Historian disabled" Success=false instead of a localhost-SDK hang; server.RunAsync finally block now also casts backend to IDisposable.Dispose() so the historian SDK connections get cleanly closed on Ctrl+C. MxAccessGalaxyBackend gains an IHistorianDataSource? historian constructor parameter (defaults null to preserve existing Host.Tests call sites that don't exercise HistoryRead), implements IDisposable that forwards to _historian.Dispose(), and the pragma warning disable CS0618 is locally scoped to the ToDto(HistorianEvent) mapper since the SDK marks Id/Source/DisplayText/Severity obsolete but the replacement surface isn't available in the aahClientManaged version we bind against — every other deprecated-SDK use still surfaces as an error under TreatWarningsAsErrors. Ported from v1 Historian.Aveva unchanged: the CloneConfigWithServerName helper that preserves every config field except ServerName per attempt; the double-checked locking in EnsureConnected/EnsureEventConnected (fast path = Volatile.Read outside lock, slow path acquires lock + re-checks + disposes any raced-in-parallel connection); HandleConnectionError/HandleEventConnectionError that close the dead connection, clear the active-node tracker, MarkFailed the picker entry with the exception message so the node enters cooldown, and log the reset with node= for operator correlation; RecordSuccess/RecordFailure that bump counters under _healthLock. Tests: HistorianClusterEndpointPickerTests (7 cases) — single-node ServerName fallback when ServerNames empty, MarkFailed enters cooldown and skips, cooldown expires after window, MarkHealthy immediately clears, all-in-cooldown returns empty healthy list, Snapshot reports failure count + last error + IsHealthy, case-insensitive de-dup on duplicate hostnames. HistorianWiringTests (2 cases) — HistoryReadAsync returns "Historian disabled" Success=false when historian:null passed; HistoryReadAsync with a fake IHistorianDataSource maps the returned HistorianSample (Value=42.5, Quality=192 Good, Timestamp) to a GalaxyDataValue with StatusCode=0u + SourceTimestampUtcUnixMs matching the sample + MessagePack-encoded value bytes. InternalsVisibleTo("...Host.Tests") added to Galaxy.Host.csproj so tests can reach the internal HistorianClusterEndpointPicker. Full Galaxy.Host.Tests suite: 24 pass / 0 fail (9 new historian + 15 pre-existing MemoryWatchdog/PostMortemMmf/RecyclePolicy/StaPump/EndToEndIpc/Handshake). Full solution build: 0 errors (202 pre-existing warnings). The v1 Historian.Aveva project + Historian.Aveva.Tests still build intact because the archive PR (Stream D.1 destructive delete) is still ahead of us — PR 5 intentionally does not delete either; once PR 2+3 merge and the archive-delete PR lands, a follow-up cleanup can remove Historian.Aveva + its 4 source files + 18 test cases. Alarm subsystem wire-up (OnAlarmEvent raising from MxAccessGalaxyBackend via AlarmExtension primitives) + host-status push (OnHostStatusChanged via a ported GalaxyRuntimeProbeManager) remain PR 6 candidates; they were on the same "Task B.1.h follow-up" list and share the IPC connection-sink wiring with the historian events path — it made PR 5 scope-manageable to do Historian first since that's what has the biggest surface area (981 LOC v1 plus SDK binding) and alarms/host-status have more bespoke integration with the existing MxAccess subscription fan-out.
2026-04-18 01:44:04 -04:00
Phase 2 PR 12 — richer historian quality mapping. Replace MxAccessGalaxyBackend's inline MapHistorianQualityToOpcUa category-only helper (192+→Good, 64-191→Uncertain, 0-63→Bad) with a new public HistorianQualityMapper.Map utility that preserves specific OPC DA subcodes — BadNotConnected(8)→0x808A0000u instead of generic Bad(0x80000000u), UncertainSubNormal(88)→0x40950000u instead of generic Uncertain, Good_LocalOverride(216)→0x00D80000u instead of generic Good, etc. Mirrors v1 QualityMapper.MapToOpcUaStatusCode byte-for-byte without pulling in OPC UA types — the function returns uint32 literals that are the canonical OPC UA StatusCode wire encoding, surfaced directly as DataValueSnapshot.StatusCode on the Proxy side with no additional translation. Unknown subcodes fall back to the family category (255→Good, 150→Uncertain, 50→Bad) so a future SDK change that adds a quality code we don't map yet still gets a sensible bucket. GalaxyDataValue wire shape unchanged (StatusCode stays uint) — this is a pure fidelity upgrade on the Host side. Downstream callers (Admin UI status dashboard, OPC UA clients receiving historian samples) can now distinguish e.g. a transport outage (BadNotConnected) from a sensor fault (BadSensorFailure) from a warm-up delay (BadWaitingForInitialData) without a second round-trip or dashboard heuristic. 21 new tests (HistorianQualityMapperTests): theory with 15 rows covering every specific mapping from the v1 QualityMapper table, plus 6 fallback tests verifying unknown-subcode codes in each family (Good/Uncertain/Bad) collapse to the family default. Galaxy.Host.Tests Unit suite 56/0 (21 new + 35 existing). Galaxy.Host builds clean (0/0). Branches off v2.
2026-04-18 07:11:02 -04:00
Phase 2 PR 5 — port Wonderware Historian SDK into Driver.Galaxy.Host/Backend/Historian/. The full v1 Historian.Aveva code path (HistorianDataSource + HistorianClusterEndpointPicker + IHistorianConnectionFactory + SdkHistorianConnectionFactory) now lives inside Galaxy.Host instead of the previously-required out-of-tree plugin + HistorianPluginLoader AssemblyResolve hack, and MxAccessGalaxyBackend.HistoryReadAsync — which previously returned a Phase 2 Task B.1.h follow-up placeholder — now delegates to the ported HistorianDataSource.ReadRawAsync, maps HistorianSample to GalaxyDataValue via the IPC wire shape, and reports Success=true with per-tag HistoryTagValues arrays. OPC-UA-free surface inside Galaxy.Host: the v1 code returned Opc.Ua.DataValue on the hot path, which would have required dragging OPCFoundation.NetStandard.Opc.Ua.Server into net48 x86 Galaxy.Host and bleeding OPC types across the IPC boundary — instead, the port introduces HistorianSample (Value, Quality byte, TimestampUtc) + HistorianAggregateSample (Value, TimestampUtc) POCOs that carry the raw MX quality byte through the pipe unchanged, and the OPC translation happens on the Proxy side via the existing QualityMapper that the live-read path already uses. Decision #13's IPC data-shape contract survives intact — GalaxyDataValue (TagReference + ValueBytes MessagePack + ValueMessagePackType + StatusCode + SourceTimestampUtcUnixMs + ServerTimestampUtcUnixMs) — so no Shared.Contracts wire break vs PR 4. Cluster failover preserved verbatim: HistorianClusterEndpointPicker is the thread-safe pure-logic picker ported verbatim with no SDK dependency (injected DateTime clock, per-node cooldown state, unknown-node-name tolerance, case-insensitive de-dup on configuration-order list), ConnectToAnyHealthyNode iterates the picker's healthy candidates, clones config per attempt, calls the factory, marks healthy on success / failed on exception with the failure message stored for dashboard surfacing, throws "All N healthy historian candidate(s) failed" with the last exception chained when every node exhausts. Process path + Event path use separate HistorianAccess connections (CreateHistoryQuery vs CreateEventQuery vs CreateAnalogSummaryQuery on the SDK surface) guarded by independent _connection/_eventConnection locks — a mid-query failure on one silo resets only that connection, the other stays open. Four SDK paths ported: ReadRawAsync (RetrievalMode.Full, BatchSize from config.MaxValuesPerRead, MoveNext pump, per-sample quality + value decode with the StringValue/Value fallback the v1 code did, limit-based early exit), ReadAggregateAsync (AnalogSummaryQuery + Resolution in ms, ExtractAggregateValue maps Average/Minimum/Maximum/ValueCount/First/Last/StdDev column names — the NodeId to column mapping is moved to the Proxy side since the IPC request carries a string column), ReadAtTimeAsync (per-timestamp HistoryQuery with RetrievalMode.Interpolated + BatchSize=1, returns Quality=0 / Value=null for missing samples), ReadEventsAsync (EventQuery + AddEventFilter("Source",Equal,sourceName) when sourceName is non-null, EventOrder.Ascending, EventCount = maxEvents or config.MaxValuesPerRead); GetHealthSnapshot returns the full runtime-health snapshot (TotalQueries/Successes/Failures + ConsecutiveFailures + LastSuccess/FailureTime + LastError + ProcessConnectionOpen/EventConnectionOpen + ActiveProcessNode/ActiveEventNode + per-node state list). ReadRaw is the only path wired through IPC in PR 5 (HistoryReadRequest/HistoryTagValues/HistoryReadResponse already existed in Shared.Contracts); Aggregate/AtTime/Events/Health are ported-but-not-yet-IPC-exposed — they stay internal to Galaxy.Host for PR 6+ to surface via new contract message kinds (aggregate = OPC UA HistoryReadProcessed, at-time = HistoryReadAtTime, events = HistoryReadEvents, health = admin dashboard IPC query). Galaxy.Host csproj gains aahClientManaged + aahClientCommon references with Private=false (managed wrappers) + None items for aahClient.dll + Historian.CBE.dll + Historian.DPAPI.dll + ArchestrA.CloudHistorian.Contract.dll native satellites staged alongside the host exe via CopyToOutputDirectory=PreserveNewest so aahClientManaged can P/Invoke into them at runtime without an AssemblyResolve hook (cleaner than the v1 HistorianPluginLoader.cs 180-LOC AssemblyResolve + Assembly.LoadFrom dance that existed solely because the plugin was loaded late from Host/bin/Debug/net48/Historian/). Program.cs adds BuildHistorianIfEnabled() that reads OTOPCUA_HISTORIAN_ENABLED (true or 1) + OTOPCUA_HISTORIAN_SERVER + OTOPCUA_HISTORIAN_SERVERS (comma-separated cluster list overrides single-server) + OTOPCUA_HISTORIAN_PORT (default 32568) + OTOPCUA_HISTORIAN_INTEGRATED (default true) + OTOPCUA_HISTORIAN_USER/OTOPCUA_HISTORIAN_PASS + OTOPCUA_HISTORIAN_TIMEOUT_SEC (30) + OTOPCUA_HISTORIAN_MAX_VALUES (10000) + OTOPCUA_HISTORIAN_COOLDOWN_SEC (60), returns null when disabled so MxAccessGalaxyBackend.HistoryReadAsync surfaces a clean "Historian disabled" Success=false instead of a localhost-SDK hang; server.RunAsync finally block now also casts backend to IDisposable.Dispose() so the historian SDK connections get cleanly closed on Ctrl+C. MxAccessGalaxyBackend gains an IHistorianDataSource? historian constructor parameter (defaults null to preserve existing Host.Tests call sites that don't exercise HistoryRead), implements IDisposable that forwards to _historian.Dispose(), and the pragma warning disable CS0618 is locally scoped to the ToDto(HistorianEvent) mapper since the SDK marks Id/Source/DisplayText/Severity obsolete but the replacement surface isn't available in the aahClientManaged version we bind against — every other deprecated-SDK use still surfaces as an error under TreatWarningsAsErrors. Ported from v1 Historian.Aveva unchanged: the CloneConfigWithServerName helper that preserves every config field except ServerName per attempt; the double-checked locking in EnsureConnected/EnsureEventConnected (fast path = Volatile.Read outside lock, slow path acquires lock + re-checks + disposes any raced-in-parallel connection); HandleConnectionError/HandleEventConnectionError that close the dead connection, clear the active-node tracker, MarkFailed the picker entry with the exception message so the node enters cooldown, and log the reset with node= for operator correlation; RecordSuccess/RecordFailure that bump counters under _healthLock. Tests: HistorianClusterEndpointPickerTests (7 cases) — single-node ServerName fallback when ServerNames empty, MarkFailed enters cooldown and skips, cooldown expires after window, MarkHealthy immediately clears, all-in-cooldown returns empty healthy list, Snapshot reports failure count + last error + IsHealthy, case-insensitive de-dup on duplicate hostnames. HistorianWiringTests (2 cases) — HistoryReadAsync returns "Historian disabled" Success=false when historian:null passed; HistoryReadAsync with a fake IHistorianDataSource maps the returned HistorianSample (Value=42.5, Quality=192 Good, Timestamp) to a GalaxyDataValue with StatusCode=0u + SourceTimestampUtcUnixMs matching the sample + MessagePack-encoded value bytes. InternalsVisibleTo("...Host.Tests") added to Galaxy.Host.csproj so tests can reach the internal HistorianClusterEndpointPicker. Full Galaxy.Host.Tests suite: 24 pass / 0 fail (9 new historian + 15 pre-existing MemoryWatchdog/PostMortemMmf/RecyclePolicy/StaPump/EndToEndIpc/Handshake). Full solution build: 0 errors (202 pre-existing warnings). The v1 Historian.Aveva project + Historian.Aveva.Tests still build intact because the archive PR (Stream D.1 destructive delete) is still ahead of us — PR 5 intentionally does not delete either; once PR 2+3 merge and the archive-delete PR lands, a follow-up cleanup can remove Historian.Aveva + its 4 source files + 18 test cases. Alarm subsystem wire-up (OnAlarmEvent raising from MxAccessGalaxyBackend via AlarmExtension primitives) + host-status push (OnHostStatusChanged via a ported GalaxyRuntimeProbeManager) remain PR 6 candidates; they were on the same "Task B.1.h follow-up" list and share the IPC connection-sink wiring with the historian events path — it made PR 5 scope-manageable to do Historian first since that's what has the biggest surface area (981 LOC v1 plus SDK binding) and alarms/host-status have more bespoke integration with the existing MxAccess subscription fan-out.
2026-04-18 01:44:04 -04:00
Phase 2 PR 10 — HistoryReadAtTime IPC surface. New Shared.Contracts messages HistoryReadAtTimeRequest/Response (MessageKind 0x64/0x65), IGalaxyBackend gains HistoryReadAtTimeAsync, Stub/DbBacked return canonical pending error, MxAccessGalaxyBackend delegates to _historian.ReadAtTimeAsync (ported in PR 5, exposed now) — request timestamp array is flow-encoded as Unix ms to avoid MessagePack DateTime quirks then re-hydrated to DateTime on the Host side. Per-sample mapping uses the same ToWire(HistorianSample) helper as ReadRawAsync so the category→StatusCode mapping stays consistent (Quality byte 192+ → Good 0u, 64-191 → Uncertain, 0-63 → Bad 0x80000000u). Guards: null historian → "Historian disabled" (symmetric with other history paths); empty timestamp array short-circuits to Success=true, Values=[] without an SDK round-trip; SDK exception → Success=false with the message chained. Proxy-side IHistoryProvider.ReadAtTimeAsync capability doesn't exist in Core.Abstractions yet (OPC UA HistoryReadAtTime service is supported but the current IHistoryProvider only has ReadRawAsync + ReadProcessedAsync) — this PR adds the Host-side surface so a future Core.Abstractions extension can wire it through without needing another IPC change. Tests (4 new): disabled-error when historian null, empty-timestamp short-circuit without SDK call, Unix-ms↔DateTime round-trip with Good samples at two distinct timestamps, missing sample (Quality=0) maps to 0x80000000u Bad category. Galaxy.Host.Tests Unit suite: 31 pass / 0 fail (4 new at-time + 27 pre-existing). Galaxy.Host builds clean. Branches off v2.
2026-04-18 07:03:25 -04:00
Phase 2 PR 11 — HistoryReadEvents IPC (alarm history). New Shared.Contracts messages HistoryReadEventsRequest/Response + GalaxyHistoricalEvent DTO (MessageKind 0x66/0x67). IGalaxyBackend gains HistoryReadEventsAsync, Stub/DbBacked return canonical pending error, MxAccessGalaxyBackend delegates to _historian.ReadEventsAsync (ported in PR 5) and maps HistorianEventDto → GalaxyHistoricalEvent — Guid.ToString() for EventId wire shape, DateTime → Unix ms for both EventTime (when the event fired in the process) and ReceivedTime (when the Historian persisted it), DisplayText + Severity pass through. SourceName is string? — null means 'all sources' (passed straight through to HistorianDataSource.ReadEventsAsync which adds the AddEventFilter('Source', Equal, ...) only when non-null). Distinct from the live GalaxyAlarmEvent type because historical rows carry both timestamps and lack StateTransition (Historian logs instantaneous events, not the OPC UA Part 9 alarm lifecycle; translating to OPC UA event lifecycle is the alarm-subsystem's job). Guards: null historian → Historian-disabled error; SDK exception → Success=false with message chained. Tests (3 new): disabled-error when historian null, maps HistorianEventDto with full field set (Id/Source/EventTime/ReceivedTime/DisplayText/Severity=900) to GalaxyHistoricalEvent, null SourceName passes through unchanged (verifies the 'all sources' contract). Galaxy.Host.Tests Unit suite 34 pass / 0 fail. Galaxy.Host builds clean. Branches off phase-2-pr10-history-attime since both extend the MessageKind enum; fast-forwards if PR 10 merges first.
2026-04-18 07:08:16 -04:00
Phase 2 PR 7 — wire IHistoryProvider.ReadProcessedAsync end-to-end. PR 5 ported HistorianDataSource.ReadAggregateAsync into Galaxy.Host but left it internal — GalaxyProxyDriver.ReadProcessedAsync still threw NotSupportedException, so OPC UA clients issuing HistoryReadProcessed requests against the v2 topology got rejected at the driver boundary. This PR closes that gap by adding two new Shared.Contracts messages (HistoryReadProcessedRequest/Response, MessageKind 0x62/0x63), routing them through GalaxyFrameHandler, implementing HistoryReadProcessedAsync on all three IGalaxyBackend implementations (Stub/DbBacked return the canonical "pending" Success=false, MxAccessGalaxyBackend delegates to _historian.ReadAggregateAsync), mapping HistorianAggregateSample → GalaxyDataValue at the IPC boundary (null bucket Value → BadNoData 0x800E0000u, otherwise Good 0u), and flipping GalaxyProxyDriver.ReadProcessedAsync from the NotSupported throw to a real IPC call with OPC UA HistoryAggregateType enum mapped to Wonderware AnalogSummary column name on the Proxy side (Average → "Average", Minimum → "Minimum", Maximum → "Maximum", Count → "ValueCount", Total → NotSupported since there's no direct SDK column for sum). Decision #13 IPC data-shape stays intact — HistoryReadProcessedResponse carries GalaxyDataValue[] with the same MessagePack value + OPC UA StatusCode + timestamps shape as the other history responses, so the Proxy's existing ToSnapshot helper handles the conversion without a new code path. MxAccessGalaxyBackend.HistoryReadProcessedAsync guards: null historian → "Historian disabled" (symmetric with HistoryReadAsync); IntervalMs <= 0 → "HistoryReadProcessed requires IntervalMs > 0" (prevents division-by-zero inside the SDK's Resolution parameter); exception during SDK call → Success=false Values=[] with the message so the Proxy surfaces it as InvalidOperationException with a clean error chain. Tests — HistoryReadProcessedTests (new, 4 cases): disabled-error when historian null, rejects zero interval, maps Good sample with Value=12.34 and the Proxy-supplied AggregateColumn + IntervalMs flow unchanged through to the fake IHistorianDataSource, maps null Value bucket to 0x800E0000u BadNoData with null ValueBytes. AggregateColumnMappingTests (new, 5 cases in Proxy.Tests): theory covers all 4 supported HistoryAggregateType enum values → correct column string, and asserts Total throws NotSupportedException with a message that steers callers to Average/Minimum/Maximum/Count (the SDK's AnalogSummaryQueryResult doesn't expose a sum column — the closest is Average × ValueCount which is the responsibility of a caller-side aggregation rather than an extra IPC round-trip). InternalsVisibleTo added to Galaxy.Proxy csproj so Proxy.Tests can reach the internal MapAggregateToColumn static. Builds — Galaxy.Host (net48 x86) + Galaxy.Proxy (net10) both 0 errors, full solution 201 warnings (pre-existing) / 0 errors. Test counts — Host.Tests Unit suite: 28 pass (4 new processed + 9 PR5 historian + 15 pre-existing); Proxy.Tests Unit suite: 14 pass (5 new column-mapping + 9 pre-existing). Deferred to a later PR — ReadAtTime + ReadEvents + Health IPC surfaces (HistorianDataSource has them ported in PR 5 but they need additional contract messages and would push this PR past a comfortable review size); the alarm subsystem wire-up (OnAlarmEvent raising from MxAccessGalaxyBackend) which overlaps the ReadEventsAsync IPC work since both pull from HistorianAccess.CreateEventQuery on the SDK side; the Proxy-side quality-byte refinement where HistorianDataSource's per-sample raw quality byte gets decoded through the existing QualityMapper instead of the category-only mapping in ToWire(HistorianSample) — doesn't change correctness today since Good/Uncertain/Bad categories are all the Admin UI and OPC UA clients surface, but richer OPC DA status codes (BadNotConnected, UncertainSubNormal, etc.) are available on the wire and the Proxy could promote them before handing DataValueSnapshot to ISubscribable consumers. This PR branches off phase-2-pr5-historian because it directly extends the Historian IPC surface added there; if PR 5 merges first PR 7 fast-forwards, otherwise it needs a rebase after PR 5 lands.
2026-04-18 05:53:01 -04:00
Phase 2 PR 8 — wire gateway-level host-status push from MxAccessGalaxyBackend. PR 4 built the IPC infrastructure for OnHostStatusChanged (MessageKind.RuntimeStatusChange frame + ConnectionSink forwarding through FrameWriter) but no backend actually raised the event; the #pragma warning disable CS0067 around MxAccessGalaxyBackend.OnHostStatusChanged declared the event for interface symmetry while acknowledging the wire-up was Phase 2 follow-up. This PR closes the gateway-level signal: MxAccessClient.ConnectionStateChanged (already raised on false→true Register and true→false Unregister transitions, including the reconnect path in MonitorLoopAsync) now drives OnHostStatusChanged with a synthetic HostConnectivityStatus tagged HostName=MxAccessClient.ClientName, RuntimeStatus="Running" on reconnect + "Stopped" on disconnect, LastObservedUtcUnixMs set to the transition moment. The Admin UI's existing IHostConnectivityProbe subscriber on GalaxyProxyDriver (HostStatusChangedEventArgs) already handles the full translation — OnHostConnectivityUpdate parses "Running"/"Stopped"/"Faulted" into the Core.Abstractions HostState enum and fires OnHostStatusChanged downstream, so this single backend-side event wire-up produces an end-to-end signal with no further Proxy changes required. Per-platform and per-AppEngine ScanState probing (the 472 LOC GalaxyRuntimeProbeManager state machine in v1 that advises <Host>.ScanState on every deployed $WinPlatform + $AppEngine gobject, tracks Unknown → Running → Stopped transitions, handles the on-change-only delivery quirk of ScanState, and surfaces IsHostStopped(gobjectId) for the node manager's Read path to short-circuit on-demand reads against known-stopped runtimes) remains deferred to a follow-up PR — the gateway-level signal gives operators the top-level transport-health rung of the status ladder, which is what matters when the Galaxy COM proxy itself goes down (vs a specific platform going down). MxAccessClient.ClientName property exposes the previously-private _clientName field so the backend can tag its pushes with a stable gateway identity — operators configure this via OTOPCUA_GALAXY_CLIENT_NAME env var (default "OtOpcUa-Galaxy.Host" per Program.cs). MxAccessGalaxyBackend constructor subscribes the new _onConnectionStateChanged field before returning + Dispose unsubscribes it via _mx.ConnectionStateChanged -= _onConnectionStateChanged to prevent the backend's own dispose from leaving a dangling handler on the MxAccessClient (same shape as MxAccessClient.SubscriptionReplayFailed PR 6 dispose discipline). #pragma warning disable CS0067 removed from around OnHostStatusChanged since the event is now raised; the directive is narrowed to cover only OnAlarmEvent which stays unraised pending the alarm subsystem port (PR 9 candidate). Tests — HostStatusPushTests (new, 2 cases): ConnectionStateChanged_raises_OnHostStatusChanged_with_gateway_name fires mx.ConnectAsync → mx.DisconnectAsync and asserts two notifications in order with HostName="GatewayClient" (the clientName passed to MxAccessClient ctor), RuntimeStatus="Running" then "Stopped", LastObservedUtcUnixMs > 0; Dispose_unsubscribes_so_post_dispose_state_changes_do_not_fire_events asserts that after backend.Dispose() a subsequent mx.DisconnectAsync does not bump the count on a registered OnHostStatusChanged handler — guards against the subscription-leak regression where a lingering backend instance would accumulate cross-reconnect notifications for a dead writer. Host.Tests csproj gains a Reference to lib/ArchestrA.MxAccess.dll (identical to the reference PR 6 adds — conflict-free cherry-pick/merge since both PRs stage the same <Reference> node; git will collapse to one when either lands first). Full Galaxy.Host.Tests Unit suite: 26 pass / 0 fail (2 new host-status + 9 PR5 historian + 15 pre-existing PostMortemMmf/RecyclePolicy/StaPump/MemoryWatchdog/EndToEndIpc/Handshake). Galaxy.Host builds clean (0 errors, 0 warnings). Branch base — PR 8 is on phase-2-pr5-historian rather than phase-2-pr4-findings because the constructor path on MxAccessGalaxyBackend gained a new historian parameter in PR 5 and the Dispose implementation needs to coordinate the two unsubscribes; targeting the earlier base would leave a trivial conflict on Dispose.
2026-04-18 06:03:16 -04:00
Phase 2 — port MXAccess COM client to Galaxy.Host + MxAccessGalaxyBackend (3rd IGalaxyBackend) + live MXAccess smoke + Phase 2 exit-gate doc + adversarial review. The full Galaxy data-plane now flows through the v2 IPC topology end-to-end against live ArchestrA.MxAccess.dll, on this dev box, with 30/30 Host tests + 9/9 Proxy tests + 963/963 solution tests passing alongside the unchanged 494 v1 IntegrationTests baseline. Backend/MxAccess/Vtq is a focused port of v1's Vtq value-timestamp-quality DTO. Backend/MxAccess/IMxProxy abstracts LMXProxyServer (port of v1's IMxProxy with the same Register/Unregister/AddItem/RemoveItem/AdviseSupervisory/UnAdviseSupervisory/Write surface + OnDataChange + OnWriteComplete events); MxProxyAdapter is the concrete COM-backed implementation that does Marshal.ReleaseComObject-loop on Unregister, must be constructed on an STA thread. Backend/MxAccess/MxAccessClient is the focused port of v1's MxAccessClient partials — Connect/Disconnect/Read/Write/Subscribe/Unsubscribe through the new Sta/StaPump (the real Win32 GetMessage pump from the previous commit), ConcurrentDictionary handle tracking, OnDataChange event marshalling to per-tag callbacks, ReadAsync implemented as the canonical subscribe → first-OnDataChange → unsubscribe one-shot pattern. Galaxy.Host csproj flipped to x86 PlatformTarget + Prefer32Bit=true with the ArchestrA.MxAccess HintPath ..\..\lib\ArchestrA.MxAccess.dll reference (lib/ already contains the production DLL). Backend/MxAccessGalaxyBackend is the third IGalaxyBackend implementation (alongside StubGalaxyBackend and DbBackedGalaxyBackend): combines GalaxyRepository (Discover) with MxAccessClient (Read/Write/Subscribe), MessagePack-deserializes inbound write values, MessagePack-serializes outbound read values into ValueBytes, decodes ArrayDimension/SecurityClassification/category_id with the same v1 mapping. Program.cs selects between stub|db|mxaccess via OTOPCUA_GALAXY_BACKEND env var (default = mxaccess); OTOPCUA_GALAXY_ZB_CONN overrides the ZB connection string; OTOPCUA_GALAXY_CLIENT_NAME sets the Wonderware client identity; the StaPump and MxAccessClient lifecycles are tied to the server.RunAsync try/finally so a clean Ctrl+C tears down the COM proxy via Marshal.ReleaseComObject before the pump's WM_QUIT. Live MXAccess smoke tests (MxAccessLiveSmokeTests, net48 x86) — skipped when ZB unreachable or aaBootstrap not running, otherwise verify (1) MxAccessClient.ConnectAsync returns a positive LMXProxyServer handle on the StaPump, (2) MxAccessGalaxyBackend.OpenSession + Discover returns at least one gobject with attributes, (3) MxAccessGalaxyBackend.ReadValues against the first discovered attribute returns a response with the correct TagReference shape (value + quality vary by what's running, so we don't assert specific values). All 3 pass on this dev box. EndToEndIpcTests + IpcHandshakeIntegrationTests moved from Galaxy.Proxy.Tests (net10) to Galaxy.Host.Tests (net48 x86) — the previous test placement silently dropped them at xUnit discovery because Host became net48 x86 and net10 process can't load it. Rewritten to use Shared's FrameReader/FrameWriter directly instead of going through Proxy's GalaxyIpcClient (functionally equivalent — same wire protocol, framing primitives + dispatcher are the production code path verbatim). 7 IPC tests now run cleanly: Hello+heartbeat round-trip, wrong-secret rejection, OpenSession session-id assignment, Discover error-response surfacing, WriteValues per-tag bad status, Subscribe id assignment, Recycle grace window. Phase 2 exit-gate doc (docs/v2/implementation/exit-gate-phase-2.md) supersedes the partial-exit doc with the as-built state — Streams A/B/C complete; D/E gated only on the legacy-Host removal + parity-test rewrite cycle that fundamentally requires multi-day debug iteration; full adversarial-review section ranking 8 findings (2 high, 3 medium, 3 low) all explicitly deferred to Stream D/E or v2.1 with rationale; Stream-D removal checklist gives the next-session entry point with two policy options for the 494 v1 tests (rewrite-to-use-Proxy vs archive-and-write-smaller-v2-parity-suite). Cannot one-shot Stream D.1 in any single session because deleting OtOpcUa.Host requires the v1 IntegrationTests cycle to be retargeted first; that's the structural blocker, not "needs more code" — and the plan itself budgets 3-4 weeks for it.
2026-04-18 00:23:24 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 2 PR 6 — close the 2 low findings carried forward from PR 4. Low finding #1 ($Heartbeat probe handle leak in MonitorLoopAsync): the probe calls _proxy.AddItem(connectionHandle, "$Heartbeat") on every monitor tick that observes the connection is past StaleThreshold, but previously discarded the returned item handle — so every probe (one per MonitorInterval, default 5s) leaked one item handle into the MXAccess proxy's internal handle table. Fix: capture the item handle, call RemoveItem(connectionHandle, probeHandle) in the InvokeAsync's finally block so it runs on the same pump turn as the AddItem, best-effort RemoveItem swallow so a dying proxy doesn't throw secondary exceptions out of the probe path. Probe ok becomes probeHandle > 0 so any AddItem that returns 0 (MXAccess's "could not create") counts as a failed probe, matching v1 behavior. Low finding #2 (subscription replay silently swallowed per-tag failures): after a reconnect, the replay loop iterates the pre-reconnect subscription snapshot and calls SubscribeOnPumpAsync for each; previously those failures went into a bare catch { /* skip */ } so an operator had no signal when specific tags failed to re-subscribe — the first indication downstream was a quality drop on OPC UA clients. Fix: new SubscriptionReplayFailedEventArgs (TagReference + Exception) + SubscriptionReplayFailed event on MxAccessClient that fires once per tag that fails to re-subscribe, Log.Warning per failure with the reconnect counter + tag reference, and a summary log line at the end of the replay loop ("{failed} of {total} failed" or "{total} re-subscribed cleanly"). Serilog using + ILogger Log = Serilog.Log.ForContext<MxAccessClient>() added. Tests — MxAccessClientMonitorLoopTests (new file, 2 cases): Heartbeat_probe_calls_RemoveItem_for_every_AddItem constructs a CountingProxy IMxProxy that tracks AddItem/RemoveItem pair counts scoped to the "$Heartbeat" address, runs the client with MonitorInterval=150ms + StaleThreshold=50ms for 700ms, asserts HeartbeatAddCount > 1, HeartbeatAddCount == HeartbeatRemoveCount, OutstandingHeartbeatHandles == 0 after dispose; SubscriptionReplayFailed_fires_for_each_tag_that_fails_to_replay uses a ReplayFailingProxy that throws on the next $Heartbeat probe (to trigger the reconnect path) and throws on the replay-time AddItem for specified tag names ("BadTag.A", "BadTag.B"), subscribes GoodTag.X + BadTag.A + BadTag.B before triggering probe failure, collects SubscriptionReplayFailed args into a ConcurrentBag, asserts exactly 2 events fired and both bad tags are represented — GoodTag.X replays cleanly so it does not fire. Host.Tests csproj gains a Reference to lib/ArchestrA.MxAccess.dll because IMxProxy's MxDataChangeHandler delegate signature mentions MXSTATUS_PROXY and the compiler resolves all delegate parameter types when a test class implements the interface, even if the test code never names the type. No regressions: full Galaxy.Host.Tests Unit suite 26 pass / 0 fail (2 new monitor-loop tests + 9 PR5 historian + 15 pre-existing PostMortemMmf/RecyclePolicy/StaPump/MemoryWatchdog/EndToEndIpc/Handshake). Galaxy.Host builds clean (0 errors, 0 warnings) — the new Serilog.Log.ForContext usage picks up the existing Serilog package ref that PR 4 pulled in for the monitor-loop infrastructure. Both findings were flagged as non-blocking for PR 4 merge and are now resolved alongside whichever merge order the reviewer picks; this PR branches off phase-2-pr4-findings so it can rebase cleanly if PR 4 lands first or be re-based onto master after PR 4 merges.
2026-04-18 02:06:15 -04:00
Phase 2 — port MXAccess COM client to Galaxy.Host + MxAccessGalaxyBackend (3rd IGalaxyBackend) + live MXAccess smoke + Phase 2 exit-gate doc + adversarial review. The full Galaxy data-plane now flows through the v2 IPC topology end-to-end against live ArchestrA.MxAccess.dll, on this dev box, with 30/30 Host tests + 9/9 Proxy tests + 963/963 solution tests passing alongside the unchanged 494 v1 IntegrationTests baseline. Backend/MxAccess/Vtq is a focused port of v1's Vtq value-timestamp-quality DTO. Backend/MxAccess/IMxProxy abstracts LMXProxyServer (port of v1's IMxProxy with the same Register/Unregister/AddItem/RemoveItem/AdviseSupervisory/UnAdviseSupervisory/Write surface + OnDataChange + OnWriteComplete events); MxProxyAdapter is the concrete COM-backed implementation that does Marshal.ReleaseComObject-loop on Unregister, must be constructed on an STA thread. Backend/MxAccess/MxAccessClient is the focused port of v1's MxAccessClient partials — Connect/Disconnect/Read/Write/Subscribe/Unsubscribe through the new Sta/StaPump (the real Win32 GetMessage pump from the previous commit), ConcurrentDictionary handle tracking, OnDataChange event marshalling to per-tag callbacks, ReadAsync implemented as the canonical subscribe → first-OnDataChange → unsubscribe one-shot pattern. Galaxy.Host csproj flipped to x86 PlatformTarget + Prefer32Bit=true with the ArchestrA.MxAccess HintPath ..\..\lib\ArchestrA.MxAccess.dll reference (lib/ already contains the production DLL). Backend/MxAccessGalaxyBackend is the third IGalaxyBackend implementation (alongside StubGalaxyBackend and DbBackedGalaxyBackend): combines GalaxyRepository (Discover) with MxAccessClient (Read/Write/Subscribe), MessagePack-deserializes inbound write values, MessagePack-serializes outbound read values into ValueBytes, decodes ArrayDimension/SecurityClassification/category_id with the same v1 mapping. Program.cs selects between stub|db|mxaccess via OTOPCUA_GALAXY_BACKEND env var (default = mxaccess); OTOPCUA_GALAXY_ZB_CONN overrides the ZB connection string; OTOPCUA_GALAXY_CLIENT_NAME sets the Wonderware client identity; the StaPump and MxAccessClient lifecycles are tied to the server.RunAsync try/finally so a clean Ctrl+C tears down the COM proxy via Marshal.ReleaseComObject before the pump's WM_QUIT. Live MXAccess smoke tests (MxAccessLiveSmokeTests, net48 x86) — skipped when ZB unreachable or aaBootstrap not running, otherwise verify (1) MxAccessClient.ConnectAsync returns a positive LMXProxyServer handle on the StaPump, (2) MxAccessGalaxyBackend.OpenSession + Discover returns at least one gobject with attributes, (3) MxAccessGalaxyBackend.ReadValues against the first discovered attribute returns a response with the correct TagReference shape (value + quality vary by what's running, so we don't assert specific values). All 3 pass on this dev box. EndToEndIpcTests + IpcHandshakeIntegrationTests moved from Galaxy.Proxy.Tests (net10) to Galaxy.Host.Tests (net48 x86) — the previous test placement silently dropped them at xUnit discovery because Host became net48 x86 and net10 process can't load it. Rewritten to use Shared's FrameReader/FrameWriter directly instead of going through Proxy's GalaxyIpcClient (functionally equivalent — same wire protocol, framing primitives + dispatcher are the production code path verbatim). 7 IPC tests now run cleanly: Hello+heartbeat round-trip, wrong-secret rejection, OpenSession session-id assignment, Discover error-response surfacing, WriteValues per-tag bad status, Subscribe id assignment, Recycle grace window. Phase 2 exit-gate doc (docs/v2/implementation/exit-gate-phase-2.md) supersedes the partial-exit doc with the as-built state — Streams A/B/C complete; D/E gated only on the legacy-Host removal + parity-test rewrite cycle that fundamentally requires multi-day debug iteration; full adversarial-review section ranking 8 findings (2 high, 3 medium, 3 low) all explicitly deferred to Stream D/E or v2.1 with rationale; Stream-D removal checklist gives the next-session entry point with two policy options for the 494 v1 tests (rewrite-to-use-Proxy vs archive-and-write-smaller-v2-parity-suite). Cannot one-shot Stream D.1 in any single session because deleting OtOpcUa.Host requires the v1 IntegrationTests cycle to be retargeted first; that's the structural blocker, not "needs more code" — and the plan itself budgets 3-4 weeks for it.
2026-04-18 00:23:24 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 2 PR 8 — wire gateway-level host-status push from MxAccessGalaxyBackend. PR 4 built the IPC infrastructure for OnHostStatusChanged (MessageKind.RuntimeStatusChange frame + ConnectionSink forwarding through FrameWriter) but no backend actually raised the event; the #pragma warning disable CS0067 around MxAccessGalaxyBackend.OnHostStatusChanged declared the event for interface symmetry while acknowledging the wire-up was Phase 2 follow-up. This PR closes the gateway-level signal: MxAccessClient.ConnectionStateChanged (already raised on false→true Register and true→false Unregister transitions, including the reconnect path in MonitorLoopAsync) now drives OnHostStatusChanged with a synthetic HostConnectivityStatus tagged HostName=MxAccessClient.ClientName, RuntimeStatus="Running" on reconnect + "Stopped" on disconnect, LastObservedUtcUnixMs set to the transition moment. The Admin UI's existing IHostConnectivityProbe subscriber on GalaxyProxyDriver (HostStatusChangedEventArgs) already handles the full translation — OnHostConnectivityUpdate parses "Running"/"Stopped"/"Faulted" into the Core.Abstractions HostState enum and fires OnHostStatusChanged downstream, so this single backend-side event wire-up produces an end-to-end signal with no further Proxy changes required. Per-platform and per-AppEngine ScanState probing (the 472 LOC GalaxyRuntimeProbeManager state machine in v1 that advises <Host>.ScanState on every deployed $WinPlatform + $AppEngine gobject, tracks Unknown → Running → Stopped transitions, handles the on-change-only delivery quirk of ScanState, and surfaces IsHostStopped(gobjectId) for the node manager's Read path to short-circuit on-demand reads against known-stopped runtimes) remains deferred to a follow-up PR — the gateway-level signal gives operators the top-level transport-health rung of the status ladder, which is what matters when the Galaxy COM proxy itself goes down (vs a specific platform going down). MxAccessClient.ClientName property exposes the previously-private _clientName field so the backend can tag its pushes with a stable gateway identity — operators configure this via OTOPCUA_GALAXY_CLIENT_NAME env var (default "OtOpcUa-Galaxy.Host" per Program.cs). MxAccessGalaxyBackend constructor subscribes the new _onConnectionStateChanged field before returning + Dispose unsubscribes it via _mx.ConnectionStateChanged -= _onConnectionStateChanged to prevent the backend's own dispose from leaving a dangling handler on the MxAccessClient (same shape as MxAccessClient.SubscriptionReplayFailed PR 6 dispose discipline). #pragma warning disable CS0067 removed from around OnHostStatusChanged since the event is now raised; the directive is narrowed to cover only OnAlarmEvent which stays unraised pending the alarm subsystem port (PR 9 candidate). Tests — HostStatusPushTests (new, 2 cases): ConnectionStateChanged_raises_OnHostStatusChanged_with_gateway_name fires mx.ConnectAsync → mx.DisconnectAsync and asserts two notifications in order with HostName="GatewayClient" (the clientName passed to MxAccessClient ctor), RuntimeStatus="Running" then "Stopped", LastObservedUtcUnixMs > 0; Dispose_unsubscribes_so_post_dispose_state_changes_do_not_fire_events asserts that after backend.Dispose() a subsequent mx.DisconnectAsync does not bump the count on a registered OnHostStatusChanged handler — guards against the subscription-leak regression where a lingering backend instance would accumulate cross-reconnect notifications for a dead writer. Host.Tests csproj gains a Reference to lib/ArchestrA.MxAccess.dll (identical to the reference PR 6 adds — conflict-free cherry-pick/merge since both PRs stage the same <Reference> node; git will collapse to one when either lands first). Full Galaxy.Host.Tests Unit suite: 26 pass / 0 fail (2 new host-status + 9 PR5 historian + 15 pre-existing PostMortemMmf/RecyclePolicy/StaPump/MemoryWatchdog/EndToEndIpc/Handshake). Galaxy.Host builds clean (0 errors, 0 warnings). Branch base — PR 8 is on phase-2-pr5-historian rather than phase-2-pr4-findings because the constructor path on MxAccessGalaxyBackend gained a new historian parameter in PR 5 and the Dispose implementation needs to coordinate the two unsubscribes; targeting the earlier base would leave a trivial conflict on Dispose.
2026-04-18 06:03:16 -04:00