Phase 7 follow-up #247 — Galaxy.Host historian writer + SQLite sink activation #194

Merged
dohertj2 merged 1 commits from phase-7-fu-247-galaxy-historian-writer into v2 2026-04-20 22:21:04 -04:00
Owner

Closes the historian leg of Phase 7. Scripted alarm transitions now batch-flow through the existing Galaxy.Host pipe + queue durably in a local SQLite store-and-forward when Galaxy is the registered driver, instead of being dropped into NullAlarmHistorianSink.

GalaxyHistorianWriter (Driver.Galaxy.Proxy.Ipc)

IAlarmHistorianWriter implementation. Translates AlarmHistorianEventHistorianAlarmEventDto (Stream D contract), batches via the existing GalaxyIpcClient.CallAsync round-trip on MessageKind.HistorianAlarmEventRequest / Response, maps per-event HistorianAlarmEventOutcomeDto bytes back to HistorianWriteOutcome (Ack / RetryPlease / PermanentFail) so the SQLite drain worker knows what to ack vs dead-letter vs retry. Empty-batch fast path. Pipe-level transport faults bubble up as GalaxyIpcException which the SQLite sink translates to whole-batch RetryPlease per its catch contract.

GalaxyProxyDriver implements IAlarmHistorianWriter

Marker interface lets Phase7Composer discover it via type check. WriteBatchAsync delegates to a thin GalaxyHistorianWriter wrapping the driver's existing _client. Throws InvalidOperationException if InitializeAsync hasn't connected yet — the SQLite drain worker treats that as a transient batch failure and retries.

Phase7Composer.ResolveHistorianSink

Replaces the injected sink dep when any registered driver implements IAlarmHistorianWriter. Constructs SqliteStoreAndForwardSink at %ProgramData%/OtOpcUa/alarm-historian-queue.db (falls back to %TEMP% when ProgramData unavailable), starts the 2s drain timer, owns the sink disposable for clean teardown. When no driver provides the writer, keeps the NullAlarmHistorianSink wired by Program.cs (#246).

DisposeAsync now disposes in the right order: bridge → engines → owned sink → injected fallback.

Tests — 7 new

GalaxyHistorianWriterMappingTests: ToDto round-trips every field; preserves null Comment; per-byte outcome enum mapping (Ack / RetryPlease / PermanentFail) via [Theory]; unknown byte throws; ctor null-guard. The IPC round-trip itself is covered by the live Host suite (task #240) which constructs a real pipe.

Server.Phase7 tests: 34/34 still pass; Galaxy.Proxy tests: 25/25 (+7 = 32 total).

Phase 7 production wiring chain — COMPLETE

  • #243 composition kernel
  • #245 scripted-alarm IReadable adapter
  • #244 driver bridge
  • #246 Program.cs wire-in
  • #247 this — Galaxy.Host historian writer + SQLite sink activation

What unblocks now: task #240 live OPC UA E2E smoke. With a Galaxy driver registered, scripted alarm transitions flow end-to-end through engine → SQLite queue → drain worker → Galaxy.Host IPC → Aveva Historian alarm schema. Without Galaxy, NullSink keeps the engines functional and the queue dormant.

Closes the historian leg of Phase 7. Scripted alarm transitions now batch-flow through the existing Galaxy.Host pipe + queue durably in a local SQLite store-and-forward when Galaxy is the registered driver, instead of being dropped into NullAlarmHistorianSink. ## `GalaxyHistorianWriter` (Driver.Galaxy.Proxy.Ipc) `IAlarmHistorianWriter` implementation. Translates `AlarmHistorianEvent` → `HistorianAlarmEventDto` (Stream D contract), batches via the existing `GalaxyIpcClient.CallAsync` round-trip on `MessageKind.HistorianAlarmEventRequest` / `Response`, maps per-event `HistorianAlarmEventOutcomeDto` bytes back to `HistorianWriteOutcome` (`Ack` / `RetryPlease` / `PermanentFail`) so the SQLite drain worker knows what to ack vs dead-letter vs retry. Empty-batch fast path. Pipe-level transport faults bubble up as `GalaxyIpcException` which the SQLite sink translates to whole-batch RetryPlease per its catch contract. ## `GalaxyProxyDriver implements IAlarmHistorianWriter` Marker interface lets `Phase7Composer` discover it via type check. `WriteBatchAsync` delegates to a thin `GalaxyHistorianWriter` wrapping the driver's existing `_client`. Throws `InvalidOperationException` if `InitializeAsync` hasn't connected yet — the SQLite drain worker treats that as a transient batch failure and retries. ## `Phase7Composer.ResolveHistorianSink` Replaces the injected sink dep when any registered driver implements `IAlarmHistorianWriter`. Constructs `SqliteStoreAndForwardSink` at `%ProgramData%/OtOpcUa/alarm-historian-queue.db` (falls back to `%TEMP%` when ProgramData unavailable), starts the 2s drain timer, owns the sink disposable for clean teardown. When no driver provides the writer, keeps the `NullAlarmHistorianSink` wired by Program.cs (#246). `DisposeAsync` now disposes in the right order: bridge → engines → owned sink → injected fallback. ## Tests — 7 new `GalaxyHistorianWriterMappingTests`: `ToDto` round-trips every field; preserves null Comment; per-byte outcome enum mapping (`Ack` / `RetryPlease` / `PermanentFail`) via `[Theory]`; unknown byte throws; ctor null-guard. The IPC round-trip itself is covered by the live Host suite (task #240) which constructs a real pipe. **Server.Phase7 tests: 34/34 still pass; Galaxy.Proxy tests: 25/25 (+7 = 32 total).** ## Phase 7 production wiring chain — COMPLETE - ✅ #243 composition kernel - ✅ #245 scripted-alarm IReadable adapter - ✅ #244 driver bridge - ✅ #246 Program.cs wire-in - ✅ **#247 this — Galaxy.Host historian writer + SQLite sink activation** What unblocks now: **task #240** live OPC UA E2E smoke. With a Galaxy driver registered, scripted alarm transitions flow end-to-end through engine → SQLite queue → drain worker → Galaxy.Host IPC → Aveva Historian alarm schema. Without Galaxy, NullSink keeps the engines functional and the queue dormant.
dohertj2 added 1 commit 2026-04-20 22:20:51 -04:00
Closes the historian leg of Phase 7. Scripted alarm transitions now batch-flow
through the existing Galaxy.Host pipe + queue durably in a local SQLite store-
and-forward when Galaxy is the registered driver, instead of being dropped into
NullAlarmHistorianSink.

## GalaxyHistorianWriter (Driver.Galaxy.Proxy.Ipc)

IAlarmHistorianWriter implementation. Translates AlarmHistorianEvent →
HistorianAlarmEventDto (Stream D contract), batches via the existing
GalaxyIpcClient.CallAsync round-trip on MessageKind.HistorianAlarmEventRequest /
Response, maps per-event HistorianAlarmEventOutcomeDto bytes back to
HistorianWriteOutcome (Ack/RetryPlease/PermanentFail) so the SQLite drain
worker knows what to ack vs dead-letter vs retry. Empty-batch fast path.
Pipe-level transport faults (broken pipe, host crash) bubble up as
GalaxyIpcException which the SQLite sink's drain worker translates to
whole-batch RetryPlease per its catch contract.

## GalaxyProxyDriver implements IAlarmHistorianWriter

Marker interface lets Phase7Composer discover it via type check at compose
time. WriteBatchAsync delegates to a thin GalaxyHistorianWriter wrapping the
driver's existing _client. Throws InvalidOperationException if InitializeAsync
hasn't connected yet — the SQLite drain worker treats that as a transient
batch failure and retries.

## Phase7Composer.ResolveHistorianSink

Replaces the injected sink dep when any registered driver implements
IAlarmHistorianWriter. Constructs SqliteStoreAndForwardSink at
%ProgramData%/OtOpcUa/alarm-historian-queue.db (falls back to %TEMP% when
ProgramData unavailable, e.g. dev), starts the 2s drain timer, owns the sink
disposable for clean teardown. When no driver provides the writer, keeps the
NullAlarmHistorianSink wired by Program.cs (#246).

DisposeAsync now also disposes the owned SQLite sink in the right order:
bridge → engines → owned sink → injected fallback.

## Tests — 7 new GalaxyHistorianWriterMappingTests

ToDto round-trips every field; preserves null Comment; per-byte outcome enum
mapping (Ack / RetryPlease / PermanentFail) via [Theory]; unknown byte throws;
ctor null-guard. The IPC round-trip itself is covered by the live Host suite
(task #240) which constructs a real pipe.

Server.Phase7 tests: 34/34 still pass; Galaxy.Proxy tests: 25/25 (+7 = 32 total).

## Phase 7 production wiring chain — COMPLETE
-  #243 composition kernel
-  #245 scripted-alarm IReadable adapter
-  #244 driver bridge
-  #246 Program.cs wire-in
-  #247 this — Galaxy.Host historian writer + SQLite sink activation

What unblocks now: task #240 live OPC UA E2E smoke. With a Galaxy driver
registered, scripted alarm transitions flow end-to-end through the engine →
SQLite queue → drain worker → Galaxy.Host IPC → Aveva Historian alarm schema.
Without Galaxy, NullSink keeps the engines functional and the queue dormant.
dohertj2 merged commit efdf04320a into v2 2026-04-20 22:21:04 -04:00
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
Phase 2 official close-out. Closes task #209. The 2026-04-18 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine ScanState subscriptions + StateChanged events forwarded through the existing OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping pass. New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table (A/B/C/D/E all complete with the specific close commits named), full resolution table for every 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table marking every one resolved (Historian SDK plugin port → done, subscription push frames → done under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done, reconnect-without-recycle → done under High 2, v1 archive deletion → done). Fresh 2026-04-20 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the MXAccess COM path on the dev box. Flake observed — Configuration.Tests 70/71 on first full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased because it did not reproduce. The prior exit-gate-phase-2-final.md is kept in place (historical record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2 section header gains the ✅ CLOSED 2026-04-20 marker + a link to the close-out doc so the top-level plan index reflects reality. "What Phase 2 closed means for Phase 3 and later" section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2 tail work.
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
AbCip whole-UDT read optimization (#194) — declaration-driven member grouping collapses N per-member reads into one parent-UDT read + client-side decode. Closes task #194. On a batch that includes multiple members of the same hand-declared UDT tag, ReadAsync now issues one libplctag read on the parent + decodes each member from the runtime's buffer at its computed byte offset. A 6-member Motor UDT read goes from 6 libplctag round-trips to 1 — the Rockwell-suggested pattern for minimizing CIP request overhead on batch reads of UDT state (decision #11's follow-through on what the template decoder from task #179 was meant to enable). AbCipUdtMemberLayout is a pure-function helper that computes declared-member byte offsets under Logix natural-alignment rules (SInt 1-byte / Int 2-byte / DInt + Real + Dt 4-byte / LInt + ULInt + LReal 8-byte; alignment pad inserted before each member as needed). Opts out for BOOL / String / Structure members — BOOL storage in Logix UDTs packs into a hidden host byte whose position can't be computed from declaration-only info, and String members need length-prefix + STRING[82] fan-out which libplctag already handles via a per-tag DecodeValue path. The CIP Template Object shape from task #179 (when populated via FetchUdtShapeAsync) carries real offsets for those members — layering that richer path on top of the planner is a separate follow-up and does not change this PR's conservative behaviour. AbCipUdtReadPlanner is the scheduling function ReadAsync consults each batch — pure over (requests, tagsByName), emits Groups + Fallbacks. A group is formed when (a) the reference resolves to "parent.member"; (b) parent is a Structure tag with declared Members; (c) the layout helper succeeds on those members; (d) the specific member appears in the computed offset map; (e) at least two members of the same parent appear in the batch — single-member groups demote to the fallback path because one whole-UDT read vs one per-member read is equivalent cost but more client-side work. Original batch indices are preserved through the plan so out-of-order batches write decoded values back at the right output slot; the caller's result array order is invariant. IAbCipTagRuntime.DecodeValueAt(AbCipDataType, int offset, int? bitIndex) is the new hot-path method — LibplctagTagRuntime delegates to libplctag's offset-aware Get*(offset) calls (GetInt32, GetFloat32, etc.) that were always there; previously every call passed offset 0. DecodeValue(type, bitIndex) stays as the shorthand + forwards to DecodeValueAt with offset 0, preserving the existing single-tag read path + every test that exercises it. FakeAbCipTag gains a ValuesByOffset dictionary so tests can drive multi-member decoding by setting offset→value before the read fires; unmapped offsets fall back to the existing Value field so the 200+ existing tests that never set ValuesByOffset keep working unchanged. AbCipDriver.ReadAsync refactored: planner splits the batch, ReadGroupAsync handles each UDT group (one EnsureTagRuntimeAsync on the parent + one ReadAsync + N DecodeValueAt calls), ReadSingleAsync handles each fallback (the pre-#194 per-tag path, now extracted + threaded through). A per-group failure stamps the mapped libplctag status across every grouped member only — sibling groups + fallback refs are unaffected. Health-surface updates happen once per successful group rather than once per member to avoid ping-ponging the DriverState bookkeeping. Five AbCipUdtMemberLayoutTests: packed atomics get natural-alignment offsets including 8-byte pad before LInt; SInts pack without padding; BOOL/String/Structure opt out + return null; empty member list returns null. Six AbCipUdtReadPlannerTests: two members group; single-member demotes to fallback; unknown references fall back without poisoning groups; atomic top-level tags fall back untouched; UDTs containing BOOL don't group; original indices survive out-of-order batches. Five AbCipDriverWholeUdtReadTests (real driver + fake runtime): two grouped members trigger exactly one parent read + one fake runtime (proving the optimization engages); each member decodes at its own offset via ValuesByOffset; parent-read non-zero status stamps Bad across the group; mixed UDT-member + atomic top-level batch produces 2 runtimes + 2 reads (not 3); single-member-of-UDT still uses the member-level runtime (proving demotion works). Driver builds 0 errors; AbCip.Tests 227/227 (was 211, +16 new).
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
AbCip IAlarmSource via ALMD projection (#177) — feature-flagged OFF by default; when enabled, polls declared ALMD UDT member fields + raises OnAlarmEvent on 0→1 + 1→0 transitions. Closes task #177. The AB CIP driver now implements IAlarmSource so the generic-driver alarm dispatch path (PR 14's sinks + the Server.Security.AuthorizationGate AlarmSubscribe/AlarmAck invoker wrapping) can treat AB-backed alarms uniformly with Galaxy + OpcUaClient + FOCAS. Projection is ALMD-only in this pass: the Logix ALMD (digital alarm) instruction's UDT shape is well-understood (InFaulted + Acked + Severity + In + Cfg_ProgTime at stable member names) so the polled-read + state-diff pattern fits without concessions. ALMA (analog alarm) deferred to a follow-up because its HHLimit/HLimit/LLimit/LLLimit threshold + In value semantics deserve their own design pass — raising on threshold-crossing is not the same shape as raising on InFaulted-edge. AbCipDriverOptions gains two knobs: EnableAlarmProjection (default false) + AlarmPollInterval (default 1s). Explicit opt-in because projection semantics don't exactly mirror Rockwell FT Alarm & Events; shops running FT Live should leave this off + take alarms through the native A&E route. AbCipAlarmProjection is the state machine: per-subscription background loop polls the source-node set via the driver's public ReadAsync — which gains the #194 whole-UDT optimization for free when ALMDs are declared with their standard member set, so one poll tick reads (N alarms × 2 members) = N libplctag round-trips rather than 2N. Per-tick state diff: compare InFaulted + Severity against last-seen, fire raise (0→1) / clear (1→0) with AlarmSeverity bucketed via the 1-1000 Logix severity scale (≤250 Low, ≤500 Medium, ≤750 High, rest Critical — matches OpcUaClient's MapSeverity shape). ConditionId is {sourceNode}#active — matches a single active-branch per alarm which is all ALMD supports; when Cfg_ProgTime-based branch identity becomes interesting (re-raise after ack with new timestamp), a richer ConditionId pass can land. Subscribe-while-disabled returns a handle wrapping id=0 — capability negotiation (the server queries IAlarmSource presence at driver-load time) still succeeds, the alarm surface just never fires. Unsubscribe cancels the sub's CTS + awaits its loop; ShutdownAsync cancels every sub on its way out so a driver reload doesn't leak poll tasks. AcknowledgeAsync routes through the driver's existing WriteAsync path — per-ack writes {SourceNodeId}.Acked = true (the simpler semantic; operators whose ladder watches AckCmd + rising-edge can wire a client-side pulse until a driver-level edge-mode knob lands). Best-effort — per-ack faults are swallowed so one bad ack doesn't poison the whole batch. Six new AbCipAlarmProjectionTests: detector flags ALMD signature + skips non-signature UDTs + atomics; severity mapping matches OPC UA A&C bucket boundaries; feature-flag OFF returns a handle but never touches the fake runtime (proving no background polling happens); feature-flag ON fires a raise event on 0→1; clear event fires on 1→0 after a prior raise; unsubscribe stops the poll loop (ReadCount doesn't grow past cancel + at most one straggler read). Driver builds 0 errors; AbCip.Tests 233/233 (was 227, +6 new). Task #177 closed — the last pending AB CIP follow-up is now #194 (already shipped). Remaining pending fleet-wide: #150 (Galaxy MXAccess failover hardware) + #199 (UnsTab Playwright smoke).
Sign in to join this conversation.