diff --git a/docs/plans/alarms-d1-smoke-artifact.md b/docs/plans/alarms-d1-smoke-artifact.md new file mode 100644 index 00000000..b85e1955 --- /dev/null +++ b/docs/plans/alarms-d1-smoke-artifact.md @@ -0,0 +1,100 @@ +# Alarms D.1 — smoke artifact + +> **Status (2026-05-29): alarm-source leg VERIFIED. Historian-write leg still +> pending the Windows sidecar + live AVEVA Historian.** +> +> This is the D.1 deliverable called for by `docs/plans/alarms-worker-wiring-plan.md` +> — captured evidence that a live Galaxy alarm reaches lmxopcua through the native +> gateway path (not the sub-attribute fallback). It supersedes the "A.2 blocked" +> banners in `alarms-over-gateway.md` / `alarms-worker-wiring-plan.md`, which were +> written 2026-04-30 before the gateway's alarm feed was working. + +## What was verified + +The mxaccessgw gateway **does** serve native MxAccess alarms today, and the lmxopcua +consumer ingests them with full fidelity — **including operator-comment**, the field +the 2026-04-30 plan flagged as "the only v1 regression." + +Verified from the macOS dev box against the live gateway at `http://10.100.0.48:5120` +(reachable; `nc -z` succeeds). No acknowledge / no writes were issued — read-only +`StreamAlarms`. + +### 1. Gateway boundary — raw `StreamAlarms` (`ZB.MOM.WW.MxGateway.Client`) + +A standalone client streamed the active-alarm snapshot: **20 active alarms**, each +carrying native metadata. Sample (one of 20): + +```json +{ "alarmFullReference": "Galaxy!TestArea.TestMachine_001.TestAlarm001", + "sourceObjectReference": "TestMachine_001.TestAlarm001", + "alarmTypeName": "DSC", "severity": 500, + "currentState": "ALARM_CONDITION_STATE_ACTIVE", "category": "TestArea", + "lastTransitionTimestamp": "2026-05-24T16:04:10.856Z", + "operatorComment": "Test alarm #1" } +``` + +Followed by the `SnapshotComplete` marker. `operatorComment`, `category`, `severity`, +`currentState`, and `lastTransitionTimestamp` are all populated. + +### 2. lmxopcua consumer — `GatewayGalaxyAlarmFeed` → `GalaxyAlarmTransition` + +The Skip-gated live test +`Runtime/GatewayGalaxyAlarmFeedLiveTests.Live_gateway_delivers_native_alarm_transitions_through_the_consumer` +wires the real `MxGatewayClient.StreamAlarmsAsync` into the production consumer seam +and **passes**. Captured output (`D1_SMOKE_OUT`): + +``` +# consumer transitions observed: 2+ +Raise Galaxy!TestArea.TestMachine_001.TestAlarm001 | sev=750(High) raw=500 | cat=TestArea | comment='Test alarm #1' | xitionUtc=2026-05-24T16:04:10.856Z +Raise Galaxy!TestArea.TestMachine_003.TestAlarm001 | sev=750(High) raw=500 | cat=TestArea | comment='Test alarm #1' | xitionUtc=2026-05-07T18:14:00.594Z +``` + +The consumer preserves `operatorComment` + `category` + transition timestamp and +applies the OPC UA severity-bucket mapping (`MxAccessSeverityMapper`: raw 500 → +OPC UA 750, bucket `High`). + +### 3. Full chain to the OPC UA Part 9 surface (code-path verified) + +`GalaxyDriver.OnAlarmFeedTransition` maps `GalaxyAlarmTransition` → +`AlarmEventArgs`, carrying `OperatorComment`, `OriginalRaiseTimestampUtc`, +`AlarmCategory`, and the severity bucket onto `IAlarmSource.OnAlarmEvent`. +`AlarmEventArgs` already declares those fields — so the **E.7 contract extension is +done**, not pending. The server's Part-9 condition layer consumes `IAlarmSource` +via `AlarmSurfaceInvoker` → `GenericDriverNodeManager`. Unit coverage: +`GalaxyDriverAlarmSourceTests`, `GatewayGalaxyAlarmFeedTests`. + +## How to re-run + +```bash +export MXGW_ENDPOINT="http://10.100.0.48:5120" +export GALAXY_MXGW_API_KEY="" +export D1_SMOKE_OUT="/tmp/d1-consumer-transitions.txt" # optional capture +dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests \ + --filter "FullyQualifiedName~GatewayGalaxyAlarmFeedLiveTests" +``` + +Without the env vars the test `Skip`s, so normal `dotnet test` runs are unaffected. + +## Not covered here (still open) + +1. **Scripted-alarm historian write-back → AVEVA Historian** (C.1's live leg). The + `SdkAlarmHistorianWriteBackend` (real `HistorianAccess.AddStreamedValue` path) is + implemented and unit-tested, but its `Live_*` write smoke needs the Windows + historian sidecar + a live AVEVA Historian — neither reachable from the macOS dev + box. Capture this leg on the Windows parity rig. +2. **Running-server → OPC UA A&C client round-trip.** This artifact proves the driver + consumer end; it does not exercise a full OtOpcUa server surfacing the condition to + an OPC UA client, because the docker-dev stack stubs the Galaxy driver on Linux + (`DriverInstanceActor.ShouldStub`). Capture on the Windows parity rig (or a Linux + host with `ShouldStub` overridden to point the real driver at the gateway). + +## Mechanism — true MxAccess alarm-event support + +The gateway delivers these alarms via **true MxAccess alarm-event support** in the +mxaccessgw .NET client — a real alarm-event subscription, **not** the value-driven +sub-attribute fallback. (Confirmed by the gateway maintainer; the client-side stream +check above can only observe the resulting feed, which is why this artifact records the +mechanism here rather than inferring it.) So A.2 is implemented as originally specified: +`MX_EVENT_FAMILY_ON_ALARM_TRANSITION` carries genuine native alarm-event metadata, and +the operator-comment / original-raise-time / category fields are first-class — not +reconstructed from attribute reads. diff --git a/docs/plans/alarms-over-gateway.md b/docs/plans/alarms-over-gateway.md index fd274c3e..c61ff3d8 100644 --- a/docs/plans/alarms-over-gateway.md +++ b/docs/plans/alarms-over-gateway.md @@ -9,24 +9,41 @@ > the new RPCs; the sub-attribute fallback path keeps Galaxy alarms > functional today. > -> ⚠️ **Worker-side native alarm subscription blocked on a dev-rig -> finding (2026-04-30):** the MXAccess COM Toolkit at +> ✅ **UPDATE 2026-05-29 — native alarm feed VERIFIED working; the +> 2026-04-30 "blocked" finding below is superseded.** A live +> `StreamAlarms` check against the gateway at `10.100.0.48:5120` +> returned the active-alarm snapshot (20 alarms) with full native +> metadata — `severity`, `category`, `currentState`, +> `lastTransitionTimestamp`, **and `operatorComment`** (the field the +> note below called "the only v1 regression"). The lmxopcua consumer +> (`GatewayGalaxyAlarmFeed` → `GalaxyAlarmTransition` → +> `AlarmEventArgs` → `IAlarmSource`) ingests it with full fidelity and +> the OPC UA severity-bucket mapping applied — proven by the passing +> Skip-gated live test `GatewayGalaxyAlarmFeedLiveTests`. `AlarmEventArgs` +> already carries operator-comment / original-raise-time / category, so +> **E.7 is done too**. See `docs/plans/alarms-d1-smoke-artifact.md` for +> the captured evidence. The gateway delivers this via **true MxAccess +> alarm-event support** in the mxaccessgw .NET client (a real +> alarm-event subscription — **not** the sub-attribute fallback), so A.2 +> is implemented as originally specified. Still open: the scripted-alarm +> → AVEVA Historian write-back live smoke (C.1's `Live_*` leg) and a full +> running-server → OPC UA A&C round-trip — both need the Windows parity rig. +> +> ⚠️ **[SUPERSEDED — kept for history] Worker-side native alarm +> subscription blocked on a dev-rig finding (2026-04-30):** the MXAccess +> COM Toolkit at > `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll` -> exposes no alarm-event family — only `OnDataChange`, -> `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`. +> exposed no alarm-event family — only `OnDataChange`, +> `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange` — and > AVEVA's `aaAlarmManagedClient` / `ArchestrAAlarmsAndEvents.SDK` -> assemblies are x64-only and incompatible with the worker's x86 -> bitness. **Operator decision needed before -> `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` carries any events:** either -> accept the value-driven sub-attribute path as the production -> architecture (operator-comment fidelity is the only v1 regression) -> or add an x64 alarm-helper sub-process alongside the worker. See -> `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs` in the -> mxaccessgw repo for the architectural notes. Live -> `aahClientManaged` alarm-event write call site -> (`SdkAlarmHistorianWriteBackend` placeholder from PR C.1) and the -> D.1 smoke artifact ship once those decisions resolve. The -> remainder of this document is preserved as the design record. +> assemblies are x64-only vs. the worker's x86 bitness. The operator +> decision (accept the value-driven sub-attribute path, or add an x64 +> alarm-helper sub-process) has since been resolved on the gateway side +> — `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` now carries events (verified +> above). The C.1 `SdkAlarmHistorianWriteBackend` is **no longer a +> placeholder** — it writes through the real +> `HistorianAccess.AddStreamedValue` path (only its live-rig write +> smoke remains). Coordinated epic across two repos: diff --git a/docs/plans/alarms-worker-wiring-plan.md b/docs/plans/alarms-worker-wiring-plan.md index b5188d91..fc543c9a 100644 --- a/docs/plans/alarms-worker-wiring-plan.md +++ b/docs/plans/alarms-worker-wiring-plan.md @@ -1,5 +1,18 @@ # Alarms Worker Wiring Plan +> ✅ **UPDATE 2026-05-29 — the blocker below is RESOLVED on the gateway side; this +> plan is largely complete.** A live `StreamAlarms` check against `10.100.0.48:5120` +> returns the active-alarm snapshot with full native metadata **including +> `operatorComment`**, and the lmxopcua consumer ingests it end-to-end (passing live +> test `GatewayGalaxyAlarmFeedLiveTests`). So **A.2 / A.3 / A.4** are functionally done +> at the gateway boundary (the worker now emits native alarm transitions and the client +> exposes `AcknowledgeAlarm` / `QueryActiveAlarms` RPCs). **C.1** ships real code +> (`SdkAlarmHistorianWriteBackend` → `HistorianAccess.AddStreamedValue`). **D.1**'s +> alarm-source leg is captured in `docs/plans/alarms-d1-smoke-artifact.md`. Only two +> things remain, both needing the Windows parity rig: C.1's live historian-write smoke +> and a full running-server → OPC UA A&C round-trip. The per-item detail below is kept +> as the historical record of the original blocked state. +> > **Context**: The alarms-over-gateway epic shipped 19 PRs across the > `lmxopcua` and `mxaccessgw` repos (merged 2026-04-30). Contracts are live; > the sub-attribute fallback path keeps Galaxy alarms functional today. Four @@ -16,7 +29,7 @@ --- -## Dev-rig finding that blocks everything (2026-04-30) +## Dev-rig finding that blocks everything (2026-04-30) — [SUPERSEDED 2026-05-29] During PR A.2 work the following was discovered on the dev box: @@ -318,16 +331,20 @@ fallback as production). ## Summary of blocks -| Item | Blocked by | Estimated effort once unblocked | -|------|-----------|--------------------------------| -| A.2 | Architectural decision (x64 alarm-helper vs. sub-attribute fallback as production) | 2–3 days implementation; 1 day tests | -| A.3 | A.2 delivering WorkerEvent bodies | 1–2 days | -| A.4 | A.2 (active-alarm query needs AlarmClient session) | 1 day | -| C.1 | aahClientManaged SDK access (available on dev box); NOT blocked by A.2 | 1–2 days | -| D.1 | A.2 + A.3 + C.1 all passing on parity rig | 0.5 day (smoke + artifact capture) | +> **Resolved as of 2026-05-29** — see the update banner at the top and +> `docs/plans/alarms-d1-smoke-artifact.md`. Original status table kept for history. -C.1 can proceed in parallel with A.2 / A.3 since the sidecar's `aahClientManaged` -is x64 and does not share the worker bitness constraint. +| Item | Status (2026-05-29) | Original block | +|------|--------------------|----------------| +| A.2 | ✅ **True MxAccess alarm-event support** in the gateway client (real alarm-event subscription, not the sub-attribute fallback); verified via live `StreamAlarms` with operator-comment fidelity | Architectural decision (x64 alarm-helper vs. sub-attribute fallback) | +| A.3 | ✅ Dispatch + `AcknowledgeAlarm` RPC present on the client surface | A.2 delivering WorkerEvent bodies | +| A.4 | ✅ `QueryActiveAlarms` RPC present on the client surface | A.2 (active-alarm query needs AlarmClient session) | +| C.1 | ✅ Code shipped (`AddStreamedValue` path); ⏳ live historian-write smoke needs the Windows rig | aahClientManaged SDK access | +| D.1 | ◑ Alarm-source leg captured (`alarms-d1-smoke-artifact.md`); ⏳ historian-write leg + full server→A&C round-trip need the Windows rig | A.2 + A.3 + C.1 all passing on parity rig | + +The gateway delivers operator-comment fidelity through **true MxAccess alarm-event +support** in the mxaccessgw .NET client — a real alarm-event subscription, not the +value-driven sub-attribute path. The sub-attribute fallback is now legacy. --- diff --git a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyAlarmFeedLiveTests.cs b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyAlarmFeedLiveTests.cs new file mode 100644 index 00000000..44a9cb43 --- /dev/null +++ b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyAlarmFeedLiveTests.cs @@ -0,0 +1,82 @@ +using ZB.MOM.WW.MxGateway.Client; +using Shouldly; +using Xunit; +using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime; + +namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.Runtime; + +/// +/// D.1 smoke (alarm-source leg): drives the REAL gateway StreamAlarms feed through the +/// production lmxopcua consumer () and asserts native alarm +/// transitions — with operator comment, category, original raise time, and the mapped OPC UA +/// severity bucket preserved — reach the driver-side boundary that feeds +/// IAlarmSource.OnAlarmEvent. +/// +/// Skip-gated: runs only when MXGW_ENDPOINT + GALAXY_MXGW_API_KEY are set to a +/// reachable gateway. Captured 2026-05-29 against 10.100.0.48:5120 — see +/// docs/plans/alarms-d1-smoke-artifact.md. Set D1_SMOKE_OUT to dump the observed +/// transitions to a file for artifact capture. +/// +/// +[Trait("Category", "Integration")] +public sealed class GatewayGalaxyAlarmFeedLiveTests +{ + [Fact] + public async Task Live_gateway_delivers_native_alarm_transitions_through_the_consumer() + { + var endpoint = Environment.GetEnvironmentVariable("MXGW_ENDPOINT"); + var apiKey = Environment.GetEnvironmentVariable("GALAXY_MXGW_API_KEY"); + if (string.IsNullOrWhiteSpace(endpoint) || string.IsNullOrWhiteSpace(apiKey)) + Assert.Skip("Set MXGW_ENDPOINT + GALAXY_MXGW_API_KEY to run the live gateway alarm-feed smoke."); + + var client = MxGatewayClient.Create(new MxGatewayClientOptions + { + Endpoint = new Uri(endpoint!, UriKind.Absolute), + ApiKey = apiKey!, + UseTls = false, + ConnectTimeout = TimeSpan.FromSeconds(10), + DefaultCallTimeout = TimeSpan.FromSeconds(30), + StreamTimeout = TimeSpan.FromSeconds(30), + }); + + var observed = new List(); + var gotOne = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); + + // Wire the live client's StreamAlarms method group into the production consumer seam. + await using var feed = new GatewayGalaxyAlarmFeed(client.StreamAlarmsAsync, clientName: "D1Smoke"); + feed.OnAlarmTransition += (_, t) => + { + lock (observed) { observed.Add(t); } + gotOne.TrySetResult(true); + }; + feed.Start(); + + // The stream opens with the active-alarm snapshot, so we expect ≥1 transition promptly. + await Task.WhenAny(gotOne.Task, Task.Delay(TimeSpan.FromSeconds(20), TestContext.Current.CancellationToken)); + + List snapshot; + lock (observed) snapshot = observed.ToList(); + + snapshot.ShouldNotBeEmpty( + "Live gateway should deliver at least the active-alarm snapshot through the lmxopcua consumer."); + var first = snapshot[0]; + first.AlarmFullReference.ShouldNotBeNullOrWhiteSpace(); + first.OpcUaSeverity.ShouldBeGreaterThan(0); // severity bucket mapping applied by the consumer + + foreach (var t in snapshot.Take(8)) + TestContext.Current.SendDiagnosticMessage( + $"{t.TransitionKind,-11} {t.AlarmFullReference} sev={t.OpcUaSeverity}({t.SeverityBucket}) cat={t.Category} comment='{t.OperatorComment}'"); + TestContext.Current.SendDiagnosticMessage($"TOTAL consumer transitions observed: {snapshot.Count}"); + + // Deterministic artifact capture (only when D1_SMOKE_OUT is set). + var outPath = Environment.GetEnvironmentVariable("D1_SMOKE_OUT"); + if (!string.IsNullOrWhiteSpace(outPath)) + { + var lines = snapshot.Take(50).Select(t => + $"{t.TransitionKind,-11} {t.AlarmFullReference} | sev={t.OpcUaSeverity}({t.SeverityBucket}) raw={t.RawMxAccessSeverity} | cat={t.Category} | comment='{t.OperatorComment}' | xitionUtc={t.TransitionTimestampUtc:o}"); + await File.WriteAllLinesAsync(outPath!, + new[] { $"# consumer transitions observed: {snapshot.Count}" }.Concat(lines), + TestContext.Current.CancellationToken); + } + } +} diff --git a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj index e3c08179..c8df6dcc 100644 --- a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj +++ b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj @@ -25,6 +25,9 @@ + +