diff --git a/docs/plans/2026-06-11-alarm-followups.md b/docs/plans/2026-06-11-alarm-followups.md
new file mode 100644
index 00000000..c5eb8902
--- /dev/null
+++ b/docs/plans/2026-06-11-alarm-followups.md
@@ -0,0 +1,291 @@
+# Alarm Ack/Shelve Follow-ups Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
+
+**Goal:** Resolve the six follow-ups left by the T17–T24 inbound-alarm-ack work: the redundancy double-emit, the Timed-shelve UI, two T21 minors, the docker-dev rig cleanup, and two pre-existing Layer-1 gaps (Galaxy reconnect, the live-pill).
+
+**Architecture:** The double-emit is fixed by **primary-only emission at the source** — both `ScriptedAlarmHostActor` (the `alerts` publish) and the per-node `HistorianAdapterActor` (the sink write) subscribe to the existing `redundancy-state` DPS topic, cache the local `RedundancyRole` (mirroring `OpcUaPublishActor`), and act only when the local node is `Primary` or its role is still unknown. OPC UA node writes + inbound-command processing stay ungated so the warm standby keeps a current address space. The other five items are localized fixes (one Galaxy-driver lifecycle bug, one broadcaster-health interface extension + 2 razors, two small Alerts.razor additions, one mechanical refactor, one operational cleanup).
+
+**Tech Stack:** .NET 10, Akka.NET (cluster, DistributedPubSub, TestKit/xunit2), Blazor Server (InteractiveServer, NO bUnit), xUnit + Shouldly, OPC Foundation UA .NET Standard, Serilog.
+
+**Design of record:** `docs/plans/2026-06-11-alarm-followups-design.md` (committed on master at `bcb9f45c`).
+
+**Hard rules:** stage by explicit path (never `git add .`); never stage `sql_login.txt` / `src/Server/.../Host/pki/`; never echo the gateway API key into a **new** tracked file; no force-push, no `--no-verify`; **no Configuration entity / EF migration change**. Build on a feature branch off master.
+
+---
+
+### Task 0: Branch + baseline
+
+**Classification:** trivial
+**Estimated implement time:** ~1 min
+**Parallelizable with:** none
+
+**Files:** (none — git only)
+
+**Steps:**
+1. `git checkout master && git switch -c feat/alarm-followups` (off `bcb9f45c`; confirm `master` is still at that commit).
+2. Confirm clean tree + green baseline build: `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → 0 errors.
+3. No commit (branch only).
+
+---
+
+### Task 1: Redundancy emit-gate in `ScriptedAlarmHostActor` (A1)
+
+**Classification:** high-risk
+**Estimated implement time:** ~5 min
+**Parallelizable with:** Task 2, Task 3, Task 4, Task 5
+
+**Files:**
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs`
+- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`
+
+**Context:** `OnEngineEmission` (≈ line 247–279) does two things per emission: `_publishActor.Tell(AlarmStateUpdate(...))` (line 261, OPC UA node write) and `_mediator.Tell(new Publish(AlertsTopic, evt))` (line 278, cluster-wide `alerts`). Both central nodes run this, so every transition reaches `alerts` twice. Gate **only line 278** on `Primary`.
+
+**Reference pattern to mirror** (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs`): `RedundancyStateTopic = "redundancy-state"` (line 30); `Subscribe(RedundancyStateTopic, Self)` in PreStart (line 156); `Receive<RedundancyStateChanged>(HandleRedundancyStateChanged)` (line 147); `HandleRedundancyStateChanged` reads `msg.Nodes.FirstOrDefault(n => n.NodeId == localNode)?.Role` (lines 335–351). `RedundancyRole` lives in `ZB.MOM.WW.OtOpcUa.Commons.Messages.Redundancy` (`Primary`/`Secondary`/`Detached`). The host probably already knows its local node id (check where it is used; if it does not, derive it the same way `OpcUaPublishActor` resolves `_localNode`).
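+
+A minimal, actor-free sketch of the role cache and gate decision described above, assuming simplified message shapes. The `RedundancyRole` member names come from this plan; `NodeRedundancyState`, the `RedundancyStateChanged` fields shown, and the `RedundancyGate` helper are hypothetical stand-ins, not the project's types. It isolates the one rule Tasks 1 and 2 share: emit unless the local node is known to be `Secondary` or `Detached`.
+
+```csharp
+using System.Collections.Generic;
+using System.Linq;
+
+// Hypothetical stand-ins; the real enum lives in ZB.MOM.WW.OtOpcUa.Commons.Messages.Redundancy.
+public enum RedundancyRole { Primary, Secondary, Detached }
+
+public sealed record NodeRedundancyState(string NodeId, RedundancyRole Role);
+
+public sealed record RedundancyStateChanged(IReadOnlyList<NodeRedundancyState> Nodes);
+
+// Hypothetical helper: the cached role plus the emit decision.
+public sealed class RedundancyGate
+{
+    private readonly string _localNode;
+    private RedundancyRole? _localRole; // null = no snapshot seen yet
+
+    public RedundancyGate(string localNode) => _localNode = localNode;
+
+    // Same lookup as OpcUaPublishActor's handler: pick the local node's role from the snapshot.
+    // In this sketch, a snapshot that omits the local node resets the cache to unknown.
+    public void Apply(RedundancyStateChanged msg) =>
+        _localRole = msg.Nodes.FirstOrDefault(n => n.NodeId == _localNode)?.Role;
+
+    // Default-emit: suppress only when positively known to be Secondary or Detached.
+    public bool ShouldEmit =>
+        _localRole is not (RedundancyRole.Secondary or RedundancyRole.Detached);
+}
+```
+
+In the actor, `Apply` would run in the `RedundancyStateChanged` handler and `ShouldEmit` would guard the `alerts` publish; both execute on the actor thread, so the cached role needs no locking. Whether an omitted local node should reset to unknown (as here) or keep the last known role is worth deciding explicitly.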
+
+**Step 1: Failing TestKit tests** (extend the existing harness; Runtime.Tests = xunit v2 + Akka.TestKit.Xunit2):
+- `Emission_is_published_to_alerts_by_default_before_any_redundancy_state` — a fresh host (no `RedundancyStateChanged` yet) DOES publish the `AlarmTransitionEvent` to `alerts` (subscribe a probe to `AlertsTopic`).
+- `Secondary_node_suppresses_alerts_publish_but_still_writes_opcua` — after a `RedundancyStateChanged` marking the local node `Secondary`, an emission does NOT publish to `alerts` but DOES still `Tell` the `OpcUaPublishActor.AlarmStateUpdate` (probe the publish-actor seam the test already uses for T9/T19).
+- `Primary_node_publishes_alerts` — after a `Primary` `RedundancyStateChanged`, the emission publishes to `alerts`.
+- `Inbound_AlarmCommand_is_processed_regardless_of_role` — a `Secondary` host still drives the engine for an inbound `AlarmCommand` (the existing T19 ack test, asserted under a Secondary role).
+
+**Step 2:** Run them — expect `Secondary_node_suppresses_alerts_publish_but_still_writes_opcua` to FAIL (no gate yet; Secondary still publishes). The default, Primary, and inbound-command tests may already pass; they guard against over-gating.
+
+**Step 3: Implement.** Add a cached `RedundancyRole? _localRole = null;` (null = unknown ⇒ treat as Primary/emit). In PreStart, `_mediator.Tell(new Subscribe(OpcUaPublishActor.RedundancyStateTopic, Self))` (reuse the const; both actors live in the Runtime project, so no new project reference is needed. If coupling ScriptedAlarms to the OpcUa namespace is undesirable, introduce a shared `Commons` const for `"redundancy-state"` and point both at it, and surface that deviation). Add `Receive<RedundancyStateChanged>` + a `SubscribeAck` no-op. In the handler, set `_localRole` from the snapshot for the local node. In `OnEngineEmission`, before line 278:
+```csharp
+// Warm-standby dedup: only the Primary (driver-role leader) publishes the cluster-wide
+// transition + drives historization. Default-emit until told we are Secondary/Detached so
+// single-node deploys + the boot window never drop transitions. The OPC UA node write
+// above (warm address space) and inbound command processing stay ungated.
+if (_localRole is RedundancyRole.Secondary or RedundancyRole.Detached)
+    return;
+_mediator.Tell(new Publish(AlertsTopic, evt));
+```
+Leave line 261 (the `_publishActor.Tell`) and `OnAlarmCommand` untouched.
+
+**Step 4:** Run `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter ScriptedAlarmHostActor` → all green.
+
+**Step 5: Commit** by explicit path (`ScriptedAlarmHostActor.cs` + the test).
+
+> High-risk: concurrency (the role is cached on the actor thread and read in the emission path, which must also run on the actor thread; confirm `OnEngineEmission` does, as it should since emissions arrive via `Self.Tell(EngineEmission)`) + redundancy semantics. Do NOT gate the OPC UA write or commands.
+
+---
+
+### Task 2: Redundancy historize-gate in `HistorianAdapterActor` (A2)
+
+**Classification:** high-risk
+**Estimated implement time:** ~5 min
+**Parallelizable with:** Task 1, Task 3, Task 4, Task 5
+
+**Files:**
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs`
+- Modify (only if the actor needs the local node id / mediator wired): `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` (≈ line 146, where it is spawned as `historian-adapter`)
+- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Historian/HistorianAdapterActorTests.cs` (create if absent; else extend the existing historian test)
+
+**Context:** `HistorianAdapterActor` is **per-node** and consumes the cluster-wide `alerts` topic, so a single publish is historized once per node: even with Task 1 in place, every transition is written to the historian twice. Gate the sink write on `Primary`, same pattern as Task 1.
+
+**Step 1: Failing tests** (TestKit + a fake `IAlarmHistorianSink` that records writes):
+- `Default_before_redundancy_state_historizes` — a fresh adapter writes to the sink (treat unknown as Primary).
+- `Secondary_node_does_not_historize` — after a `Secondary` `RedundancyStateChanged`, an incoming alarm event is NOT written to the sink.
+- `Primary_node_historizes` — after `Primary`, it IS written.
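+
+A sketch of the Step 1 test double and the gate it exercises, with the actor plumbing stripped away. The real `IAlarmHistorianSink` members are not listed in this plan, so the single `Write` method, `AlarmEvent`, `RecordingHistorianSink`, and `HistorizeGate` are all hypothetical; a local `RedundancyRole` enum keeps the block self-contained.
+
+```csharp
+using System.Collections.Generic;
+
+public enum RedundancyRole { Primary, Secondary, Detached }
+
+// Hypothetical event + sink shapes; the real IAlarmHistorianSink signature is not in this plan.
+public sealed record AlarmEvent(string AlarmId, string Transition);
+
+public interface IAlarmHistorianSink
+{
+    void Write(AlarmEvent evt);
+}
+
+// Test double for Step 1: records every write so a test can assert on the count.
+public sealed class RecordingHistorianSink : IAlarmHistorianSink
+{
+    public List<AlarmEvent> Writes { get; } = new();
+    public void Write(AlarmEvent evt) => Writes.Add(evt);
+}
+
+// Hypothetical actor-free core of the Task 2 gate. In the actor, both handlers run on the
+// actor thread, so the cached role needs no locking.
+public sealed class HistorizeGate
+{
+    private readonly IAlarmHistorianSink _sink;
+    private RedundancyRole? _localRole; // null = unknown => historize
+
+    public HistorizeGate(IAlarmHistorianSink sink) => _sink = sink;
+
+    public void OnRoleChanged(RedundancyRole? role) => _localRole = role;
+
+    public void OnAlarmEvent(AlarmEvent evt)
+    {
+        if (_localRole is RedundancyRole.Secondary or RedundancyRole.Detached)
+            return; // the Primary's adapter historizes this transition
+        _sink.Write(evt);
+    }
+}
+```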
+
+**Step 2:** Run — expect `Secondary_node_does_not_historize` to FAIL.
+
+**Step 3: Implement** the same `_localRole` cache + `redundancy-state` subscription + `Receive<RedundancyStateChanged>` as Task 1 (if Task 1 introduced a shared `Commons` const for `"redundancy-state"`, reuse it), and guard the sink write: `if (_localRole is RedundancyRole.Secondary or RedundancyRole.Detached) return;` before `_sink.Write...`. The actor will need its local node id; resolve it the way `OpcUaPublishActor`/`ServiceCollectionExtensions` already does (pass it into `Props` if not already available — check how `OpcUaPublishActor` gets `_localNode` and mirror; thread it through `HistorianAdapterActor.Props` + the registration at `ServiceCollectionExtensions.cs:146`).
+
+**Step 4:** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter Historian` → green. Also confirm the full Runtime.Tests still pass.
+
+**Step 5: Commit** by explicit path.
+
+> High-risk: data-historization correctness + concurrency. If wiring the node id into `Props` ripples beyond the two files, surface it before expanding.
+
+---
+
+### Task 3: Galaxy reconnect recreates a faulted session
+
+**Classification:** high-risk
+**Estimated implement time:** ~5 min
+**Parallelizable with:** Task 1, Task 2, Task 4, Task 5
+
+**Files:**
+- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GalaxyMxSession.cs` (≈ line 58–69)
+- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs` (≈ line 289, `ReopenAsync`)
+- Test: the Galaxy driver test project (find `*ReconnectSupervisor*` / `*Session*` tests under `tests/Drivers/...Driver.Galaxy.Tests/`; mirror their seam)
+
+**Context:** `GalaxyMxSession.ConnectAsync` has `if (_session is not null) return;` — so when the gRPC session is Faulted/NotFound the field is still a non-null dead handle and `ConnectAsync` is a silent no-op. `GalaxyDriver.ReopenAsync` calls `ConnectAsync` expecting a reconnect, gets a no-op, and `ReconnectSupervisor.RecoveryLoopAsync` (`Runtime/ReconnectSupervisor.cs:158-186`) sees the reopen "succeed", the replay fails, and the loop repeats forever.
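+
+The bug can be reproduced without gRPC. Below is a hypothetical reduction of `GalaxyMxSession` that counts session opens (`SessionHolder` and `FakeGatewaySession` are illustrative names, not project types): a second `ConnectAsync` after a fault opens nothing, while a recreate opens a fresh session. The real test would assert the same thing against `OpenSessionAsync` invocations through the existing seam.
+
+```csharp
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical fake standing in for the gRPC session handle.
+public sealed class FakeGatewaySession
+{
+    public bool Faulted { get; set; }
+}
+
+// Hypothetical reduction of GalaxyMxSession's connect logic, counting session opens.
+public sealed class SessionHolder
+{
+    private FakeGatewaySession? _session;
+
+    public int OpenCount { get; private set; }
+    public FakeGatewaySession? Current => _session;
+
+    public Task ConnectAsync(CancellationToken ct)
+    {
+        // The bug: a faulted handle is still non-null, so this returns without reconnecting.
+        if (_session is not null) return Task.CompletedTask;
+        ct.ThrowIfCancellationRequested();
+        _session = new FakeGatewaySession();
+        OpenCount++;
+        return Task.CompletedTask;
+    }
+
+    // The Task 3 fix in miniature: drop the stale handle, then connect again.
+    public Task RecreateAsync(CancellationToken ct)
+    {
+        _session = null;
+        return ConnectAsync(ct);
+    }
+}
+```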
+
+**Step 1: Failing test.** Assert the reconnect path **creates a new session** after a faulted one: drive `GalaxyMxSession` (or a fake `MxGatewayClient`/session seam) to a connected state, simulate fault/dispose, call the reopen/recreate path, and assert `OpenSessionAsync` + `RegisterAsync` were invoked **again** (a second create), not a no-op. Mirror the existing session/reconnect test construction (read how they fake `MxGatewayClient`/`OpenSessionAsync`).
+
+**Step 2:** Run — FAIL (current code no-ops, second create never happens).
+
+**Step 3: Implement.** Add a recreate path to `GalaxyMxSession`, e.g.:
+```csharp
+/// <summary>
+/// Disposes the current (faulted/stale) session + owned client so the next
+/// <see cref="ConnectAsync"/> rebuilds a fresh session instead of no-op'ing on the dead handle.
+/// </summary>
+public async Task RecreateAsync(MxGatewayClientOptions clientOptions, CancellationToken ct)
+{
+    ObjectDisposedException.ThrowIf(_disposed, this);
+    await DisposeSessionAsync().ConfigureAwait(false); // dispose+null _session, _serverHandle, _ownedClient
+    await ConnectAsync(clientOptions, ct).ConfigureAwait(false);
+}
+```
+(Factor a private `DisposeSessionAsync()` that safely tears down the gRPC session/channel + nulls the fields — reuse whatever `DisposeAsync` already does for teardown, minus marking `_disposed`.) Then in `GalaxyDriver.ReopenAsync` call `RecreateAsync` instead of `ConnectAsync` so every reopen rebuilds. Keep the supervisor's backoff loop unchanged — it now actually recovers.
+
+**Step 4:** Run the Galaxy reconnect/session tests → green. `dotnet build` the Galaxy driver clean.
+
+**Step 5: Commit** by explicit path.
+
+> High-risk: driver session lifecycle + gRPC channel teardown + concurrency with the supervisor loop. Confirm `DisposeSessionAsync` is safe to call repeatedly and from the supervisor thread.
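+
+One way to make the teardown safe to call repeatedly and from the supervisor thread is to let exactly one caller claim the handle with `Interlocked.Exchange`. The sketch assumes a single disposable handle (`FakeChannel` and `SessionOwner` are hypothetical). `GalaxyMxSession` tears down several fields (`_session`, `_serverHandle`, `_ownedClient`), and swapping them one by one is not atomic as a group, so a lock or a single holder object may fit better there.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical stand-in for the gRPC session/channel that DisposeSessionAsync tears down.
+public sealed class FakeChannel : IAsyncDisposable
+{
+    private int _disposeCount;
+    public int DisposeCount => Volatile.Read(ref _disposeCount);
+
+    public ValueTask DisposeAsync()
+    {
+        Interlocked.Increment(ref _disposeCount);
+        return ValueTask.CompletedTask;
+    }
+}
+
+public sealed class SessionOwner
+{
+    private FakeChannel? _channel;
+
+    public SessionOwner(FakeChannel channel) => _channel = channel;
+
+    // Idempotent and thread-safe: whichever caller swaps the field to null owns the
+    // teardown; repeated or concurrent callers get null back and return immediately.
+    public async Task DisposeSessionAsync()
+    {
+        var channel = Interlocked.Exchange(ref _channel, null);
+        if (channel is null) return;
+        await channel.DisposeAsync().ConfigureAwait(false);
+    }
+}
+```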
+
+---
+
+### Task 4: Broadcaster connection-health signal + bridges
+
+**Classification:** standard
+**Estimated implement time:** ~5 min
+**Parallelizable with:** Task 1, Task 2, Task 3, Task 5
+
+**Files:**
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/IInProcessBroadcaster.cs`
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/InProcessBroadcaster.cs` (the impl)
+- Modify: the two bridge actors that publish into the broadcaster — `ScriptLogSignalRBridge` + `AlertSignalRBridge` (find under `AdminUI/` or `Runtime/`; they `Tell`/`Subscribe` the DPS topics and push to the broadcaster)
+- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Client.UI.Tests/` (or wherever `InProcessBroadcaster`/broadcaster tests live — find `InProcessBroadcasterTests`)
+
+**Step 1: Failing unit test** for the new health signal: a new `InProcessBroadcaster` reports `IsConnected == false` (the pre-ack default chosen in Step 3); calling `SetConnected(true)` raises `ConnectionStateChanged` with `true` and flips `IsConnected`; `SetConnected(false)` flips back + raises.
+
+**Step 2:** Run — FAIL (does not compile yet: the members don't exist).
+
+**Step 3: Implement.** Extend the interface:
+```csharp
+bool IsConnected { get; }
+event Action<bool>? ConnectionStateChanged;
+void SetConnected(bool connected);
+```
+Implement in `InProcessBroadcaster` (raise only on change). Then in each bridge actor, call `_broadcaster.SetConnected(true)` when its DPS `SubscribeAck` lands (subscription live) and `SetConnected(false)` on `PostStop`/failure. (Read how the bridge subscribes; mirror the `SubscribeAck` handling already there.) Default `IsConnected`: choose `true` once subscribed; before the first ack, `false` — the razor will reconcile on the `ConnectionStateChanged` event. If both bridges push into one shared broadcaster instance, one bridge stopping would flip the flag to `false` while the other is still live; confirm each bridge feeds its own instance (Tasks 6 and 7 suggest the pages inject separate `Alarms`/`ScriptLogs` broadcasters).
+
+**Step 4:** Run the broadcaster tests → green; AdminUI builds clean (`TreatWarningsAsErrors`).
+
+**Step 5: Commit** by explicit path.
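+
+A sketch of the change-only notification from Step 3, written as a standalone class (`ConnectionHealth` is hypothetical) rather than the real `InProcessBroadcaster`, and assuming the event carries the new state as a `bool`, as the Step 1 test implies. The flag is guarded by a lock because the bridge actors and the Razor components touch it from different threads.
+
+```csharp
+using System;
+
+// Hypothetical standalone version of the three new IInProcessBroadcaster members.
+public sealed class ConnectionHealth
+{
+    private readonly object _gate = new();
+    private bool _connected; // false until the first SubscribeAck
+
+    public bool IsConnected { get { lock (_gate) return _connected; } }
+
+    public event Action<bool>? ConnectionStateChanged;
+
+    public void SetConnected(bool connected)
+    {
+        lock (_gate)
+        {
+            if (_connected == connected) return; // raise only on change
+            _connected = connected;
+        }
+        // Raised outside the lock so a slow subscriber cannot block the calling bridge actor.
+        ConnectionStateChanged?.Invoke(connected);
+    }
+}
+```
+
+Raising outside the lock costs ordering: two racing flips can be delivered out of order. A page handler that re-reads `IsConnected` instead of trusting the event payload sidesteps that.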
+
+---
+
+### Task 5: `CorrelationId` wrapper for the alarm commands (3b)
+
+**Classification:** small
+**Estimated implement time:** ~4 min
+**Parallelizable with:** Task 1, Task 2, Task 3, Task 4
+
+**Files:**
+- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/AcknowledgeAlarmCommand.cs` + `ShelveAlarmCommand.cs` (+ their `*Result` records)
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (the alarm handlers)
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Clients/AdminOperationsClient.cs`
+- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AdminOperationsActorTests.cs` (the 5 alarm tests)
+
+**Context:** the alarm commands use a bare `Guid CorrelationId`; the project's other control-plane commands (`StartDeployment`/`RestartDriver`/`ReconnectDriver`) use the `CorrelationId` wrapper type. Make them consistent.
+
+**Steps:**
+1. Find the `CorrelationId` wrapper type + how `RestartDriver` uses it (`CorrelationId.NewId()`, the record field type, the reply echo).
+2. Change `AcknowledgeAlarmCommand`/`ShelveAlarmCommand` (+ `*Result`) `CorrelationId` field type `Guid` → `CorrelationId`. Update `AdminOperationsClient.AcknowledgeAlarmAsync`/`ShelveAlarmAsync` to mint `CorrelationId.NewId()`. Update the `AdminOperationsActor` handlers' reply construction.
+3. Update the 5 `AdminOperationsActorTests` to the wrapper type.
+4. `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests --filter AdminOperations` → green; build clean.
+5. **Commit** by explicit path.
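+
+What the wrapper buys is type safety: any `Guid` can be passed where a correlation id is expected, a `CorrelationId` cannot be confused with another id. The sketch below assumes a record-struct wrapper; only the `CorrelationId.NewId()` factory is confirmed by this plan, and the command's other fields (`AlarmId`, `User`), the `AcknowledgeAlarmResult` members, and `AlarmCommandHandler` are hypothetical. Step 1 (reading the real type) takes precedence over this shape.
+
+```csharp
+using System;
+
+// Hypothetical wrapper shape; only the NewId() factory is confirmed by the plan.
+public readonly record struct CorrelationId(Guid Value)
+{
+    public static CorrelationId NewId() => new(Guid.NewGuid());
+}
+
+// Illustrative fields only; the point is the CorrelationId field type (previously Guid).
+public sealed record AcknowledgeAlarmCommand(CorrelationId CorrelationId, string AlarmId, string User);
+
+public sealed record AcknowledgeAlarmResult(CorrelationId CorrelationId, bool Success, string? Error);
+
+public static class AlarmCommandHandler
+{
+    // The reply echoes the request's id so the client can match it to its pending call.
+    public static AcknowledgeAlarmResult Handle(AcknowledgeAlarmCommand cmd) =>
+        new(cmd.CorrelationId, Success: true, Error: null);
+}
+```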
+
+---
+
+### Task 6: `Alerts.razor` — Timed-shelve picker + chip auto-clear + live-pill
+
+**Classification:** standard
+**Estimated implement time:** ~5 min
+**Parallelizable with:** Task 7
+**Blocked by:** Task 4 (needs the broadcaster `ConnectionStateChanged`/`IsConnected`)
+
+**Files:**
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Alerts.razor`
+
+**Context:** all three changes are in one file, so they are bundled to avoid contention. NO bUnit; proven by docker-dev live-verify (Task 8).
+
+**Steps (no failing test — razor; verify by build + Task 8):**
+1. **Timed-shelve picker:** add a small duration input to the row's shelve control (a minutes number box is simplest, default e.g. 5). Add a "Shelve (timed)" action that calls `IAdminOperationsClient.ShelveAlarmAsync(alarmId, user, ShelveKind.Timed, unshelveAtUtc, comment: null, ct)`, with `unshelveAtUtc` set to the current UTC time plus the entered duration. Keep the existing OneShot Shelve + Unshelve buttons. Dispose the CTS (`using var cts = …`) like the existing handlers.
+2. **Chip auto-clear:** after `ShowOpResult`, start a ~8 s auto-clear (mirror `DriverStatusPanel.razor`'s timer: a `Task.Delay(8000)` continuation that, if the chip is still the same one, clears `_opResult*` + `InvokeAsync(StateHasChanged)`). Read `DriverStatusPanel.razor` for the exact pattern (cancellation on a newer action).
+3. **Live-pill:** replace the set-once `_connected = true` (≈ line 132) — in `OnInitializedAsync` read `Alarms.IsConnected` and subscribe `Alarms.ConnectionStateChanged += OnConnChanged;` (handler sets `_connected` + `InvokeAsync(StateHasChanged)`); unsubscribe in `Dispose`. (`Alarms` = the injected `IInProcessBroadcaster`.)
+4. `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → 0 warnings/errors.
+5. **Commit** by explicit path.
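+
+A sketch of the auto-clear pattern from step 2 that assumes nothing about `DriverStatusPanel.razor`'s actual code: `AutoClearChip` is a hypothetical helper, and the `onChanged` callback stands in for `() => InvokeAsync(StateHasChanged)`. A newer result cancels the previous timer, so an old timer can never clear a newer chip.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical helper; onChanged stands in for () => InvokeAsync(StateHasChanged).
+public sealed class AutoClearChip : IDisposable
+{
+    private readonly TimeSpan _delay;
+    private readonly Func<Task> _onChanged;
+    private CancellationTokenSource? _cts;
+
+    public AutoClearChip(TimeSpan delay, Func<Task> onChanged)
+    {
+        _delay = delay;
+        _onChanged = onChanged;
+    }
+
+    public string? Text { get; private set; }
+
+    // Shows a result and (re)starts the clear timer; a newer result cancels the older timer.
+    public Task Show(string text)
+    {
+        _cts?.Cancel();
+        _cts?.Dispose();
+        var cts = _cts = new CancellationTokenSource();
+        Text = text;
+        return ClearLaterAsync(cts.Token);
+    }
+
+    private async Task ClearLaterAsync(CancellationToken ct)
+    {
+        try
+        {
+            await Task.Delay(_delay, ct);
+        }
+        catch (OperationCanceledException)
+        {
+            return; // superseded by a newer result, or disposed
+        }
+        // The delay may have finished just before a newer Show() cancelled this token.
+        if (ct.IsCancellationRequested) return;
+        Text = null;
+        await _onChanged();
+    }
+
+    public void Dispose()
+    {
+        _cts?.Cancel();
+        _cts?.Dispose();
+        _cts = null;
+    }
+}
+```
+
+Inside the component, `Show` would be called from `ShowOpResult` and `Dispose` from the component's own `Dispose`, so a closed page never re-renders from a stale timer.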
+
+---
+
+### Task 7: `ScriptLog.razor` live-pill + reconnect-overlay check
+
+**Classification:** small
+**Estimated implement time:** ~3 min
+**Parallelizable with:** Task 6
+**Blocked by:** Task 4
+
+**Files:**
+- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/ScriptLog.razor` (≈ line 122)
+- Read/verify (no edit unless missing): the Blazor host shell (`App.razor` / `_Host.cshtml` / `MainLayout`) for the default reconnect overlay (`components-reconnect-modal`)
+
+**Steps:**
+1. Same live-pill fix as Task 6 step 3, against `ScriptLogs` (the injected `IInProcessBroadcaster`): read `IsConnected`, subscribe `ConnectionStateChanged`, update + `StateHasChanged`, unsubscribe in `Dispose`.
+2. **Verify the dead-circuit overlay exists:** grep the host shell for the Blazor reconnect modal markup. If present, note it in the commit message (the dead-circuit case is covered by Blazor's built-in reconnection UI — the component can't self-update a dead circuit). If ABSENT, add the standard `components-reconnect-modal` markup. Do NOT fake liveness from a dead component.
+3. `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → clean.
+4. **Commit** by explicit path.
+
+---
+
+### Task 8: Live-verify on docker-dev
+
+**Classification:** verification
+**Estimated implement time:** manual
+**Parallelizable with:** none
+**Blocked by:** Task 1, Task 2, Task 3, Task 6, Task 7 (Task 5 is internal; include it in the build)
+
+**Steps:** rebuild docker-dev central nodes on the new image (`docker compose -f docker-dev/docker-compose.yml up -d --build central-1 central-2`), then on `/alerts` (login disabled — agent drives):
+1. **Double-emit fixed:** confirm each t12-overheat transition now appears **ONCE** (not twice). Cross-check both nodes' logs: only the **driver-leader (Primary)** node logs the alerts publish; the Secondary suppresses it but still writes its OPC UA node (its address space stays current). (Find the Primary via the `redundancy-state`/ServiceLevel — or just observe single rows.) If the historian store is reachable from docker-dev, also confirm each transition is historized once (Task 2).
+2. **Timed-shelve:** drive the new timed-shelve control → a "Shelved" transition appears (operator = `multi-role-test`); after the duration the SDK auto-unshelves (or verify the `UnshelveAtUtc` was set).
+3. **Chip auto-clear:** the Ack/Shelve result chip clears after ~8 s.
+4. **Live-pill:** the pill shows "live" while connected; (optional) kill the bridge / restart a node and confirm the pill flips / the Blazor reconnect overlay shows.
+5. **Galaxy reconnect** (best-effort, may need the gateway): if feasible, restart the Galaxy gateway and confirm the driver recreates the session + recovers without a container restart (watch the logs for a fresh `OpenSessionAsync`). If not feasible live, rely on the Task 3 unit test + note it.
+
+Defects → new fix tasks. **Agent drives; no sign-in needed (docker-dev login disabled).**
+
+---
+
+### Task 9: docker-dev rig cleanup
+
+**Classification:** trivial (operational)
+**Estimated implement time:** manual
+**Parallelizable with:** none
+**Blocked by:** Task 8 (the rig artifacts are the live-verify vehicle)
+
+**Steps:** remove the seed artifacts left for verification, then redeploy:
+1. In the docker-dev AdminUI (or via DB), delete the `t12-overheat` scripted alarm, the `SC-ba675b168a85` predicate script, and the `layer0-logcheck` vtag + script. Revert filler-02's `cycle-time-s` script to `return ctx.GetTag("TestMachine_002.TestDuration").Value;`.
+2. Redeploy: `POST http://localhost:9200/api/deployments` with `X-Api-Key: docker-dev-deploy-key` → `202`.
+3. Confirm `/alerts` + `/scripted-alarms` + `/scripts` no longer list the removed artifacts. No code commit (DB/deploy only). Note completion in Task 10's commit message.
+
+> Operational only — no source change. If the user wants the rig kept, skip this task and record it as deferred.
+
+---
+
+### Task 10: Docs + finish branch
+
+**Classification:** small
+**Estimated implement time:** ~4 min
+**Parallelizable with:** none
+**Blocked by:** Task 9 (or its recorded deferral)
+
+**Files:** `docs/Redundancy.md` (alarm-emission + historization are now Primary-gated), `docs/ScriptedAlarms.md` / `docs/AlarmTracking.md` (the redundancy dedup note), `docs/Client.CLI.md` only if affected, and a one-line note wherever the live-pill/Galaxy-reconnect behaviour is documented. Keep terse.
+
+**Steps:**
+1. Update the docs above to reflect: Primary-only alarm emission + historization under redundancy; the Galaxy reconnect now recreates a faulted session; the live-pill reflects feed health.
+2. Run the FULL suite: `dotnet test ZB.MOM.WW.OtOpcUa.slnx` — confirm all affected unit suites green; the only failures should be the known pre-existing macOS/integration ones (OpcUaServer.IntegrationTests PKI, Host.IntegrationTests deploy-Rejected, AbLegacy/AbCip fixtures). Verify the **Galaxy** suite is green (Task 3).
+3. **Commit** docs by explicit path.
+4. Run **superpowers-extended-cc:finishing-a-development-branch** (verify tests → present the 4 options → merge).
+
+---
+
+## Execution notes
+
+- **Parallel dispatch:** Tasks **1, 2, 3, 4, 5** are mutually parallelizable (disjoint files across Runtime/ScriptedAlarms, Runtime/Historian, Driver.Galaxy, AdminUI/Hubs+bridges, Commons+ControlPlane). Dispatch their implementers concurrently. **6 and 7** wait on **4** (broadcaster health) and are ∥ each other (Alerts.razor vs ScriptLog.razor). **8** waits on 1/2/3/6/7; **9** on 8; **10** on 9.
+- **One writer per file:** `Alerts.razor` is touched ONLY by Task 6 (its three changes are bundled for exactly this reason). `ScriptLog.razor` only by Task 7.
+- **High-risk tasks (1, 2, 3):** serial spec→code review each. **Standard (4, 6):** parallel spec+code review. **Small (5, 7):** code review only.
+- **Live-verify (8) is the integration checkpoint** — the single-alerts-row proof is the whole point of item 1.
+- TDD where there's logic (1, 2, 3, 4, 5); razor (6, 7) proven by docker-dev `/run` (login disabled, agent drives).
diff --git a/docs/plans/2026-06-11-alarm-followups.md.tasks.json b/docs/plans/2026-06-11-alarm-followups.md.tasks.json
new file mode 100644
index 00000000..f4576d1d
--- /dev/null
+++ b/docs/plans/2026-06-11-alarm-followups.md.tasks.json
@@ -0,0 +1,23 @@
+{
+  "planPath": "docs/plans/2026-06-11-alarm-followups.md",
+  "designPath": "docs/plans/2026-06-11-alarm-followups-design.md",
+  "branch": "feat/alarm-followups",
+  "baseBranch": "master",
+  "baseSha": "bcb9f45c",
+  "status": "pending",
+  "note": "Resolves the 6 T17-T24 follow-ups. Double-emit = primary-only emission (T1 ScriptedAlarmHostActor alerts-gate + T2 HistorianAdapterActor historize-gate, both on Primary via redundancy-state; OPC UA write + commands stay ungated). T3 Galaxy reconnect recreate. T4 broadcaster health signal. T5 CorrelationId wrapper. T6 Alerts.razor (Timed picker + chip auto-clear + live-pill). T7 ScriptLog.razor live-pill. T8 live-verify. T9 rig cleanup. T10 docs+finish. T1-T5 mutually parallel; T6/T7 after T4 (∥ each other); T8 after 1/2/3/6/7; T9 after 8; T10 after 9. NO bUnit (razor proven by docker-dev /run). NO Configuration/EF change.",
+  "tasks": [
+    {"id": 235, "planTask": 0, "subject": "T0: Branch + baseline", "classification": "trivial", "status": "pending", "blockedBy": []},
+    {"id": 236, "planTask": 1, "subject": "T1: Redundancy emit-gate in ScriptedAlarmHostActor (A1)", "classification": "high-risk", "status": "pending", "blockedBy": [235], "parallelizableWith": [237, 238, 239, 240]},
+    {"id": 237, "planTask": 2, "subject": "T2: Redundancy historize-gate in HistorianAdapterActor (A2)", "classification": "high-risk", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 238, 239, 240]},
+    {"id": 238, "planTask": 3, "subject": "T3: Galaxy reconnect recreates a faulted session", "classification": "high-risk", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 237, 239, 240]},
+    {"id": 239, "planTask": 4, "subject": "T4: Broadcaster connection-health signal + bridges", "classification": "standard", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 237, 238, 240]},
+    {"id": 240, "planTask": 5, "subject": "T5: CorrelationId wrapper for the alarm commands (3b)", "classification": "small", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 237, 238, 239]},
+    {"id": 241, "planTask": 6, "subject": "T6: Alerts.razor — Timed-shelve picker + chip auto-clear + live-pill", "classification": "standard", "status": "pending", "blockedBy": [239], "parallelizableWith": [242]},
+    {"id": 242, "planTask": 7, "subject": "T7: ScriptLog.razor live-pill + reconnect-overlay check", "classification": "small", "status": "pending", "blockedBy": [239], "parallelizableWith": [241]},
+    {"id": 243, "planTask": 8, "subject": "T8: Live-verify on docker-dev", "classification": "verification", "status": "pending", "blockedBy": [236, 237, 238, 241, 242]},
+    {"id": 244, "planTask": 9, "subject": "T9: docker-dev rig cleanup", "classification": "trivial", "status": "pending", "blockedBy": [243]},
+    {"id": 245, "planTask": 10, "subject": "T10: Docs + finish branch", "classification": "small", "status": "pending", "blockedBy": [244]}
+  ],
+  "lastUpdated": "2026-06-11"
+}