docs(plan): alarm follow-ups implementation plan (T0-T10)
@@ -0,0 +1,291 @@
# Alarm Ack/Shelve Follow-ups Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.

**Goal:** Resolve the six follow-ups left by the T17–T24 inbound-alarm-ack work: the redundancy double-emit, the Timed-shelve UI, two T21 minors, the docker-dev rig cleanup, and two pre-existing Layer-1 gaps (Galaxy reconnect, the live-pill).

**Architecture:** The double-emit is fixed by **primary-only emission at the source** — both `ScriptedAlarmHostActor` (the `alerts` publish) and the per-node `HistorianAdapterActor` (the sink write) subscribe to the existing `redundancy-state` DPS topic, cache the local `RedundancyRole`, and act only when `Primary` (mirroring `OpcUaPublishActor`). OPC UA node writes + inbound-command processing stay ungated for warm-standby. The other five items are localized fixes (one Galaxy-driver lifecycle bug, one broadcaster-health interface extension + 2 razors, two small Alerts.razor additions, one mechanical refactor, one operational cleanup).

**Tech Stack:** .NET 10, Akka.NET (cluster, DistributedPubSub, TestKit/xunit2), Blazor Server (InteractiveServer, NO bUnit), xUnit + Shouldly, OPC Foundation UA .NET Standard, Serilog.

**Design of record:** `docs/plans/2026-06-11-alarm-followups-design.md` (committed master `bcb9f45c`).

**Hard rules:** stage by explicit path (never `git add .`); never stage `sql_login.txt` / `src/Server/.../Host/pki/`; never echo the gateway API key into a **new** tracked file; no force-push, no `--no-verify`; **no Configuration entity / EF migration change**. Build on a feature branch off master.

---

### Task 0: Branch + baseline

**Classification:** trivial
**Estimated implement time:** ~1 min
**Parallelizable with:** none

**Files:** (none — git only)

**Steps:**
1. `git checkout master && git switch -c feat/alarm-followups` (off `bcb9f45c`).
2. Confirm clean tree + green baseline build: `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → 0 errors.
3. No commit (branch only).

---

### Task 1: Redundancy emit-gate in `ScriptedAlarmHostActor` (A1)

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 2, Task 3, Task 4, Task 5

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs`
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`

**Context:** `OnEngineEmission` (≈ line 247–279) does two things per emission: `_publishActor.Tell(AlarmStateUpdate(...))` (line 261, OPC UA node write) and `_mediator.Tell(new Publish(AlertsTopic, evt))` (line 278, cluster-wide `alerts`). Both central nodes run this → the `alerts` publish doubles. Gate **only line 278** on `Primary`.

**Reference pattern to mirror** (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs`): `RedundancyStateTopic = "redundancy-state"` (line 30); `Subscribe(RedundancyStateTopic, Self)` in PreStart (line 156); `Receive<RedundancyStateChanged>(HandleRedundancyStateChanged)` (line 147); `HandleRedundancyStateChanged` reads `msg.Nodes.FirstOrDefault(n => n.NodeId == localNode)?.Role` (lines 335–351). `RedundancyRole` lives in `ZB.MOM.WW.OtOpcUa.Commons.Messages.Redundancy` (`Primary`/`Secondary`/`Detached`). The host already knows its local node id (it's used elsewhere; if not, derive it the same way `OpcUaPublishActor` resolves `_localNode`).

**Step 1: Failing TestKit tests** (extend the existing harness; Runtime.Tests = xunit v2 + Akka.TestKit.Xunit2):
- `Emission_is_published_to_alerts_by_default_before_any_redundancy_state` — a fresh host (no `RedundancyStateChanged` yet) DOES publish the `AlarmTransitionEvent` to `alerts` (subscribe a probe to `AlertsTopic`).
- `Secondary_node_suppresses_alerts_publish_but_still_writes_opcua` — after a `RedundancyStateChanged` marking the local node `Secondary`, an emission does NOT publish to `alerts` but DOES still `Tell` the `OpcUaPublishActor.AlarmStateUpdate` (probe the publish-actor seam the test already uses for T9/T19).
- `Primary_node_publishes_alerts` — after a `Primary` `RedundancyStateChanged`, the emission publishes to `alerts`.
- `Inbound_AlarmCommand_is_processed_regardless_of_role` — a `Secondary` host still drives the engine for an inbound `AlarmCommand` (the existing T19 ack test, asserted under a Secondary role).

**Step 2:** Run them — expect FAIL (no gate yet; Secondary still publishes).

**Step 3: Implement.** Add a cached `RedundancyRole? _localRole = null;` (null = unknown ⇒ treat as Primary/emit). In PreStart, `_mediator.Tell(new Subscribe(OpcUaPublishActor.RedundancyStateTopic, Self))` (reuse the const; if cross-project reference is awkward, introduce a shared `Commons` const for `"redundancy-state"` and point both at it — surface that deviation). Add `Receive<RedundancyStateChanged>` + a `SubscribeAck` no-op. In the handler, set `_localRole` from the snapshot for the local node. In `OnEngineEmission`, before line 278:
```csharp
// Warm-standby dedup: only the Primary (driver-role leader) publishes the cluster-wide
// transition + drives historization. Default-emit until told we are Secondary/Detached so
// single-node deploys + the boot window never drop transitions. The OPC UA node write
// above (warm address space) and inbound command processing stay ungated.
if (_localRole is RedundancyRole.Secondary or RedundancyRole.Detached)
    return;
_mediator.Tell(new Publish(AlertsTopic, evt));
```
Leave line 261 (the `_publishActor.Tell`) and `OnAlarmCommand` untouched.
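
The role lookup and the default-emit rule from Step 3 can be isolated as two pure functions, which makes the "unknown means emit" behaviour easy to unit-test without an actor system. The sketch below is only an illustration: `NodeRedundancyState`, the record shape of `RedundancyStateChanged`, and the `EmitGate` helper are assumed stand-ins, not the real `Commons` types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed stand-ins for the real Commons message types; the actual shapes may differ.
public enum RedundancyRole { Primary, Secondary, Detached }
public sealed record NodeRedundancyState(string NodeId, RedundancyRole Role);
public sealed record RedundancyStateChanged(IReadOnlyList<NodeRedundancyState> Nodes);

// Hypothetical helper, not part of the codebase.
public static class EmitGate
{
    // Returns the local node's role, or null when the snapshot does not list it.
    public static RedundancyRole? ResolveLocalRole(RedundancyStateChanged msg, string localNode) =>
        msg.Nodes.FirstOrDefault(n => n.NodeId == localNode)?.Role;

    // Null (unknown) and Primary publish; Secondary and Detached suppress.
    public static bool ShouldPublishAlerts(RedundancyRole? localRole) =>
        localRole is not (RedundancyRole.Secondary or RedundancyRole.Detached);
}
```

One open choice this sketch makes visible: a snapshot that omits the local node resolves to `null`, which falls back to emitting. Whether the actor should instead keep its previously cached role in that case is not stated in the plan.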

**Step 4:** Run `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter ScriptedAlarmHostActor` → all green.

**Step 5: Commit** by explicit path (`ScriptedAlarmHostActor.cs` + the test).

> High-risk: concurrency + redundancy semantics. The role is cached on the actor thread and read in the emission path, so confirm that `OnEngineEmission` also runs on the actor thread (it is marshalled there via `Self.Tell(EngineEmission)`). Do NOT gate the OPC UA write or commands.

---

### Task 2: Redundancy historize-gate in `HistorianAdapterActor` (A2)

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1, Task 3, Task 4, Task 5

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs`
- Modify (only if the actor needs the local node id / mediator wired): `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` (≈ line 146, where it's spawned `historian-adapter`)
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Historian/HistorianAdapterActorTests.cs` (create if absent; else extend the existing historian test)

**Context:** `HistorianAdapterActor` is **per-node** and consumes the `alerts` topic, so one publish historizes once per node → double DB writes. Gate the sink write on `Primary`, same pattern as Task 1.

**Step 1: Failing tests** (TestKit + a fake `IAlarmHistorianSink` that records writes):
- `Default_before_redundancy_state_historizes` — a fresh adapter writes to the sink (treat unknown as Primary).
- `Secondary_node_does_not_historize` — after a `Secondary` `RedundancyStateChanged`, an incoming alarm event is NOT written to the sink.
- `Primary_node_historizes` — after `Primary`, it IS written.

**Step 2:** Run — FAIL.

**Step 3: Implement** the same `_localRole` cache + `redundancy-state` subscription + `Receive<RedundancyStateChanged>` as Task 1, and guard the sink write: `if (_localRole is RedundancyRole.Secondary or RedundancyRole.Detached) return;` before `_sink.Write...`. The actor will need its local node id; resolve it the way `OpcUaPublishActor`/`ServiceCollectionExtensions` already does (pass it into `Props` if not already available — check how `OpcUaPublishActor` gets `_localNode` and mirror; thread it through `HistorianAdapterActor.Props` + the registration at `ServiceCollectionExtensions.cs:146`).
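
The three Step 1 cases can be exercised against a recording fake before any actor wiring exists. This is a minimal sketch under stated assumptions: `IAlarmSink`, `RecordingSink`, `HistorizeGate`, and the `Role` enum are hypothetical names, and the real `IAlarmHistorianSink` signature is not shown in this plan, so a single `Write(string)` method is assumed purely for illustration.

```csharp
using System.Collections.Generic;

// Stand-in for RedundancyRole (assumed shape).
public enum Role { Primary, Secondary, Detached }

// Hypothetical sink seam; the real IAlarmHistorianSink signature is not shown in the plan.
public interface IAlarmSink { void Write(string alarmEvent); }

// Fake that records every write so a test can assert on the count.
public sealed class RecordingSink : IAlarmSink
{
    public List<string> Writes { get; } = new();
    public void Write(string alarmEvent) => Writes.Add(alarmEvent);
}

// Models only the gating decision, not the actor.
public sealed class HistorizeGate
{
    private readonly IAlarmSink _sink;
    private Role? _localRole; // null = no redundancy state seen yet

    public HistorizeGate(IAlarmSink sink) => _sink = sink;

    public void OnRedundancyState(Role? localRole) => _localRole = localRole;

    public void OnAlarmEvent(string alarmEvent)
    {
        // Same guard as Task 1: only a known non-Primary role suppresses the write.
        if (_localRole is Role.Secondary or Role.Detached)
            return;
        _sink.Write(alarmEvent);
    }
}
```

In the real tests the same assertions would run through TestKit, with the role delivered as a `RedundancyStateChanged` message rather than a direct method call.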

**Step 4:** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter Historian` → green. Also confirm the full Runtime.Tests still pass.

**Step 5: Commit** by explicit path.

> High-risk: data-historization correctness + concurrency. If wiring the node id into `Props` ripples beyond the two files, surface it before expanding.

---

### Task 3: Galaxy reconnect recreates a faulted session

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1, Task 2, Task 4, Task 5

**Files:**
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GalaxyMxSession.cs` (≈ line 58–69)
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs` (≈ line 289, `ReopenAsync`)
- Test: the Galaxy driver test project (find `*ReconnectSupervisor*` / `*Session*` tests under `tests/Drivers/...Driver.Galaxy.Tests/`; mirror their seam)

**Context:** `GalaxyMxSession.ConnectAsync` has `if (_session is not null) return;` — so when the gRPC session is Faulted/NotFound the field is still a non-null dead handle and `ConnectAsync` is a silent no-op. `GalaxyDriver.ReopenAsync` calls `ConnectAsync` expecting a reconnect → no-op → `ReconnectSupervisor.RecoveryLoopAsync` (`Runtime/ReconnectSupervisor.cs:158-186`) sees reopen "succeed", replay fails, loops forever.

**Step 1: Failing test.** Assert the reconnect path **creates a new session** after a faulted one: drive `GalaxyMxSession` (or a fake `MxGatewayClient`/session seam) to a connected state, simulate fault/dispose, call the reopen/recreate path, and assert `OpenSessionAsync` + `RegisterAsync` were invoked **again** (a second create), not a no-op. Mirror the existing session/reconnect test construction (read how they fake `MxGatewayClient`/`OpenSessionAsync`).

**Step 2:** Run — FAIL (current code no-ops, second create never happens).

**Step 3: Implement.** Add a recreate path to `GalaxyMxSession`, e.g.:
```csharp
/// <summary>Disposes the current (faulted/stale) session + owned client so the next
/// <see cref="ConnectAsync"/> rebuilds a fresh session instead of no-op'ing on the dead handle.</summary>
public async Task RecreateAsync(MxGatewayClientOptions clientOptions, CancellationToken ct)
{
    ObjectDisposedException.ThrowIf(_disposed, this);
    await DisposeSessionAsync().ConfigureAwait(false); // dispose+null _session, _serverHandle, _ownedClient
    await ConnectAsync(clientOptions, ct).ConfigureAwait(false);
}
```
(Factor a private `DisposeSessionAsync()` that safely tears down the gRPC session/channel + nulls the fields — reuse whatever `DisposeAsync` already does for teardown, minus marking `_disposed`.) Then in `GalaxyDriver.ReopenAsync` call `RecreateAsync` instead of `ConnectAsync` so every reopen rebuilds. Keep the supervisor's backoff loop unchanged — it now actually recovers.
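
To make the no-op bug and the recreate fix concrete, the following self-contained sketch reproduces both with a counting session factory. Everything here is an assumption for illustration: `SessionHolder` and `FakeSession` are hypothetical, the `open` delegate stands in for the `OpenSessionAsync` + `RegisterAsync` pair, and the real `GalaxyMxSession` also manages `_serverHandle` and `_ownedClient`, which this sketch leaves out.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fake: a session whose teardown can fail, like a faulted gRPC handle.
public sealed class FakeSession : IAsyncDisposable
{
    private readonly bool _throwOnDispose;
    public FakeSession(bool throwOnDispose) => _throwOnDispose = throwOnDispose;
    public ValueTask DisposeAsync() =>
        _throwOnDispose ? throw new InvalidOperationException("faulted") : ValueTask.CompletedTask;
}

// Hypothetical model of the session owner; not the real GalaxyMxSession.
public sealed class SessionHolder : IAsyncDisposable
{
    private readonly Func<CancellationToken, Task<IAsyncDisposable>> _open;
    private IAsyncDisposable? _session;
    private bool _disposed;

    public SessionHolder(Func<CancellationToken, Task<IAsyncDisposable>> open) => _open = open;

    public async Task ConnectAsync(CancellationToken ct)
    {
        ObjectDisposedException.ThrowIf(_disposed, this);
        if (_session is not null) return; // the guard that no-ops on a dead handle
        _session = await _open(ct).ConfigureAwait(false);
    }

    public async Task RecreateAsync(CancellationToken ct)
    {
        ObjectDisposedException.ThrowIf(_disposed, this);
        await DisposeSessionAsync().ConfigureAwait(false);
        await ConnectAsync(ct).ConfigureAwait(false);
    }

    // Safe to call repeatedly: the field is nulled first, and teardown errors are swallowed.
    private async Task DisposeSessionAsync()
    {
        var stale = _session;
        _session = null;
        if (stale is null) return;
        try { await stale.DisposeAsync().ConfigureAwait(false); }
        catch (Exception) { /* best-effort teardown of a faulted session */ }
    }

    public async ValueTask DisposeAsync()
    {
        if (_disposed) return;
        _disposed = true;
        await DisposeSessionAsync().ConfigureAwait(false);
    }
}
```

The sketch is not synchronized. If the supervisor loop can race with other callers, the real implementation would likely need a lock or a `SemaphoreSlim` around connect and recreate; that decision belongs to the review called out in the high-risk note.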

**Step 4:** Run the Galaxy reconnect/session tests → green. `dotnet build` the Galaxy driver clean.

**Step 5: Commit** by explicit path.

> High-risk: driver session lifecycle + gRPC channel teardown + concurrency with the supervisor loop. Confirm `DisposeSessionAsync` is safe to call repeatedly and from the supervisor thread.

---

### Task 4: Broadcaster connection-health signal + bridges

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1, Task 2, Task 3, Task 5

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/IInProcessBroadcaster.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/InProcessBroadcaster.cs` (the impl)
- Modify: the two bridge actors that publish into the broadcaster — `ScriptLogSignalRBridge` + `AlertSignalRBridge` (find under `AdminUI/` or `Runtime/`; they `Tell`/`Subscribe` the DPS topics and push to the broadcaster)
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Client.UI.Tests/` (or wherever `InProcessBroadcaster`/broadcaster tests live — find `InProcessBroadcasterTests`)

**Step 1: Failing unit test** for the new health signal: a new `InProcessBroadcaster<T>` reports `IsConnected == false` (or a sensible default); calling `SetConnected(true)` raises `ConnectionStateChanged` with `true` and flips `IsConnected`; `SetConnected(false)` flips back + raises.

**Step 2:** Run — FAIL (members don't exist).

**Step 3: Implement.** Extend the interface:
```csharp
bool IsConnected { get; }
event Action<bool>? ConnectionStateChanged;
void SetConnected(bool connected);
```
Implement in `InProcessBroadcaster<T>` (raise only on change). Then in each bridge actor, call `_broadcaster.SetConnected(true)` when its DPS `SubscribeAck` lands (subscription live) and `SetConnected(false)` on `PostStop`/failure. (Read how the bridge subscribes; mirror the `SubscribeAck` handling already there.) Default `IsConnected`: choose `true` once subscribed; before the first ack, `false` — the razor will reconcile on the `ConnectionStateChanged` event.
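
A possible shape for the "raise only on change" behaviour is sketched below as a standalone class, so the semantics can be checked in isolation. `ConnectionHealth` is a hypothetical name; in the plan these members live on `InProcessBroadcaster<T>` itself, and the locking shown here is an assumption, since bridge actors and Blazor circuits may touch the flag from different threads.

```csharp
#nullable enable
using System;

// Hypothetical standalone model of the three new interface members.
public sealed class ConnectionHealth
{
    private readonly object _gate = new();
    private bool _isConnected; // false until the first SubscribeAck flips it

    public bool IsConnected
    {
        get { lock (_gate) return _isConnected; }
    }

    public event Action<bool>? ConnectionStateChanged;

    public void SetConnected(bool connected)
    {
        lock (_gate)
        {
            if (_isConnected == connected) return; // no change, no event
            _isConnected = connected;
        }
        // Raised outside the lock so a handler cannot deadlock against SetConnected.
        ConnectionStateChanged?.Invoke(connected);
    }
}
```

A razor consumer would read `IsConnected` once on initialization, subscribe to the event, and unsubscribe in `Dispose`. Reading after subscribing (or re-reading inside the handler) avoids missing a flip that lands between the two steps.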

**Step 4:** Run the broadcaster tests → green; AdminUI builds clean (`TreatWarningsAsErrors`).

**Step 5: Commit** by explicit path.

---

### Task 5: `CorrelationId` wrapper for the alarm commands (3b)

**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 2, Task 3, Task 4

**Files:**
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/AcknowledgeAlarmCommand.cs` + `ShelveAlarmCommand.cs` (+ their `*Result` records)
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (the alarm handlers)
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Clients/AdminOperationsClient.cs`
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AdminOperationsActorTests.cs` (the 5 alarm tests)

**Context:** the alarm commands use a bare `Guid CorrelationId`; the project's other control-plane commands (`StartDeployment`/`RestartDriver`/`ReconnectDriver`) use the `CorrelationId` wrapper type. Make them consistent.

**Steps:**
1. Find the `CorrelationId` wrapper type + how `RestartDriver` uses it (`CorrelationId.NewId()`, the record field type, the reply echo).
2. Change `AcknowledgeAlarmCommand`/`ShelveAlarmCommand` (+ `*Result`) `CorrelationId` field type `Guid` → `CorrelationId`. Update `AdminOperationsClient.AcknowledgeAlarmAsync`/`ShelveAlarmAsync` to mint `CorrelationId.NewId()`. Update the `AdminOperationsActor` handlers' reply construction.
3. Update the 5 `AdminOperationsActorTests` to the wrapper type.
4. `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests --filter AdminOperations` → green; build clean.
5. **Commit** by explicit path.
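
The refactor in steps 1 to 3 amounts to swapping a field type and echoing it in the reply. The sketch below shows that pattern end to end with assumed shapes: the real `CorrelationId` wrapper already exists in the codebase and may be a class or a differently shaped record, and the command/result fields (`AlarmId`, `User`, `Success`) plus the `AckHandler` helper are hypothetical placeholders rather than the actual record definitions.

```csharp
using System;

// Assumed shape of the existing wrapper; only NewId() is confirmed by the plan.
public readonly record struct CorrelationId(Guid Value)
{
    public static CorrelationId NewId() => new(Guid.NewGuid());
}

// Hypothetical field lists; the point is the CorrelationId type replacing a bare Guid.
public sealed record AcknowledgeAlarmCommand(CorrelationId CorrelationId, string AlarmId, string User);
public sealed record AcknowledgeAlarmResult(CorrelationId CorrelationId, bool Success);

// Hypothetical stand-in for the actor handler's reply construction.
public static class AckHandler
{
    // The reply echoes the command's id so the caller can match request and response.
    public static AcknowledgeAlarmResult Handle(AcknowledgeAlarmCommand cmd) =>
        new(cmd.CorrelationId, Success: true);
}
```

Because the wrapper is a distinct type, a call site that still passes a raw `Guid` stops compiling, which is what makes this refactor mechanical: the compiler lists every place left to update.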

---

### Task 6: `Alerts.razor` — Timed-shelve picker + chip auto-clear + live-pill

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 7
**Blocked by:** Task 4 (needs the broadcaster `ConnectionStateChanged`/`IsConnected`)

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Alerts.razor`

**Context:** all three changes are in one file → bundled so they don't contend. NO bUnit; proven by docker-dev live-verify (Task 8).

**Steps (no failing test — razor; verify by build + Task 8):**
1. **Timed-shelve picker:** add a small duration input to the row's shelve control (a minutes number box is simplest, default e.g. 5). Add a "Shelve (timed)" action that calls `IAdminOperationsClient.ShelveAlarmAsync(alarmId, user, ShelveKind.Timed, unshelveAtUtc: <computed now+duration>, comment: null, ct)`. Keep the existing OneShot Shelve + Unshelve buttons. Dispose the CTS (`using var cts = …`) like the existing handlers.
2. **Chip auto-clear:** after `ShowOpResult`, start a `~8s` auto-clear (mirror `DriverStatusPanel.razor`'s timer: a `Task.Delay(8000)` continuation that, if the chip is still the same one, clears `_opResult*` + `InvokeAsync(StateHasChanged)`). Read `DriverStatusPanel.razor` for the exact pattern (cancellation on a newer action).
3. **Live-pill:** replace the set-once `_connected = true` (≈ line 132) — in `OnInitializedAsync` read `Alarms.IsConnected` and subscribe `Alarms.ConnectionStateChanged += OnConnChanged;` (handler sets `_connected` + `InvokeAsync(StateHasChanged)`); unsubscribe in `Dispose`. (`Alarms` = the injected `IInProcessBroadcaster<AlarmTransitionEvent>`.)
4. `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → 0 warnings/errors.
5. **Commit** by explicit path.
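
The chip auto-clear in step 2 hinges on one rule: a newer result must cancel the older timer so an old delay never wipes a fresh chip. The sketch below shows one way to get that with a `CancellationTokenSource` per chip. It is not the `DriverStatusPanel.razor` code, which this plan says to read for the exact pattern; `ResultChip` is a hypothetical class, and the delay is a parameter here so the behaviour can be checked quickly (the plan's value is about 8 seconds).

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the result chip; in the component this state lives in _opResult* fields.
public sealed class ResultChip
{
    private CancellationTokenSource? _clearCts;

    public string? Message { get; private set; }

    // Shows a message and schedules its clear; a newer Show cancels the older timer.
    public Task Show(string message, TimeSpan clearAfter)
    {
        _clearCts?.Cancel();
        _clearCts?.Dispose();
        var cts = _clearCts = new CancellationTokenSource();
        Message = message;
        return ClearLaterAsync(cts.Token, clearAfter);
    }

    private async Task ClearLaterAsync(CancellationToken token, TimeSpan clearAfter)
    {
        try { await Task.Delay(clearAfter, token); }
        catch (OperationCanceledException) { return; } // superseded by a newer chip
        if (token.IsCancellationRequested) return;
        Message = null; // in the razor component, follow with InvokeAsync(StateHasChanged)
    }
}
```

Inside a Blazor component the continuation should hop back through `InvokeAsync` before touching state, and the pending source should be cancelled in `Dispose` so a timer cannot fire against a disposed component. The sketch assumes calls to `Show` are not concurrent with each other.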

---

### Task 7: `ScriptLog.razor` live-pill + reconnect-overlay check

**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 6
**Blocked by:** Task 4

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/ScriptLog.razor` (≈ line 122)
- Read/verify (no edit unless missing): the Blazor host shell (`App.razor` / `_Host.cshtml` / `MainLayout`) for the default reconnect overlay (`components-reconnect-modal` / `<div id="components-reconnect-modal">`)

**Steps:**
1. Same live-pill fix as Task 6 step 3, against `ScriptLogs` (the injected `IInProcessBroadcaster<ScriptLogEntry>`): read `IsConnected`, subscribe `ConnectionStateChanged`, update + `StateHasChanged`, unsubscribe in `Dispose`.
2. **Verify the dead-circuit overlay exists:** grep the host shell for the Blazor reconnect modal markup. If present, note it in the commit message (the dead-circuit case is covered by Blazor's built-in reconnection UI — the component can't self-update a dead circuit). If ABSENT, add the standard `components-reconnect-modal` markup. Do NOT fake liveness from a dead component.
3. `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → clean.
4. **Commit** by explicit path.

---

### Task 8: Live-verify on docker-dev

**Classification:** verification
**Estimated implement time:** ~ (manual)
**Parallelizable with:** none
**Blocked by:** Task 1, Task 2, Task 3, Task 6, Task 7 (Task 5 is internal; include it in the build)

**Steps:** rebuild docker-dev central nodes on the new image (`docker compose -f docker-dev/docker-compose.yml up -d --build central-1 central-2`), then on `/alerts` (login disabled — agent drives):
1. **Double-emit fixed:** confirm each t12-overheat transition now appears **ONCE** (not twice). Cross-check both nodes' logs: only the **driver-leader (Primary)** node logs the alerts publish; the Secondary suppresses it but still writes its OPC UA node (its address space stays current). (Find the Primary via the `redundancy-state`/ServiceLevel — or just observe single rows.)
2. **Timed-shelve:** drive the new timed-shelve control → a "Shelved" transition appears (operator = `multi-role-test`); after the duration the SDK auto-unshelves (or verify the `UnshelveAtUtc` was set).
3. **Chip auto-clear:** the Ack/Shelve result chip clears after ~8s.
4. **Live-pill:** the pill shows "live" while connected; (optional) kill the bridge / restart a node and confirm the pill flips / the Blazor reconnect overlay shows.
5. **Galaxy reconnect** (best-effort, may need the gateway): if feasible, restart the Galaxy gateway and confirm the driver recreates the session + recovers without a container restart (watch the logs for a fresh `OpenSessionAsync`). If not feasible live, rely on the Task 3 unit test + note it.

Defects → new fix tasks. **Agent drives; no sign-in needed (docker-dev login disabled).**

---

### Task 9: docker-dev rig cleanup

**Classification:** trivial (operational)
**Estimated implement time:** ~ (manual)
**Parallelizable with:** none
**Blocked by:** Task 8 (the rig artifacts are the live-verify vehicle)

**Steps:** remove the seed artifacts left for verification, then redeploy:
1. In the docker-dev AdminUI (or via DB), delete the `t12-overheat` scripted alarm, the `SC-ba675b168a85` predicate script, and the `layer0-logcheck` vtag + script. Revert filler-02's `cycle-time-s` script to `return ctx.GetTag("TestMachine_002.TestDuration").Value;`.
2. Redeploy: `POST http://localhost:9200/api/deployments` with `X-Api-Key: docker-dev-deploy-key` → `202`.
3. Confirm `/alerts` + `/scripted-alarms` + `/scripts` no longer list the removed artifacts. No code commit (DB/deploy only). Note completion in Task 10's commit message.
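
If the redeploy in step 2 is driven from a script rather than by hand, the request can be built as below. This is a sketch under assumptions: `RedeployRequest` is a hypothetical helper, and the plan does not say whether the endpoint expects a request body, so none is sent here. The key is passed in as a parameter instead of being written into the file, in line with the hard rules.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper; only the URL, the header name, and the expected 202 come from the plan.
public static class RedeployRequest
{
    // Builds the POST without sending it, so the shape can be inspected or tested offline.
    public static HttpRequestMessage Build(string baseUrl, string apiKey)
    {
        var uri = new Uri(new Uri(baseUrl), "/api/deployments");
        var request = new HttpRequestMessage(HttpMethod.Post, uri);
        request.Headers.Add("X-Api-Key", apiKey);
        return request;
    }
}
```

Sending it is then a matter of `await client.SendAsync(RedeployRequest.Build("http://localhost:9200", key))` on an `HttpClient` and checking that the status code is `HttpStatusCode.Accepted` (202).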

> Operational only — no source change. If the user wants the rig kept, skip this task and record it as deferred.

---

### Task 10: Docs + finish branch

**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**Blocked by:** Task 9

**Files:** `docs/Redundancy.md` (alarm-emission + historization are now Primary-gated), `docs/ScriptedAlarms.md` / `docs/AlarmTracking.md` (the redundancy dedup note), `docs/Client.CLI.md` only if affected, and a one-line note wherever the live-pill/Galaxy-reconnect behaviour is documented. Keep terse.

**Steps:**
1. Update the docs above to reflect: Primary-only alarm emission + historization under redundancy; the Galaxy reconnect now recreates a faulted session; the live-pill reflects feed health.
2. Run the FULL suite: `dotnet test ZB.MOM.WW.OtOpcUa.slnx` — confirm all affected unit suites green; the only failures should be the known pre-existing macOS/integration ones (OpcUaServer.IntegrationTests PKI, Host.IntegrationTests deploy-Rejected, AbLegacy/AbCip fixtures). Verify the **Galaxy** suite is green (Task 3).
3. **Commit** docs by explicit path.
4. Run **superpowers-extended-cc:finishing-a-development-branch** (verify tests → present the 4 options → merge).

---

## Execution notes

- **Parallel dispatch:** Tasks **1, 2, 3, 4, 5** are mutually parallelizable (disjoint files across Runtime/ScriptedAlarms, Runtime/Historian, Driver.Galaxy, AdminUI/Hubs+bridges, Commons+ControlPlane). Dispatch their implementers concurrently. **6 and 7** wait on **4** (broadcaster health) and are ∥ each other (Alerts.razor vs ScriptLog.razor). **8** waits on 1/2/3/6/7; **9** on 8; **10** on 9.
- **One writer per file:** `Alerts.razor` is touched ONLY by Task 6 (its three changes are bundled for exactly this reason). `ScriptLog.razor` only by Task 7.
- **High-risk tasks (1, 2, 3):** serial spec→code review each. **Standard (4, 6):** parallel spec+code review. **Small (5, 7):** code review only.
- **Live-verify (8) is the integration checkpoint** — the single-alerts-row proof is the whole point of item 1.
- TDD where there's logic (1, 2, 3, 4, 5); razor (6, 7) proven by docker-dev `/run` (login disabled, agent drives).
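
The dispatch order in the first note follows mechanically from the blocked-by relationships. As an illustrative sketch only, the helper below groups tasks into waves, where each wave contains every task whose blockers have all finished. `DispatchPlan` is a hypothetical name, and the input uses plan task numbers (0 to 10) rather than the tracker ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of the plan tooling.
public static class DispatchPlan
{
    // Groups task ids into waves: a task joins the first wave after all of its blockers are done.
    public static List<List<int>> Waves(IReadOnlyDictionary<int, int[]> blockedBy)
    {
        var done = new HashSet<int>();
        var waves = new List<List<int>>();
        while (done.Count < blockedBy.Count)
        {
            var ready = blockedBy
                .Where(kv => !done.Contains(kv.Key) && kv.Value.All(done.Contains))
                .Select(kv => kv.Key)
                .OrderBy(id => id)
                .ToList();
            if (ready.Count == 0)
                throw new InvalidOperationException("Cycle or unknown blocker in blockedBy.");
            waves.Add(ready);
            done.UnionWith(ready);
        }
        return waves;
    }
}
```

Fed with this plan's dependencies (Tasks 1 to 5 blocked by 0, Tasks 6 and 7 by 4, Task 8 by 1/2/3/6/7, Task 9 by 8, Task 10 by 9), the waves come out as `[0]`, `[1, 2, 3, 4, 5]`, `[6, 7]`, `[8]`, `[9]`, `[10]`, matching the order described in the notes.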
@@ -0,0 +1,23 @@
{
  "planPath": "docs/plans/2026-06-11-alarm-followups.md",
  "designPath": "docs/plans/2026-06-11-alarm-followups-design.md",
  "branch": "feat/alarm-followups",
  "baseBranch": "master",
  "baseSha": "bcb9f45c",
  "status": "pending",
  "note": "Resolves the 6 T17-T24 follow-ups. Double-emit = primary-only emission (T1 ScriptedAlarmHostActor alerts-gate + T2 HistorianAdapterActor historize-gate, both on Primary via redundancy-state; OPC UA write + commands stay ungated). T3 Galaxy reconnect recreate. T4 broadcaster health signal. T5 CorrelationId wrapper. T6 Alerts.razor (Timed picker + chip auto-clear + live-pill). T7 ScriptLog.razor live-pill. T8 live-verify. T9 rig cleanup. T10 docs+finish. T1-T5 mutually parallel; T6/T7 after T4 (∥ each other); T8 after 1/2/3/6/7; T9 after 8; T10 after 9. NO bUnit (razor proven by docker-dev /run). NO Configuration/EF change.",
  "tasks": [
    {"id": 235, "planTask": 0, "subject": "T0: Branch + baseline", "classification": "trivial", "status": "pending", "blockedBy": []},
    {"id": 236, "planTask": 1, "subject": "T1: Redundancy emit-gate in ScriptedAlarmHostActor (A1)", "classification": "high-risk", "status": "pending", "blockedBy": [235], "parallelizableWith": [237, 238, 239, 240]},
    {"id": 237, "planTask": 2, "subject": "T2: Redundancy historize-gate in HistorianAdapterActor (A2)", "classification": "high-risk", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 238, 239, 240]},
    {"id": 238, "planTask": 3, "subject": "T3: Galaxy reconnect recreates a faulted session", "classification": "high-risk", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 237, 239, 240]},
    {"id": 239, "planTask": 4, "subject": "T4: Broadcaster connection-health signal + bridges", "classification": "standard", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 237, 238, 240]},
    {"id": 240, "planTask": 5, "subject": "T5: CorrelationId wrapper for the alarm commands (3b)", "classification": "small", "status": "pending", "blockedBy": [235], "parallelizableWith": [236, 237, 238, 239]},
    {"id": 241, "planTask": 6, "subject": "T6: Alerts.razor — Timed-shelve picker + chip auto-clear + live-pill", "classification": "standard", "status": "pending", "blockedBy": [239], "parallelizableWith": [242]},
    {"id": 242, "planTask": 7, "subject": "T7: ScriptLog.razor live-pill + reconnect-overlay check", "classification": "small", "status": "pending", "blockedBy": [239], "parallelizableWith": [241]},
    {"id": 243, "planTask": 8, "subject": "T8: Live-verify on docker-dev", "classification": "verification", "status": "pending", "blockedBy": [236, 237, 238, 241, 242]},
    {"id": 244, "planTask": 9, "subject": "T9: docker-dev rig cleanup", "classification": "trivial", "status": "pending", "blockedBy": [243]},
    {"id": 245, "planTask": 10, "subject": "T10: Docs + finish branch", "classification": "small", "status": "pending", "blockedBy": [244]}
  ],
  "lastUpdated": "2026-06-11"
}