docs(plan): alarm historian follow-ups (pending #1-6) implementation plan
This commit is contained in:
@@ -0,0 +1,456 @@
|
||||
# Alarm Historian Follow-ups Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Close the six code follow-ups left open by the alarm-followups round-2 work (`f64f7ce6`): honor the per-alarm `HistorizeToAveva` opt-out at the durable historian write, expose drain/capacity/retention config knobs, validate the historian config at startup, record the real operator for shelve/enable/disable transitions, and fix two `SqliteStoreAndForwardSink` thread-safety nits.
|
||||
|
||||
**Architecture:** All six are bounded hardening fixes on the already-shipped historian + scripted-alarm surface — no new components. The load-bearing one (T1) threads a `HistorizeToAveva` flag through the engine emission → the DPS-serialized `AlarmTransitionEvent` → a gate in `HistorianAdapterActor` that suppresses **only the durable write** (the live `/alerts` UI keeps every transition). The rest are localized: options/registration hardening (T2), a one-line `TransitionUser` widening (T3), and field-visibility fixes on the SQLite sink (T4).
|
||||
|
||||
**Tech Stack:** C# / .NET 10, Akka.NET (cluster DistributedPubSub + TestKit), xUnit + Shouldly, Serilog, EF Core (untouched here), SQLite store-and-forward.
|
||||
|
||||
**Source of truth:** `pending.md` items 1–6 (item 7, the docker-dev rig cleanup, is an operational deferral and is **out of scope**). Round-2 design: `docs/plans/2026-06-11-alarm-followups-round2-design.md`.
|
||||
|
||||
**Base:** Branch off `master` @ `f64f7ce6`.
|
||||
|
||||
**Hard rules (carried from round 2):** stage by explicit path (never `git add .`); never stage `sql_login.txt` or `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`; never echo the gateway API key into a new tracked file; no force-push, no `--no-verify`; **NO Configuration entity / EF migration change** — the historian queue is a standalone SQLite file, NOT the Config DB; commit per task by explicit path.
|
||||
|
||||
**Same-assembly build contention:** T1, T2, T3 all touch the `Runtime` assembly (and T1/T3 share `ScriptedAlarmHostActor.cs`) → they MUST run serially (build/test collide on `obj/`). T4 touches only `Core.AlarmHistorian` → it is the one task safe to run concurrently with the Runtime chain.
|
||||
|
||||
---
|
||||
|
||||
## Task 0: Branch
|
||||
|
||||
**Classification:** trivial
|
||||
**Estimated implement time:** ~1 min
|
||||
**Parallelizable with:** none
|
||||
|
||||
**Files:** none (git only)
|
||||
|
||||
**Step 1: Create the feature branch off master**
|
||||
|
||||
```bash
|
||||
git checkout master
|
||||
git status --short # expect only untracked pending.md
|
||||
git checkout -b feat/alarm-historian-followups
|
||||
git rev-parse --short HEAD # expect f64f7ce6
|
||||
```
|
||||
|
||||
Expected: on `feat/alarm-historian-followups`, HEAD `f64f7ce6`.
|
||||
|
||||
(Do NOT commit `pending.md` — it is an untracked working note and stays untracked.)
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Honor `HistorizeToAveva` opt-out at the durable write (pending #1)
|
||||
|
||||
**Classification:** high-risk
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** Task 4
|
||||
|
||||
**Why high-risk:** changes a DistributedPubSub-serialized data contract (`AlarmTransitionEvent`) and the warm-redundant historian gate. Mis-gating either double-writes, drops rows, or (the trap pending.md calls out) silently suppresses the **live UI** instead of just the durable write.
|
||||
|
||||
**The design (locked):**
|
||||
- The flag is **already** on `ScriptedAlarmDefinition.HistorizeToAveva` (default `true`) and reaches the engine via `ToDefinition` (`ScriptedAlarmHostActor.cs:477`). It is NOT yet on the emitted event, so the historian can't see it.
|
||||
- Carry it: `ScriptedAlarmEvent` (engine) → `AlarmTransitionEvent` (Commons, DPS-serialized) → gate in `HistorianAdapterActor`.
|
||||
- **Gate the durable write ONLY.** `ScriptedAlarmHostActor.OnEngineEmission` must keep publishing every transition to `alerts` (its existing Primary gate is unchanged) so the live `/alerts` UI is unaffected. The `HistorizeToAveva` check lives solely in `HistorianAdapterActor.Receive<AlarmTransitionEvent>`.
|
||||
- **Default `true`, non-nullable.** On a rolling restart Akka's JSON serializer applies `default(bool)` = `false` to an old-format message's missing field (the same quirk the `AlarmTypeName` null-coalesce in `Translate` guards). This is safe here because the node that *writes* is always the Primary (or a boot-window node), and that node also *published* the event — so the written event always carries the publisher's own same-version flag value. A cross-version old→new flow only reaches the Secondary, which never writes. Document this reasoning in a code comment.
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs` (`ScriptedAlarmEvent` record ~837; `BuildEmission` ~592)
|
||||
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Alerts/AlarmTransitionEvent.cs` (record ~19-29)
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` (`OnEngineEmission` ~289-301)
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs` (ctor `Receive<AlarmTransitionEvent>` ~74)
|
||||
- Test: `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs`
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Historian/HistorianAdapterActorTests.cs`
|
||||
|
||||
**Step 1: Write the failing engine test (flag carried through emission)**
|
||||
|
||||
In `ScriptedAlarmEngineTests.cs`, add a test that an alarm defined with `HistorizeToAveva: false` emits a `ScriptedAlarmEvent` whose `HistorizeToAveva` is `false` (and `true` when the definition is `true`). Follow the existing engine-test setup pattern in that file (register a definition, drive the predicate active, capture `OnEvent`). Assert:
|
||||
|
||||
```csharp
|
||||
emitted.HistorizeToAveva.ShouldBeFalse();
|
||||
```
|
||||
|
||||
Run (expect FAIL — `ScriptedAlarmEvent` has no `HistorizeToAveva`):
|
||||
```bash
|
||||
dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests --filter "FullyQualifiedName~HistorizeToAveva"
|
||||
```
|
||||
|
||||
**Step 2: Add the field to `ScriptedAlarmEvent` + populate in `BuildEmission`**
|
||||
|
||||
`ScriptedAlarmEvent` record (append after `Comment`):
|
||||
```csharp
|
||||
public sealed record ScriptedAlarmEvent(
|
||||
string AlarmId,
|
||||
string EquipmentPath,
|
||||
string AlarmName,
|
||||
AlarmKind Kind,
|
||||
AlarmSeverity Severity,
|
||||
string Message,
|
||||
AlarmConditionState Condition,
|
||||
EmissionKind Emission,
|
||||
DateTime TimestampUtc,
|
||||
string? Comment = null,
|
||||
bool HistorizeToAveva = true);
|
||||
```
|
||||
|
||||
`BuildEmission` — add to the `new ScriptedAlarmEvent(...)` initializer (the definition is on `state.Definition`):
|
||||
```csharp
|
||||
Comment: kind switch
|
||||
{
|
||||
EmissionKind.Acknowledged => condition.LastAckComment,
|
||||
EmissionKind.Confirmed => condition.LastConfirmComment,
|
||||
EmissionKind.CommentAdded => condition.Comments.Count == 0 ? null : condition.Comments[^1].Text,
|
||||
_ => null,
|
||||
},
|
||||
HistorizeToAveva: state.Definition.HistorizeToAveva);
|
||||
```
|
||||
|
||||
**Step 3: Add the field to `AlarmTransitionEvent` (Commons contract)**
|
||||
|
||||
Append after `Comment` (additive, default `true` — see the rolling-restart note above), and add the `<param>` doc line:
|
||||
```csharp
|
||||
/// <param name="HistorizeToAveva">When <c>false</c>, the durable historian sink suppresses this transition (the live <c>alerts</c> fan-out is unaffected). Defaults to <c>true</c>. On a rolling restart an old-format message deserializes this as <c>false</c> (CLR default); that is safe because the writing node is always the same-version publisher — see <c>HistorianAdapterActor</c>.</param>
|
||||
public sealed record AlarmTransitionEvent(
|
||||
string AlarmId,
|
||||
string EquipmentPath,
|
||||
string AlarmName,
|
||||
string TransitionKind,
|
||||
int Severity,
|
||||
string Message,
|
||||
string User,
|
||||
DateTime TimestampUtc,
|
||||
string AlarmTypeName = "AlarmCondition",
|
||||
string? Comment = null,
|
||||
bool HistorizeToAveva = true);
|
||||
```
|
||||
|
||||
**Step 4: Populate it in `ScriptedAlarmHostActor.OnEngineEmission`**
|
||||
|
||||
Add to the `new AlarmTransitionEvent(...)` initializer (after `Comment: e.Comment`):
|
||||
```csharp
|
||||
AlarmTypeName: e.Kind.ToString(),
|
||||
Comment: e.Comment,
|
||||
HistorizeToAveva: e.HistorizeToAveva);
|
||||
```
|
||||
**Do NOT touch the Primary `_localRole` gate below it** — the `alerts` publish stays ungated by `HistorizeToAveva` (live UI must see every transition).
|
||||
|
||||
**Step 5: Gate the durable write in `HistorianAdapterActor`**
|
||||
|
||||
Change the `Receive<AlarmTransitionEvent>` handler (currently `~74`) to also require the flag:
|
||||
```csharp
|
||||
// HistorizeToAveva=false is a per-alarm opt-out of DURABLE historization only — the live
|
||||
// `alerts` fan-out (browser UI) already happened upstream. Gate the sink write, not the publish.
|
||||
Receive<AlarmTransitionEvent>(t => { if (ShouldHistorize() && t.HistorizeToAveva) _ = EnqueueAsync(Translate(t)); });
|
||||
```
|
||||
|
||||
**Step 6: Write the failing historian-gate test, then make it pass**
|
||||
|
||||
In `HistorianAdapterActorTests.cs`: add a `bool historizeToAveva = true` parameter to the `SampleTransition` helper (thread it into the `AlarmTransitionEvent`), then add:
|
||||
```csharp
|
||||
/// <summary>Per-alarm opt-out (pending #1): a Primary node MUST NOT write to the durable sink when the
|
||||
/// transition carries HistorizeToAveva=false — even though it would otherwise historize. The live alerts
|
||||
/// fan-out is upstream and unaffected.</summary>
|
||||
[Fact]
|
||||
public void Primary_node_does_not_historize_when_opted_out()
|
||||
{
|
||||
var (actor, sink) = CreateActor();
|
||||
TellRedundancyRole(actor, RedundancyRole.Primary);
|
||||
actor.Tell(SampleTransition(historizeToAveva: false));
|
||||
// Give the actor time to (not) enqueue.
|
||||
ExpectNoMsg(Settle);
|
||||
sink.EnqueueCount.ShouldBe(0);
|
||||
}
|
||||
```
|
||||
(Match the existing helpers — `CreateActor`, `TellRedundancyRole`, `Settle`, `ExpectNoMsg`. If `ExpectNoMsg` isn't the local idiom, mirror the existing `Secondary_node_does_not_historize` test's settle-then-assert-0 shape.)
|
||||
|
||||
**Step 7: Run the full affected suites**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~HistorianAdapterActor|FullyQualifiedName~ScriptedAlarmHostActor"
|
||||
```
|
||||
Expected: PASS. The existing `ScriptedAlarmHostActorTests` already define alarms with `HistorizeToAveva: true`/`false`, so confirm those still pass (the `alerts` publish must still fire for the `false` alarm).
|
||||
|
||||
**Step 8: Commit**
|
||||
|
||||
```bash
|
||||
git add src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs \
|
||||
src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Alerts/AlarmTransitionEvent.cs \
|
||||
src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs \
|
||||
src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs \
|
||||
tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs \
|
||||
tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Historian/HistorianAdapterActorTests.cs
|
||||
git commit -m "feat(historian): honor per-alarm HistorizeToAveva opt-out at the durable write"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Historian config knobs + startup validation (pending #2, #3, #4)
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** Task 4
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/AlarmHistorianOptions.cs`
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` (`AddAlarmHistorian` ~67-89)
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json` (the existing disabled `AlarmHistorian` example section)
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Historian/AlarmHistorianRegistrationTests.cs`
|
||||
|
||||
**Step 1: Write the failing tests**
|
||||
|
||||
In `AlarmHistorianRegistrationTests.cs`, add:
|
||||
- A binding test: a config with `DrainIntervalSeconds`, `Capacity`, `DeadLetterRetentionDays` set binds them onto `AlarmHistorianOptions`.
|
||||
- `Validate()` tests (pure, no DI): `{ Enabled = true, SharedSecret = "" }` → returns a warning mentioning `SharedSecret`; `{ Enabled = true, DatabasePath = "alarm-historian.db" }` (relative) → returns a warning mentioning the relative path; `{ Enabled = true, SharedSecret = "x", DatabasePath = "/abs/h.db" }` → returns empty; `{ Enabled = false, SharedSecret = "" }` → returns empty (disabled is never a misconfig).
|
||||
|
||||
```csharp
|
||||
[Fact]
|
||||
public void Validate_warns_on_empty_shared_secret_when_enabled()
|
||||
{
|
||||
var opts = new AlarmHistorianOptions { Enabled = true, SharedSecret = "", DatabasePath = "/var/h.db" };
|
||||
opts.Validate().ShouldContain(w => w.Contains("SharedSecret"));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_warns_on_relative_database_path_when_enabled()
|
||||
{
|
||||
var opts = new AlarmHistorianOptions { Enabled = true, SharedSecret = "s", DatabasePath = "alarm-historian.db" };
|
||||
opts.Validate().ShouldContain(w => w.Contains("DatabasePath"));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_is_silent_when_disabled()
|
||||
{
|
||||
new AlarmHistorianOptions { Enabled = false, SharedSecret = "" }.Validate().ShouldBeEmpty();
|
||||
}
|
||||
```
|
||||
|
||||
Run (expect FAIL — no `Validate`, no new options):
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~AlarmHistorianRegistration"
|
||||
```
|
||||
|
||||
**Step 2: Add the knobs + `Validate()` to `AlarmHistorianOptions`**
|
||||
|
||||
```csharp
|
||||
/// <summary>Seconds between drain-worker ticks. Defaults to 5.</summary>
|
||||
public int DrainIntervalSeconds { get; init; } = 5;
|
||||
|
||||
/// <summary>Maximum queued rows before the sink evicts the oldest. Defaults to 1,000,000
|
||||
/// (<c>SqliteStoreAndForwardSink.DefaultCapacity</c>).</summary>
|
||||
public long Capacity { get; init; } = 1_000_000;
|
||||
|
||||
/// <summary>Days to retain dead-lettered rows before purge. Defaults to 30.</summary>
|
||||
public int DeadLetterRetentionDays { get; init; } = 30;
|
||||
|
||||
/// <summary>Returns operator-facing misconfiguration warnings for an <c>Enabled</c> historian
|
||||
/// (empty when disabled or correctly configured). Pure — the registration logs each entry.</summary>
|
||||
/// <returns>Zero or more human-readable warning messages.</returns>
|
||||
public IReadOnlyList<string> Validate()
|
||||
{
|
||||
var warnings = new List<string>();
|
||||
if (!Enabled) return warnings;
|
||||
if (string.IsNullOrWhiteSpace(SharedSecret))
|
||||
warnings.Add("AlarmHistorian:SharedSecret is empty while the historian is enabled — the Wonderware sidecar Hello frame will carry an empty secret.");
|
||||
if (!Path.IsPathRooted(DatabasePath))
|
||||
warnings.Add($"AlarmHistorian:DatabasePath '{DatabasePath}' is relative — it resolves against the process working directory (e.g. System32 for a Windows service). Set an absolute path.");
|
||||
return warnings;
|
||||
}
|
||||
```
|
||||
(Add `using System.Collections.Generic;` and `using System.IO;` if the file lacks them.)
|
||||
|
||||
**Step 3: Thread the knobs + log warnings in `AddAlarmHistorian`**
|
||||
|
||||
Inside the existing `if (opts is not { Enabled: true }) return services;` guard's downstream block:
|
||||
```csharp
|
||||
foreach (var warning in opts.Validate())
|
||||
Serilog.Log.Logger.ForContext<SqliteStoreAndForwardSink>().Warning("{HistorianConfigWarning}", warning);
|
||||
|
||||
services.AddSingleton<IAlarmHistorianSink>(sp =>
|
||||
{
|
||||
var sink = new SqliteStoreAndForwardSink(
|
||||
opts.DatabasePath,
|
||||
writerFactory(opts, sp),
|
||||
Serilog.Log.Logger.ForContext<SqliteStoreAndForwardSink>(),
|
||||
batchSize: opts.BatchSize,
|
||||
capacity: opts.Capacity,
|
||||
deadLetterRetention: TimeSpan.FromDays(opts.DeadLetterRetentionDays));
|
||||
sink.StartDrainLoop(TimeSpan.FromSeconds(opts.DrainIntervalSeconds));
|
||||
return sink;
|
||||
});
|
||||
return services;
|
||||
```
|
||||
|
||||
**Step 4: Update the appsettings example**
|
||||
|
||||
In `Host/appsettings.json`, extend the disabled `AlarmHistorian` example section with the three new keys (`DrainIntervalSeconds`, `Capacity`, `DeadLetterRetentionDays`) so operators see them. Keep `Enabled: false`. Make a targeted edit to that section only.
|
||||
|
||||
**Step 5: Run + commit**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~AlarmHistorianRegistration"
|
||||
git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/AlarmHistorianOptions.cs \
|
||||
src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs \
|
||||
src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json \
|
||||
tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Historian/AlarmHistorianRegistrationTests.cs
|
||||
git commit -m "feat(historian): drain/capacity/retention config knobs + startup config-warning validation"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Record the real operator for shelve/enable/disable transitions (pending #5)
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~3 min
|
||||
**Parallelizable with:** Task 4
|
||||
|
||||
**The design:** every shelve/unshelve/enable/disable op in `Part9StateMachine` appends an audit `AlarmComment` carrying the acting `user` (`ApplyOneShotShelve` → `"ShelveOneShot"`, `ApplyTimedShelve` → `"ShelveTimed"`, `ApplyUnshelve` → `"Unshelve"`, `ApplyEnable` → `"Enable"`, `ApplyDisable` → `"Disable"`), and auto-unshelve appends with `user="system"`. So the operator is already the **last** `Comments` entry on the emitted condition — exactly the `CommentAdded` case. No Core/state change needed; widen the `TransitionUser` switch. (`Acknowledged`/`Confirmed` keep using `LastAckUser`/`LastConfirmUser` — leave them.)
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` (`TransitionUser` ~508-514)
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`
|
||||
|
||||
**Step 1: Write the failing test**
|
||||
|
||||
Mirror the existing Acknowledge/AddComment command tests (they drive a command via the `alarm-commands` topic and `FishForMessage<AlarmTransitionEvent>` on the `alerts` probe). Add a test that an operator shelve command (`Operation: "OneShotShelve"`, `User: "carol"`) on an active alarm yields a transition with `evt.User.ShouldBe("carol")` (it currently returns `"system"`). Use the exact command-operation string the existing inbound tests use (check the host actor's command dispatch for the canonical `OneShotShelve`/`Shelve` spelling).
|
||||
|
||||
Run (expect FAIL — currently `"system"`):
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ScriptedAlarmHostActor"
|
||||
```
|
||||
|
||||
**Step 2: Widen `TransitionUser`**
|
||||
|
||||
```csharp
|
||||
private static string TransitionUser(ScriptedAlarmEvent e) => e.Emission switch
|
||||
{
|
||||
EmissionKind.Acknowledged => e.Condition.LastAckUser ?? "system",
|
||||
EmissionKind.Confirmed => e.Condition.LastConfirmUser ?? "system",
|
||||
// Shelve / unshelve / enable / disable / comment ops each append the acting user as the last
|
||||
// audit entry on the emitted condition (auto-unshelve appends "system"); read it from there.
|
||||
EmissionKind.CommentAdded
|
||||
or EmissionKind.Shelved
|
||||
or EmissionKind.Unshelved
|
||||
or EmissionKind.Enabled
|
||||
or EmissionKind.Disabled => e.Condition.Comments.Count > 0 ? e.Condition.Comments[^1].User : "system",
|
||||
_ => "system",
|
||||
};
|
||||
```
|
||||
|
||||
**Step 3: Run + commit**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ScriptedAlarmHostActor"
|
||||
git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs \
|
||||
tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs
|
||||
git commit -m "fix(alarms): historize the real operator for shelve/unshelve/enable/disable transitions"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: `SqliteStoreAndForwardSink` thread-safety nits (pending #6)
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~3 min
|
||||
**Parallelizable with:** Task 1, Task 2, Task 3 (different assembly — `Core.AlarmHistorian`)
|
||||
|
||||
**The fixes (visibility-only, no functional behavior change):**
|
||||
1. `_backoffIndex` (`~74`) is written by `BumpBackoff`/`ResetBackoff` (drain thread) and read by `CurrentBackoff` (status query thread) without a memory barrier → mark `volatile`.
|
||||
2. The capacity-eviction log (`~659-661`) reads `_evictedCount` outside `_statusLock` (it's incremented inside the lock on `658`) → capture the total inside the lock and log the captured value.
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs`
|
||||
|
||||
**Step 1: Make `_backoffIndex` volatile**
|
||||
|
||||
```csharp
|
||||
private volatile int _backoffIndex;
|
||||
```
|
||||
|
||||
**Step 2: Capture `_evictedCount` under the lock for the log message**
|
||||
|
||||
Replace lines ~658-661:
|
||||
```csharp
|
||||
Interlocked.Add(ref _queuedRowCount, -toEvict);
|
||||
long lifetimeEvicted;
|
||||
lock (_statusLock) { _evictedCount += toEvict; lifetimeEvicted = _evictedCount; }
|
||||
_logger.Warning(
|
||||
"Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
|
||||
_capacity, toEvict, lifetimeEvicted);
|
||||
```
|
||||
|
||||
**Step 3: Build + run the sink suite (guards no regression)**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests
|
||||
```
|
||||
Expected: PASS (no behavior change; these are visibility/lock-scope fixes). No new test — thread-safety here isn't deterministically unit-testable; correctness is by inspection + the existing suite.
|
||||
|
||||
**Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs
|
||||
git commit -m "fix(historian): volatile _backoffIndex + read _evictedCount under lock (thread-safety)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: Full-suite gate + docs + close out
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** none (depends on Tasks 1-4)
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/ScriptedAlarms.md` and/or `docs/AlarmTracking.md` (historian config + `HistorizeToAveva` opt-out note)
|
||||
- Modify: `pending.md` (strike the now-resolved items 1-6; keep item 7)
|
||||
|
||||
**Step 1: Build the whole solution**
|
||||
|
||||
```bash
|
||||
dotnet build ZB.MOM.WW.OtOpcUa.slnx
|
||||
```
|
||||
Expected: 0 errors.
|
||||
|
||||
**Step 2: Run the full suite, capturing the real exit code**
|
||||
|
||||
```bash
|
||||
dotnet test ZB.MOM.WW.OtOpcUa.slnx 2>&1 | tee /tmp/followups-test.log; echo "EXIT=${PIPESTATUS[0]}"
|
||||
```
|
||||
Expected: only the known pre-existing env/integration failures (AbCip/AbLegacy fixtures, OpcUaServer DualEndpoint/PKI, Host EquipmentNamespace deploy-Rejected) — NO new failures in `Core.ScriptedAlarms.Tests`, `Core.AlarmHistorian.Tests`, or `Runtime.Tests`. If anything in those three regresses, stop and fix before finishing. (Do NOT pipe to `tail` — it masks the exit code.)
|
||||
|
||||
**Step 3: Document the new behavior**
|
||||
|
||||
In the historian section of `docs/ScriptedAlarms.md` (and/or `docs/AlarmTracking.md`): note that (a) `HistorizeToAveva=false` now suppresses the durable historian write while the live `/alerts` UI still shows the transition; (b) the `AlarmHistorian` section exposes `DrainIntervalSeconds`, `Capacity`, `DeadLetterRetentionDays`; (c) an enabled historian logs a startup warning on empty `SharedSecret` or a relative `DatabasePath`.
|
||||
|
||||
**Step 4: Strike resolved items from `pending.md`**
|
||||
|
||||
Remove items 1-6 from `pending.md` (or mark them resolved in this branch) and leave item 7 (docker-dev rig cleanup, operational deferral). Note: `pending.md` is untracked — update it but it need not be committed unless the user asks.
|
||||
|
||||
**Step 5: Commit the docs**
|
||||
|
||||
```bash
|
||||
git add docs/ScriptedAlarms.md docs/AlarmTracking.md
|
||||
git commit -m "docs(historian): HistorizeToAveva opt-out + config knobs + startup validation"
|
||||
```
|
||||
|
||||
**Step 6: Finish the branch**
|
||||
|
||||
Use **superpowers-extended-cc:finishing-a-development-branch** (verify tests → present the 4 options → execute the user's choice).
|
||||
|
||||
---
|
||||
|
||||
## Dependency spine
|
||||
|
||||
```
|
||||
T0 (branch)
|
||||
├─ T1 (HistorizeToAveva opt-out) [high-risk] ─┐
|
||||
│ └─ T2 (config knobs + validation) [standard] │ (Runtime serial chain;
|
||||
│ └─ T3 (TransitionUser) [small] │ T1/T3 share a file)
|
||||
└─ T4 (sink thread-safety) [small] ─┘ (Core.AlarmHistorian — parallel)
|
||||
└──────────────┬───────────────┘
|
||||
T5 (gate + docs + finish) [blocked by T1,T2,T3,T4]
|
||||
```
|
||||
|
||||
T4 is the only task safe to run concurrently with the Runtime chain. T1 → T2 → T3 serialize (same assembly; T1 and T3 edit `ScriptedAlarmHostActor.cs`).
|
||||
@@ -0,0 +1,18 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-06-11-alarm-historian-followups.md",
|
||||
"designPath": "docs/plans/2026-06-11-alarm-followups-round2-design.md",
|
||||
"branch": "feat/alarm-historian-followups",
|
||||
"baseBranch": "master",
|
||||
"baseSha": "f64f7ce6",
|
||||
"status": "pending",
|
||||
"note": "pending.md code follow-ups items 1-6 (item 7 docker-dev rig cleanup is an operational deferral, out of scope). T1 (HistorizeToAveva opt-out): carry the flag engine→AlarmTransitionEvent→HistorianAdapterActor and gate the DURABLE WRITE ONLY (live /alerts unaffected); non-nullable default-true is rolling-restart-safe by the writer==publisher invariant. T2 (DrainInterval/Capacity/DeadLetterRetention knobs + Validate() startup warnings for empty SharedSecret / relative DatabasePath). T3 (TransitionUser reads Comments[^1].User for shelve/unshelve/enable/disable — ops already append the operator; no Core change). T4 (volatile _backoffIndex + _evictedCount under lock — Core.AlarmHistorian, parallel-safe). Same-assembly contention: T1→T2→T3 serialize (Runtime; T1/T3 share ScriptedAlarmHostActor.cs); T4 runs concurrently. NO Configuration/EF change. Build on feat branch off master.",
|
||||
"tasks": [
|
||||
{"id": 255, "planTask": 0, "subject": "T0: Branch off master", "classification": "trivial", "status": "pending", "blockedBy": []},
|
||||
{"id": 256, "planTask": 1, "subject": "T1: Honor HistorizeToAveva opt-out at the durable write (pending #1)", "classification": "high-risk", "status": "pending", "blockedBy": [255], "parallelizableWith": [259]},
|
||||
{"id": 257, "planTask": 2, "subject": "T2: Historian config knobs + startup validation (pending #2,#3,#4)", "classification": "standard", "status": "pending", "blockedBy": [256], "parallelizableWith": [259]},
|
||||
{"id": 258, "planTask": 3, "subject": "T3: Real operator for shelve/enable/disable transitions (pending #5)", "classification": "small", "status": "pending", "blockedBy": [257], "parallelizableWith": [259]},
|
||||
{"id": 259, "planTask": 4, "subject": "T4: SqliteStoreAndForwardSink thread-safety nits (pending #6)", "classification": "small", "status": "pending", "blockedBy": [255], "parallelizableWith": [256, 257, 258]},
|
||||
{"id": 260, "planTask": 5, "subject": "T5: Full-suite gate + docs + close out", "classification": "small", "status": "pending", "blockedBy": [256, 257, 258, 259]}
|
||||
],
|
||||
"lastUpdated": "2026-06-11"
|
||||
}
|
||||
Reference in New Issue
Block a user