# Native Alarms — Execution Resume Notes
**Skill in progress:** `superpowers-extended-cc:executing-plans` on `docs/plans/2026-05-29-native-alarms.md`.
**To resume:** `/superpowers-extended-cc:executing-plans docs/plans/2026-05-29-native-alarms.md` (reads `…md.tasks.json`).
## Workspace
- **Worktree:** `/Users/dohertj2/Desktop/scadalink-design-native-alarms` (branch `feat/native-alarms`, off `main` @ `09e19db` which holds the design + plan).
- Do all work here; `main` checkout stays untouched. Build: `dotnet build ZB.MOM.WW.ScadaBridge.slnx`.
- The shared MS SQL container `scadabridge-mssql` is up (the ConfigDB MsSql migration-fixture tests use it).
## Progress: Tasks 1–18 done & committed; 19–28 pending
Commits (oldest→newest): `696da92` T1, `edc2dac` T2, `ea14ace` T3, `9134419` T4, `63f1ec2` T5, `aedd17c` T6, `fc05ba1` T7, `e5392d2` T8, `ba27873` T9, `d3b3d15` T10, `1fbb814`+`0d30b7d` T11, `c741170` T12, `b44a844` T13, `24fd7be` T14, `fda7ac9` T15, `6d31858` T16, `bca21ff` T17, `20b41b8` T18.

**Cadence is batches of 3 (user choice on resume).** Batch 4 = T13–15 ✅, Batch 5 = T16–18 ✅. Full `SiteRuntime.Tests` green (313/313); `Communication.Tests` green (200/200). Next: **Batch 6 = T19–21** (gRPC mapping → mgmt command contracts → ManagementActor handlers). Then T22 (CLI), T23–25 (UI), T26 (seed), T27 (docs), T28 (live integration).
## Decisions / deviations — Batch 5 (T16–18)
- **T16:** Connection protocol IS in `FlattenedConfiguration.Connections[name].Protocol` → `ResolveNativeKind` maps protocol-contains-"Mx" → `NativeMxAccess` else `NativeOpcUa`; passed into NativeAlarmActor. Added `_latestAlarmEvents` (enriched event per AlarmName) + extracted `BuildAlarmStatesSnapshot()` used by both `HandleSubscribeDebugView` and `HandleDebugSnapshot` (enriched events ∪ Normal-projection fallback for computed alarms that haven't fired). Native actors skipped when `_dclManager == null` (isolated tests). **Beyond the plan's Files list (justified):** redeploy/undeploy clear — added `native_alarm_state` DELETE to `SiteStorageService.RemoveDeployedConfigAsync` transaction (undeploy) + `ClearNativeAlarmsForInstanceAsync` next to `ClearStaticOverridesAsync` in `DeploymentManagerActor` redeploy path. Native state survives failover (rehydrate) but resets on redeploy — mirrors static-override semantics.
- **T17:** Test-only (as the plan predicted) — `AlarmStateChanged.Condition` getter already defaults to `ForComputed(State, Priority)` from T2, so computed alarms carry the unified condition without code change. Added regression `AlarmActor_ComputedAlarm_CarriesUnifiedConditionState`.
- **T18:** Proto regen done via the documented macOS manual flow (uncomment `<Protobuf>` → delete vendored → build → copy `obj/Debug/net10.0/Protos/*.cs` → re-comment). csproj nets to no change. Only `Sitestream.cs` changed (service `SitestreamGrpc.cs` untouched — message-only change). `confirmed` is proto `bool` per plan (null→false fidelity loss accepted). New fields 8–21 on `AlarmStateUpdate`.
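
The T16 protocol mapping and the snapshot union can be sketched roughly as follows. This is a minimal illustration, not the committed code: `AlarmKind` is reduced to the two members named above, `AlarmSnapshotEntry` and both method signatures are hypothetical, and ordinal (case-sensitive) matching on "Mx" is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in enum: only the two members named in the notes.
public enum AlarmKind { NativeOpcUa, NativeMxAccess }

// Hypothetical snapshot row; the real snapshot type is richer.
public sealed record AlarmSnapshotEntry(string AlarmName, string State, bool Enriched);

public static class T16Sketch
{
    // A protocol containing "Mx" selects NativeMxAccess; anything else
    // (including a missing protocol) falls back to NativeOpcUa.
    // Case-sensitive matching is an assumption.
    public static AlarmKind ResolveNativeKind(string? protocol) =>
        protocol is not null && protocol.Contains("Mx", StringComparison.Ordinal)
            ? AlarmKind.NativeMxAccess
            : AlarmKind.NativeOpcUa;

    // Enriched events first, then a Normal projection for every computed
    // alarm that has not fired yet (no enriched event recorded).
    public static IReadOnlyList<AlarmSnapshotEntry> BuildAlarmStatesSnapshot(
        IReadOnlyDictionary<string, AlarmSnapshotEntry> latestAlarmEvents,
        IEnumerable<string> computedAlarmNames)
    {
        var fallback = computedAlarmNames
            .Where(name => !latestAlarmEvents.ContainsKey(name))
            .Select(name => new AlarmSnapshotEntry(name, "Normal", Enriched: false));

        return latestAlarmEvents.Values.Concat(fallback).ToList();
    }
}
```

Keeping the union in one helper is what lets `HandleSubscribeDebugView` and `HandleDebugSnapshot` return the same view.
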
## Decisions / deviations made during execution — Batch 4 (T13–15)
- **T15:** `NativeAlarmActor` ctor has an optional trailing `AlarmKind nativeKind = AlarmKind.NativeOpcUa` (additive; keeps the 7-arg call working). T16 passes `NativeMxAccess` when the connection protocol contains "Mx" (see Batch 5). Persistence is **fire-and-forget** (`ContinueWith` with OnlyOnFaulted logs the failure) and never blocks the actor. State is keyed by `SourceReference`; `AlarmName` on the emitted `AlarmStateChanged` is set to the `SourceReference`. Snapshot path: `Snapshot` buffers, `SnapshotComplete` atomic-swaps (conditions dropped from the snapshot emit `Active=false`). Live path ignores a transition with an older `TransitionTime`; retention drops a condition once `!Active && Acknowledged`. `NativeAlarmSourceUnavailable` = log + retain (no emit). Subscribe retry via `ScheduleTellOnceCancelable` at `NativeAlarmRetryIntervalMs`.
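
The T15 state rules (snapshot swap, stale-transition filter, retention) can be sketched as a plain class with no actor plumbing. Everything here is illustrative: `NativeCondition`, `NativeAlarmStateSketch`, and the `Emitted` list (standing in for messages sent by the actor) are hypothetical names. Two behaviours are assumptions the notes do not settle: a transition with an equal `TransitionTime` is accepted, and conditions swapped in by a snapshot are stored without being emitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical condition shape; property names follow the T15 note.
public sealed record NativeCondition(
    string SourceReference, bool Active, bool Acknowledged, DateTimeOffset TransitionTime);

public sealed class NativeAlarmStateSketch
{
    private Dictionary<string, NativeCondition> _state = new();
    private Dictionary<string, NativeCondition>? _snapshotBuffer;

    // Stand-in for the AlarmStateChanged messages the actor would emit.
    public List<NativeCondition> Emitted { get; } = new();

    // Snapshot: buffer only, nothing is emitted yet.
    public void OnSnapshot(NativeCondition condition)
    {
        _snapshotBuffer ??= new Dictionary<string, NativeCondition>();
        _snapshotBuffer[condition.SourceReference] = condition;
    }

    // SnapshotComplete: swap the buffer in as the new state. Conditions that
    // were tracked before but are missing from the snapshot emit Active=false.
    public void OnSnapshotComplete()
    {
        var next = _snapshotBuffer ?? new Dictionary<string, NativeCondition>();
        _snapshotBuffer = null;

        foreach (var (sourceRef, old) in _state)
        {
            if (!next.ContainsKey(sourceRef))
                Emitted.Add(old with { Active = false });
        }

        _state = next;
    }

    // Live path: ignore transitions older than the tracked one, then apply
    // retention (inactive and acknowledged conditions are dropped).
    public void OnLive(NativeCondition condition)
    {
        if (_state.TryGetValue(condition.SourceReference, out var current)
            && condition.TransitionTime < current.TransitionTime)
        {
            return;
        }

        Emitted.Add(condition);

        if (!condition.Active && condition.Acknowledged)
            _state.Remove(condition.SourceReference);
        else
            _state[condition.SourceReference] = condition;
    }
}
```
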
## Known-flaky baseline (NOT my regressions)
- 5 `StaleTagMonitor*Tests` in `ZB.MOM.WW.ScadaBridge.Commons.Tests` are timing-flaky under load. User approved treating as known-flaky; do not "fix". Watch only for NEW failures.
## Decisions / deviations made during execution (carry forward)
- **T2:** `AlarmStateChanged.Condition` is a computed-default property (getter falls back to `AlarmConditionStateFactory.ForComputed(State, Priority)`); additions are init-props (additive). `AlarmConditionStateFactory` lives in `Commons/Types/Alarms`.
- **T8:** `ResolvedNativeAlarmSource` has **no `IsLocked`** field (per plan). Inheritance lock is enforced via a **local `lockedNames` HashSet** inside `ResolveInheritedNativeAlarmSources`. Override-lock is NOT enforced at flatten (matches plan; UI/validation layer handles it).
- **T9:** `SemanticValidator.Validate` gained an **optional** 3rd param `IReadOnlySet<string>? alarmCapableConnectionNames = null`. Connection-existence check only runs when callers pass it; empty source-ref / empty connection-name always checked. `ValidationCategory.NativeAlarmSourceInvalid` added. (Wiring real callers to pass the connection set is not yet done — fine for now.)
- **T10:** `DataConnectionActor` routes alarm transitions by **source-ref prefix** (`transition.SourceObjectReference`/`SourceReference` StartsWith bound key), dedup per transition. One feed per source-ref, ref-counted. Internal records `AlarmTransitionReceived`, `AlarmSubscribeCompleted`. `NativeAlarmSourceUnavailable` pushed on entering Reconnecting; `ReSubscribeAllAlarms` on reconnect.
- **T11 (OPC UA):** `OpcUaAlarmMapper` is pure/tested. `RealOpcUaClient.CreateAlarmSubscriptionAsync` does event MonitoredItem + `EventFilter` (select clauses indexed 0–12) + `ConditionRefresh` via `CallAsync` (the sync `Call` is obsolete→error). **`AlarmConditionState` collides with `Opc.Ua.AlarmConditionState`** — fully-qualified as `Commons.Types.Alarms.AlarmConditionState` at the one `new` site. **Behavior unverified until Task 28 (live A&C server).**
- **T12 (MxGateway):** `MxGatewayAlarmMapper` is pure/tested. Gateway proto enums `AlarmConditionState`/`AlarmTransitionKind` collide with Commons enums → aliased (`ProtoConditionState`/`ProtoTransitionKind` for proto; explicit `using X = Commons…` for the Commons ones). `MxGatewayClient.StreamAlarmsAsync(StreamAlarmsRequest, ct) → IAsyncEnumerable<AlarmFeedMessage>` confirmed present in pkg v0.1.0. Adapter opens **one shared session-less feed** (gateway-wide, null prefix), ref-counted, first-callback drives it (the actor routes). `RealMxGatewayClient.RunAlarmStreamAsync` reconnects internally (5s) — does NOT use `RaiseDisconnected`. Reference: OtOpcUa `…Driver.Galaxy/Runtime/GatewayGalaxyAlarmFeed.cs`. **Behavior unverified until Task 28 (live gateway).**
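
The computed-default `Condition` property from T2 (the one T17 relies on) could look roughly like the sketch below. Only the names `AlarmStateChanged`, `Condition`, `AlarmConditionStateFactory.ForComputed`, `State`, and `Priority` come from the notes; the `AlarmState` members, the shape of `AlarmConditionState`, and the projection inside `ForComputed` are hypothetical stand-ins.

```csharp
using System;

// Hypothetical stand-ins for the Commons types.
public enum AlarmState { Normal, Active }

public sealed record AlarmConditionState(bool Active, int Severity);

public static class AlarmConditionStateFactory
{
    // Assumed projection: a computed alarm is active when its state is
    // Active, and its priority is carried over as the severity.
    public static AlarmConditionState ForComputed(AlarmState state, int priority) =>
        new(Active: state == AlarmState.Active, Severity: priority);
}

public sealed record AlarmStateChanged(string AlarmName, AlarmState State, int Priority)
{
    private readonly AlarmConditionState? _condition;

    // Init-only and additive: call sites that never set Condition keep
    // compiling, and reads fall back to the computed projection.
    public AlarmConditionState Condition
    {
        get => _condition ?? AlarmConditionStateFactory.ForComputed(State, Priority);
        init => _condition = value;
    }
}
```

With this shape a computed alarm needs no code change to carry a condition, while a native source can still supply its own via an object initializer or a `with` expression.
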
## Execution cadence
- Per-task TDD: write test → confirm RED → implement → GREEN → commit. Update native task status + this `.tasks.json` each task; report at each batch boundary and wait for "start"/feedback.
- Batches so far: B1 = T1–4, B2 = T5–8, B3 = T9–12, B4 = T13–15, B5 = T16–18. Next: B6 = T19–21.
- Native task IDs map plan Task N → native id (N+6) — but on resume the native list is rebuilt from `.tasks.json` (Step 0).
## Watch items for remaining tasks
- **T18 (proto; done in Batch 5):** `sitestream.proto` is **not auto-compiled**: the `<Protobuf>` include is commented out and the generated `.cs` is vendored in `SiteStreamGrpc/`. Manual macOS regen only (toggle include → `dotnet build` → copy generated files → re-comment). Do NOT auto-compile on Linux.
- **T28:** OPC UA A&C live smoke (SkippableFact) + confirm infra OPC UA server exposes A&C; manual deploy check via `bash docker/deploy.sh` / `docker-env2/deploy.sh`.