Native Alarms — Execution Resume Notes

Skill in progress: superpowers-extended-cc:executing-plans on docs/plans/2026-05-29-native-alarms.md. To resume: /superpowers-extended-cc:executing-plans docs/plans/2026-05-29-native-alarms.md (reads …md.tasks.json).

Workspace

  • Worktree: /Users/dohertj2/Desktop/scadalink-design-native-alarms (branch feat/native-alarms, off main @ 09e19db which holds the design + plan).
  • Do all work here; main checkout stays untouched. Build: dotnet build ZB.MOM.WW.ScadaBridge.slnx.
  • The shared MS SQL container scadabridge-mssql is up (the ConfigDB MsSql migration-fixture tests use it).

Progress: Tasks 1-15 done & committed; 16-28 pending

Commits (oldest→newest): 696da92 T1, edc2dac T2, ea14ace T3, 9134419 T4, 63f1ec2 T5, aedd17c T6, fc05ba1 T7, e5392d2 T8, ba27873 T9, d3b3d15 T10, 1fbb814+0d30b7d T11, c741170 T12, b44a844 T13, 24fd7be T14, fda7ac9 T15.

Cadence is now batches of 3 (user choice on resume). Batch 4 = T13-15 (site runtime store/options/actor): done. Full SiteRuntime.Tests green (311/311). Next: Batch 5 = T16-18 (InstanceActor wiring → AlarmActor enrich → proto regen). Then 19 (gRPC), 20-22 (mgmt/CLI), 23-25 (UI), 26 (seed), 27 (docs), 28 (live integration).

Decisions / deviations made during execution: Batch 4 (T13-15)

  • T15: NativeAlarmActor ctor has an optional trailing AlarmKind nativeKind = AlarmKind.NativeOpcUa (additive — keeps the 7-arg call working). T16 will pass NativeMxAccess when the connection protocol is MxGateway. Persistence is fire-and-forget (ContinueWith OnlyOnFaulted logs) — never blocks the actor. State keyed by SourceReference; AlarmName on the emitted AlarmStateChanged is set to the SourceReference. Snapshot path: Snapshot buffers, SnapshotComplete atomic-swaps (dropped → emit Active=false). Live path ignores older TransitionTime; retention drops a condition once !Active && Acknowledged. NativeAlarmSourceUnavailable = log + retain (no emit). Subscribe retry via ScheduleTellOnceCancelable at NativeAlarmRetryIntervalMs.
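
The T15 state rules can be sketched in isolation from Akka. This is a minimal sketch, not the actual NativeAlarmActor: the Transition record, the NativeAlarmConditionTable class, and all method names are hypothetical. Only the rules come from the note above (state keyed by SourceReference, snapshot buffered then swapped, older TransitionTime ignored, condition dropped once !Active && Acknowledged). Two details are assumptions: a cleared condition is stamped with a caller-supplied time, and a transition with an equal TransitionTime is accepted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one native alarm transition (not the real message type).
public sealed record Transition(
    string SourceReference, bool Active, bool Acknowledged, DateTimeOffset TransitionTime);

public sealed class NativeAlarmConditionTable
{
    // Live state, keyed by SourceReference.
    private Dictionary<string, Transition> _live = new();
    // Snapshot buffer; null when no snapshot is in flight.
    private Dictionary<string, Transition>? _snapshot;

    public int Count => _live.Count;

    // Snapshot path, step 1: buffer without touching live state.
    public void BufferSnapshot(Transition t)
    {
        _snapshot ??= new Dictionary<string, Transition>();
        _snapshot[t.SourceReference] = t;
    }

    // Snapshot path, step 2: swap the buffer in as the new live state.
    // Returns one Active=false transition per condition that was live
    // but is missing from the snapshot, for the caller to emit.
    public IReadOnlyList<Transition> CompleteSnapshot(DateTimeOffset now)
    {
        var next = _snapshot ?? new Dictionary<string, Transition>();
        _snapshot = null;

        var cleared = new List<Transition>();
        foreach (var (key, old) in _live)
        {
            if (!next.ContainsKey(key))
                cleared.Add(old with { Active = false, TransitionTime = now });
        }

        // Apply the retention rule to the snapshot before it becomes live.
        var kept = new Dictionary<string, Transition>();
        foreach (var (key, t) in next)
        {
            if (!IsRetired(t)) kept[key] = t;
        }
        _live = kept;
        return cleared;
    }

    // Live path: returns false when the transition is older than the
    // stored one and was therefore ignored.
    public bool ApplyLive(Transition t)
    {
        if (_live.TryGetValue(t.SourceReference, out var current)
            && t.TransitionTime < current.TransitionTime)
            return false;

        if (IsRetired(t)) _live.Remove(t.SourceReference);
        else _live[t.SourceReference] = t;
        return true;
    }

    // Retention rule: a condition is dropped once it is inactive and acknowledged.
    private static bool IsRetired(Transition t) => !t.Active && t.Acknowledged;
}
```

In the real actor, persistence and the AlarmStateChanged emit would hang off the return values of ApplyLive and CompleteSnapshot; a pure table like this one can be unit-tested without an actor system.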

Known-flaky baseline (NOT my regressions)

  • 5 StaleTagMonitor*Tests in ZB.MOM.WW.ScadaBridge.Commons.Tests are timing-flaky under load. User approved treating as known-flaky; do not "fix". Watch only for NEW failures.

Decisions / deviations made during execution (carry forward)

  • T2: AlarmStateChanged.Condition is a computed-default property (getter falls back to AlarmConditionStateFactory.ForComputed(State, Priority)); additions are init-props (additive). AlarmConditionStateFactory lives in Commons/Types/Alarms.
  • T8: ResolvedNativeAlarmSource has no IsLocked field (per plan). Inheritance lock is enforced via a local lockedNames HashSet inside ResolveInheritedNativeAlarmSources. Override-lock is NOT enforced at flatten (matches plan; UI/validation layer handles it).
  • T9: SemanticValidator.Validate gained an optional 3rd param IReadOnlySet<string>? alarmCapableConnectionNames = null. Connection-existence check only runs when callers pass it; empty source-ref / empty connection-name always checked. ValidationCategory.NativeAlarmSourceInvalid added. (Wiring real callers to pass the connection set is not yet done — fine for now.)
  • T10: DataConnectionActor routes alarm transitions by source-ref prefix (transition.SourceObjectReference/SourceReference StartsWith bound key), dedup per transition. One feed per source-ref, ref-counted. Internal records AlarmTransitionReceived, AlarmSubscribeCompleted. NativeAlarmSourceUnavailable pushed on entering Reconnecting; ReSubscribeAllAlarms on reconnect.
  • T11 (OPC UA): OpcUaAlarmMapper is pure/tested. RealOpcUaClient.CreateAlarmSubscriptionAsync does event MonitoredItem + EventFilter (select clauses indexed 0-12) + ConditionRefresh via CallAsync (the sync Call is obsolete→error). AlarmConditionState collides with Opc.Ua.AlarmConditionState, so it is fully qualified as Commons.Types.Alarms.AlarmConditionState at the one new site. Behavior unverified until Task 28 (live A&C server).
  • T12 (MxGateway): MxGatewayAlarmMapper is pure/tested. Gateway proto enums AlarmConditionState/AlarmTransitionKind collide with Commons enums → aliased (ProtoConditionState/ProtoTransitionKind for proto; explicit using X = Commons… for the Commons ones). MxGatewayClient.StreamAlarmsAsync(StreamAlarmsRequest, ct) → IAsyncEnumerable<AlarmFeedMessage> confirmed present in pkg v0.1.0. Adapter opens one shared session-less feed (gateway-wide, null prefix), ref-counted, first-callback drives it (the actor routes). RealMxGatewayClient.RunAlarmStreamAsync reconnects internally (5s) — does NOT use RaiseDisconnected. Reference: OtOpcUa …Driver.Galaxy/Runtime/GatewayGalaxyAlarmFeed.cs. Behavior unverified until Task 28 (live gateway).
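
The T2 decision in the list above (Condition as a computed-default init-prop) can be illustrated with a minimal sketch. Every type shape here is a hypothetical stand-in: the real AlarmStateChanged, AlarmConditionState, and AlarmConditionStateFactory in Commons/Types/Alarms may have different members, and the mapping inside ForComputed is invented purely for illustration. What the sketch shows is the additive pattern itself: existing call sites that never set Condition keep compiling and get a value derived from State and Priority.

```csharp
using System;

// Hypothetical stand-ins; the real Commons types may differ.
public enum AlarmState { Normal, Active }

public sealed record AlarmConditionState(bool Active, bool Acknowledged, int Severity);

public static class AlarmConditionStateFactory
{
    // Assumed mapping for illustration only: a computed alarm has no
    // operator acknowledgement, so it is reported as acknowledged.
    public static AlarmConditionState ForComputed(AlarmState state, int priority) =>
        new(Active: state == AlarmState.Active, Acknowledged: true, Severity: priority);
}

public sealed record AlarmStateChanged(string AlarmName, AlarmState State, int Priority)
{
    // Null until a caller sets Condition explicitly.
    private readonly AlarmConditionState? _condition;

    // Additive init-prop: callers that never set it still get a value,
    // computed on demand from State and Priority.
    public AlarmConditionState Condition
    {
        get => _condition ?? AlarmConditionStateFactory.ForComputed(State, Priority);
        init => _condition = value;
    }
}
```

A pre-existing call such as new AlarmStateChanged("HighTemp", AlarmState.Active, 500) needs no change, while a native alarm path can supply its own condition through an object initializer or a with expression.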

Execution cadence

  • Per-task TDD: write test → confirm RED → implement → GREEN → commit. Update native task status + this .tasks.json each task; report at each batch boundary and wait for "start"/feedback.
  • Batches under the original cadence: B1 = T1-4, B2 = T5-8, B3 = T9-12. B4 was originally proposed as T13-17 but ran as T13-15 after the switch to batches of 3.
  • Native task IDs map plan Task N → native id (N+6) — but on resume the native list is rebuilt from .tasks.json (Step 0).

Watch items for remaining tasks

  • T18 (proto): sitestream.proto is not auto-compiled: the <Protobuf> include is commented out and the generated .cs is vendored in SiteStreamGrpc/. Manual macOS regen only (toggle include → dotnet build → copy generated files → re-comment). Do NOT auto-compile on Linux.
  • T28: OPC UA A&C live smoke (SkippableFact) + confirm infra OPC UA server exposes A&C; manual deploy check via bash docker/deploy.sh / docker-env2/deploy.sh.