Native Alarms — Execution Resume Notes

Skill in progress: superpowers-extended-cc:executing-plans on docs/plans/2026-05-29-native-alarms.md. To resume: /superpowers-extended-cc:executing-plans docs/plans/2026-05-29-native-alarms.md (reads …md.tasks.json).

Workspace

  • Worktree: /Users/dohertj2/Desktop/scadalink-design-native-alarms (branch feat/native-alarms, off main @ 09e19db which holds the design + plan).
  • Do all work here; main checkout stays untouched. Build: dotnet build ZB.MOM.WW.ScadaBridge.slnx.
  • The shared MS SQL container scadabridge-mssql is up (the ConfigDB MsSql migration-fixture tests use it).

Progress: Tasks 1–21 done & committed; 22–28 pending

Commits (oldest→newest): 696da92 T1 … 20b41b8 T18, 0c6f9a9 T19, b1df6d5 T20, 3bf1d26 T21. (Full list in git log.)

Cadence is batches of 3 (user choice on resume). Batch 4 = T13–15, Batch 5 = T16–18, Batch 6 = T19–21. Greens: SiteRuntime.Tests 313/313, Communication.Tests 200/200, ManagementService.Tests 111/111, Commons.Tests registry 8/8. Next: Batch 7 = T22 (CLI) → T23 (DebugView UI) → T24 (Template editor UI). Then T25 (instance UI), T26 (seed), T27 (docs), T28 (live integration).

Decisions / deviations: Batch 6 (T19–21)

  • T19: AlarmShelveStateCodec (string↔enum, default Unshelved). Server side: StreamRelayActor maps msg.Condition.* + Kind.ToString() + nullable OriginalRaiseTime onto the wire. Client side: ConvertToDomainEvent rebuilds Condition (severity = wire Priority) and parses the kind back via ParseAlarmKind. Gotcha: the client imports Google.Protobuf.WellKnownTypes, so the bare name Enum is ambiguous; used System.Enum.TryParse instead. The confirmed field goes proto bool → domain bool? (false, never null, after a round-trip).
  • T20: ManagementCommandRegistry is reflection-based (auto-discovers *Command records in the Management namespace) — no manual registry edit needed; just added the 7 records to TemplateCommands.cs / InstanceCommands.cs.
  • T21: Handlers call ITemplateEngineRepository directly + SaveChangesAsync (per plan's "call the Task 6 repo methods"), NOT through TemplateService/InstanceService. Trade-off: this skips the service-layer IAuditService logging that the existing template/instance alarm CRUD gets. Acceptable for the read-only-mirror authoring commands and matches the plan's Files scope (ManagementService only), but flag if audit parity is wanted later. Roles: template mutations = Design, instance-override mutations = Deployment, lists = any authenticated. Update = fetch-then-mutate (preserves TemplateId for the unique index). Set-override = upsert (Add if absent else Update).
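
The T19 codec fallback and the Enum ambiguity workaround can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: only Unshelved is named in these notes, so the other enum members, the class name AlarmShelveStateCodecSketch, and the ToWire/FromWire method names are hypothetical.

```csharp
using System;

// Hypothetical members: only Unshelved appears in the notes.
public enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved }

public static class AlarmShelveStateCodecSketch
{
    // enum -> wire string
    public static string ToWire(AlarmShelveState state) => state.ToString();

    // wire string -> enum; anything missing or unrecognized falls back to Unshelved.
    // System.Enum is fully qualified on purpose: with
    // `using Google.Protobuf.WellKnownTypes;` in scope the bare name Enum is ambiguous.
    public static AlarmShelveState FromWire(string? wire) =>
        System.Enum.TryParse<AlarmShelveState>(wire, true, out var parsed)
            && System.Enum.IsDefined(parsed) // rejects numeric strings such as "99"
                ? parsed
                : AlarmShelveState.Unshelved;
}
```

With this shape, FromWire(null), FromWire("") and FromWire("bogus") all yield Unshelved, so an older peer that omits the field still round-trips to the default.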

Decisions / deviations: Batch 5 (T16–18)

  • T16: Connection protocol IS in FlattenedConfiguration.Connections[name].Protocol. ResolveNativeKind maps protocol-contains-"Mx" → NativeMxAccess, else NativeOpcUa; the result is passed into NativeAlarmActor. Added _latestAlarmEvents (enriched event per AlarmName) + extracted BuildAlarmStatesSnapshot(), used by both HandleSubscribeDebugView and HandleDebugSnapshot (enriched events, with a Normal-projection fallback for computed alarms that haven't fired). Native actors are skipped when _dclManager == null (isolated tests). Beyond the plan's Files list (justified), redeploy/undeploy clear: added a native_alarm_state DELETE to the SiteStorageService.RemoveDeployedConfigAsync transaction (undeploy) + ClearNativeAlarmsForInstanceAsync next to ClearStaticOverridesAsync in the DeploymentManagerActor redeploy path. Native state survives failover (rehydrate) but resets on redeploy, mirroring static-override semantics.
  • T17: Test-only (as the plan predicted) — AlarmStateChanged.Condition getter already defaults to ForComputed(State, Priority) from T2, so computed alarms carry the unified condition without code change. Added regression AlarmActor_ComputedAlarm_CarriesUnifiedConditionState.
  • T18: Proto regen done via the documented macOS manual flow (uncomment <Protobuf> → delete vendored → build → copy obj/Debug/net10.0/Protos/*.cs → re-comment). csproj nets to no change. Only Sitestream.cs changed (the service file SitestreamGrpc.cs is untouched because this is a message-only change). confirmed is a proto bool per plan (null→false fidelity loss accepted). New fields 8–21 on AlarmStateUpdate.
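
The T16 protocol-to-kind mapping is small enough to sketch. This is a minimal sketch under assumptions: the case-sensitive ordinal comparison and the null → NativeOpcUa fallback are guesses, the enum lists only the two native kinds named in these notes, and the class name NativeKindResolverSketch is hypothetical.

```csharp
using System;

// Only the two native kinds named in the notes; the real enum has more members.
public enum AlarmKind { NativeOpcUa, NativeMxAccess }

public static class NativeKindResolverSketch
{
    // protocol-contains-"Mx" -> NativeMxAccess, otherwise NativeOpcUa.
    // Assumption: ordinal, case-sensitive match; a null protocol falls through to OPC UA.
    public static AlarmKind ResolveNativeKind(string? protocol) =>
        protocol is not null && protocol.Contains("Mx", StringComparison.Ordinal)
            ? AlarmKind.NativeMxAccess
            : AlarmKind.NativeOpcUa;
}
```

Defaulting to NativeOpcUa matches the optional ctor parameter noted for T15, so a connection with an unrecognized protocol string behaves like the pre-T16 call.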

Decisions / deviations made during execution: Batch 4 (T13–15)

  • T15: NativeAlarmActor ctor has an optional trailing AlarmKind nativeKind = AlarmKind.NativeOpcUa (additive — keeps the 7-arg call working). T16 will pass NativeMxAccess when the connection protocol is MxGateway. Persistence is fire-and-forget (ContinueWith OnlyOnFaulted logs) — never blocks the actor. State keyed by SourceReference; AlarmName on the emitted AlarmStateChanged is set to the SourceReference. Snapshot path: Snapshot buffers, SnapshotComplete atomic-swaps (dropped → emit Active=false). Live path ignores older TransitionTime; retention drops a condition once !Active && Acknowledged. NativeAlarmSourceUnavailable = log + retain (no emit). Subscribe retry via ScheduleTellOnceCancelable at NativeAlarmRetryIntervalMs.
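
The T15 state rules (snapshot buffer and swap, stale-transition filter, retention) can be modeled outside the actor roughly as below. This is a sketch under assumptions: NativeCondition and NativeAlarmStateSketch are hypothetical names, the real condition carries more fields, and persistence, retry, and the actual emit are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, trimmed-down condition; keyed by SourceReference as in the notes.
public sealed record NativeCondition(
    string SourceReference, bool Active, bool Acknowledged, DateTimeOffset TransitionTime);

public sealed class NativeAlarmStateSketch
{
    private Dictionary<string, NativeCondition> _live = new Dictionary<string, NativeCondition>();
    private Dictionary<string, NativeCondition>? _snapshotBuffer;

    public IReadOnlyDictionary<string, NativeCondition> Live => _live;

    // Snapshot path: buffer only, the live table is untouched until SnapshotComplete.
    public void OnSnapshot(NativeCondition c)
    {
        _snapshotBuffer ??= new Dictionary<string, NativeCondition>();
        _snapshotBuffer[c.SourceReference] = c;
    }

    // SnapshotComplete: swap the buffer in as the live table and return the
    // conditions that disappeared, already flipped to Active=false for emission.
    public IReadOnlyList<NativeCondition> OnSnapshotComplete()
    {
        var next = _snapshotBuffer ?? new Dictionary<string, NativeCondition>();
        var dropped = new List<NativeCondition>();
        foreach (var (key, old) in _live)
        {
            if (!next.ContainsKey(key))
                dropped.Add(old with { Active = false });
        }
        _live = next;
        _snapshotBuffer = null;
        return dropped;
    }

    // Live path: returns true when the transition was applied and should be emitted.
    public bool OnTransition(NativeCondition c)
    {
        if (_live.TryGetValue(c.SourceReference, out var current)
            && c.TransitionTime < current.TransitionTime)
            return false; // older than what we already hold: ignore

        if (!c.Active && c.Acknowledged)
            _live.Remove(c.SourceReference); // retention: nothing left to track
        else
            _live[c.SourceReference] = c;
        return true;
    }
}
```

Keeping the rules in a plain class like this is one way to unit-test them without an actor system; whether the real actor is factored that way is not stated in these notes.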

Known-flaky baseline (NOT my regressions)

  • 5 StaleTagMonitor*Tests in ZB.MOM.WW.ScadaBridge.Commons.Tests are timing-flaky under load. User approved treating as known-flaky; do not "fix". Watch only for NEW failures.

Decisions / deviations made during execution (carry forward)

  • T2: AlarmStateChanged.Condition is a computed-default property (getter falls back to AlarmConditionStateFactory.ForComputed(State, Priority)); additions are init-props (additive). AlarmConditionStateFactory lives in Commons/Types/Alarms.
  • T8: ResolvedNativeAlarmSource has no IsLocked field (per plan). Inheritance lock is enforced via a local lockedNames HashSet inside ResolveInheritedNativeAlarmSources. Override-lock is NOT enforced at flatten (matches plan; UI/validation layer handles it).
  • T9: SemanticValidator.Validate gained an optional 3rd param IReadOnlySet<string>? alarmCapableConnectionNames = null. Connection-existence check only runs when callers pass it; empty source-ref / empty connection-name always checked. ValidationCategory.NativeAlarmSourceInvalid added. (Wiring real callers to pass the connection set is not yet done — fine for now.)
  • T10: DataConnectionActor routes alarm transitions by source-ref prefix (transition.SourceObjectReference/SourceReference StartsWith bound key), dedup per transition. One feed per source-ref, ref-counted. Internal records AlarmTransitionReceived, AlarmSubscribeCompleted. NativeAlarmSourceUnavailable pushed on entering Reconnecting; ReSubscribeAllAlarms on reconnect.
  • T11 (OPC UA): OpcUaAlarmMapper is pure/tested. RealOpcUaClient.CreateAlarmSubscriptionAsync does event MonitoredItem + EventFilter (select clauses indexed 0–12) + ConditionRefresh via CallAsync (the sync Call is obsolete→error). AlarmConditionState collides with Opc.Ua.AlarmConditionState, so it is fully qualified as Commons.Types.Alarms.AlarmConditionState at the one new site. Behavior unverified until Task 28 (live A&C server).
  • T12 (MxGateway): MxGatewayAlarmMapper is pure/tested. Gateway proto enums AlarmConditionState/AlarmTransitionKind collide with Commons enums → aliased (ProtoConditionState/ProtoTransitionKind for proto; explicit using X = Commons… for the Commons ones). MxGatewayClient.StreamAlarmsAsync(StreamAlarmsRequest, ct) → IAsyncEnumerable<AlarmFeedMessage> confirmed present in pkg v0.1.0. Adapter opens one shared session-less feed (gateway-wide, null prefix), ref-counted, first-callback drives it (the actor routes). RealMxGatewayClient.RunAlarmStreamAsync reconnects internally (5s) — does NOT use RaiseDisconnected. Reference: OtOpcUa …Driver.Galaxy/Runtime/GatewayGalaxyAlarmFeed.cs. Behavior unverified until Task 28 (live gateway).
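
The T10 bookkeeping (one ref-counted feed per source-ref, prefix routing) might look roughly like this when reduced to plain data structures. It is a sketch under assumptions: AlarmFeedRouterSketch and its method names are hypothetical, ordinal prefix matching is a guess, and the actor messaging around it is omitted.

```csharp
using System;
using System.Collections.Generic;

public sealed class AlarmFeedRouterSketch
{
    // bound source-ref key -> number of subscribers sharing that feed
    private readonly Dictionary<string, int> _refCounts =
        new Dictionary<string, int>(StringComparer.Ordinal);

    // Returns true for the first subscriber, i.e. the caller should open the feed.
    public bool Subscribe(string sourceRef)
    {
        _refCounts.TryGetValue(sourceRef, out var count); // count stays 0 when absent
        _refCounts[sourceRef] = count + 1;
        return count == 0;
    }

    // Returns true when the last subscriber left, i.e. the caller should close the feed.
    public bool Unsubscribe(string sourceRef)
    {
        if (!_refCounts.TryGetValue(sourceRef, out var count)) return false;
        if (count <= 1) { _refCounts.Remove(sourceRef); return true; }
        _refCounts[sourceRef] = count - 1;
        return false;
    }

    // Bound keys that should receive a transition: a key matches when either
    // reference starts with it, and each key is listed at most once per transition.
    public IReadOnlyList<string> Route(string? sourceObjectReference, string? sourceReference)
    {
        var targets = new List<string>();
        foreach (var key in _refCounts.Keys)
        {
            if (StartsWith(sourceObjectReference, key) || StartsWith(sourceReference, key))
                targets.Add(key);
        }
        return targets;
    }

    private static bool StartsWith(string? value, string key) =>
        value is not null && value.StartsWith(key, StringComparison.Ordinal);
}
```

The true/false return values stand in for "open the adapter feed" and "close it"; on reconnect, the keys still held in the table are the set that ReSubscribeAllAlarms would need to replay.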

Execution cadence

  • Per-task TDD: write test → confirm RED → implement → GREEN → commit. Update native task status + this .tasks.json each task; report at each batch boundary and wait for "start"/feedback.
  • Batches so far: B1 = T1–4, B2 = T5–8, B3 = T9–12. Next proposed: B4 = T13–17.
  • Native task IDs map plan Task N → native id (N+6) — but on resume the native list is rebuilt from .tasks.json (Step 0).

Watch items for remaining tasks

  • T18 (proto): sitestream.proto is not auto-compiled; the <Protobuf> include is commented out and the generated .cs is vendored in SiteStreamGrpc/. Manual macOS regen only (toggle include → dotnet build → copy generated files → re-comment). Do NOT auto-compile on Linux.
  • T28: OPC UA A&C live smoke (SkippableFact) + confirm infra OPC UA server exposes A&C; manual deploy check via bash docker/deploy.sh / docker-env2/deploy.sh.