Native Alarms — Execution Resume Notes
Skill in progress: superpowers-extended-cc:executing-plans on docs/plans/2026-05-29-native-alarms.md.
To resume: /superpowers-extended-cc:executing-plans docs/plans/2026-05-29-native-alarms.md (reads …md.tasks.json).
Workspace
- Worktree: `/Users/dohertj2/Desktop/scadalink-design-native-alarms` (branch `feat/native-alarms`, off `main` @ `09e19db`, which holds the design + plan).
- Do all work here; `main` checkout stays untouched. Build: `dotnet build ZB.MOM.WW.ScadaBridge.slnx`.
- The shared MS SQL container `scadabridge-mssql` is up (the ConfigDB MsSql migration-fixture tests use it).
Progress: Tasks 1–21 done & committed; 22–28 pending
Commits (oldest→newest): 696da92 T1 … 20b41b8 T18, 0c6f9a9 T19, b1df6d5 T20, 3bf1d26 T21. (Full list in git log.)
Cadence is batches of 3 (user choice on resume). Batch 4 = T13–15 ✅, Batch 5 = T16–18 ✅, Batch 6 = T19–21 ✅. Greens: SiteRuntime.Tests 313/313, Communication.Tests 200/200, ManagementService.Tests 111/111, Commons.Tests registry 8/8. Next: Batch 7 = T22 (CLI) → T23 (DebugView UI) → T24 (Template editor UI). Then 25 (instance UI), 26 (seed), 27 (docs), 28 (live integration).
Decisions / deviations — Batch 6 (T19–21)
- T19: `AlarmShelveStateCodec` (string↔enum, default Unshelved). `ServerStreamRelayActor` maps `msg.Condition.*` + `Kind.ToString()` + nullable `OriginalRaiseTime` out; client `ConvertToDomainEvent` rebuilds `Condition` (severity = wire `Priority`) + `ParseAlarmKind` back. Gotcha: client imports `Google.Protobuf.WellKnownTypes`, so `Enum` is ambiguous; used `System.Enum.TryParse`. `confirmed` proto bool → domain `bool?` (false, never null after round-trip).
- T20: `ManagementCommandRegistry` is reflection-based (auto-discovers `*Command` records in the Management namespace), so no manual registry edit was needed; just added the 7 records to TemplateCommands.cs / InstanceCommands.cs.
- T21: Handlers call `ITemplateEngineRepository` directly + `SaveChangesAsync` (per plan's "call the Task 6 repo methods"), NOT through TemplateService/InstanceService. Trade-off: this skips the service-layer `IAuditService` logging that the existing template/instance alarm CRUD gets. Acceptable for the read-only-mirror authoring commands and matches the plan's Files scope (ManagementService only), but flag if audit parity is wanted later. Roles: template mutations = Design, instance-override mutations = Deployment, lists = any authenticated. Update = fetch-then-mutate (preserves TemplateId for the unique index). Set-override = upsert (Add if absent else Update).
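
A minimal sketch of the T19 codec idea and the `Enum` gotcha, for orientation only. The enum members other than `Unshelved`, the method names `Encode`/`Decode`, and the case-insensitive parse are assumptions; the notes only state that the codec maps string↔enum with a default of Unshelved and that `System.Enum.TryParse` had to be fully qualified.

```csharp
using System;

// Hypothetical members: the notes only name the default, Unshelved.
public enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved }

public static class AlarmShelveStateCodec
{
    // enum -> wire string
    public static string Encode(AlarmShelveState state) => state.ToString();

    // wire string -> enum; null, unknown names, and undefined numeric
    // values all fall back to Unshelved.
    public static AlarmShelveState Decode(string? wire) =>
        // System.Enum is spelled out because a file that also imports
        // Google.Protobuf.WellKnownTypes sees a second type named Enum.
        System.Enum.TryParse<AlarmShelveState>(wire, ignoreCase: true, out var parsed)
            && System.Enum.IsDefined(typeof(AlarmShelveState), parsed)
            ? parsed
            : AlarmShelveState.Unshelved;
}
```

The `IsDefined` check is a design choice of this sketch: `TryParse` accepts numeric strings such as "7", which would otherwise produce an enum value with no name.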
Decisions / deviations — Batch 5 (T16–18)
- T16: Connection protocol IS in `FlattenedConfiguration.Connections[name].Protocol` → `ResolveNativeKind` maps protocol-contains-"Mx" → `NativeMxAccess` else `NativeOpcUa`; passed into NativeAlarmActor. Added `_latestAlarmEvents` (enriched event per AlarmName) + extracted `BuildAlarmStatesSnapshot()` used by both `HandleSubscribeDebugView` and `HandleDebugSnapshot` (enriched events ∪ Normal-projection fallback for computed alarms that haven't fired). Native actors skipped when `_dclManager == null` (isolated tests). Beyond the plan's Files list (justified), redeploy/undeploy clear: added `native_alarm_state` DELETE to `SiteStorageService.RemoveDeployedConfigAsync` transaction (undeploy) + `ClearNativeAlarmsForInstanceAsync` next to `ClearStaticOverridesAsync` in `DeploymentManagerActor` redeploy path. Native state survives failover (rehydrate) but resets on redeploy, mirroring static-override semantics.
- T17: Test-only (as the plan predicted): `AlarmStateChanged.Condition` getter already defaults to `ForComputed(State, Priority)` from T2, so computed alarms carry the unified condition without code change. Added regression `AlarmActor_ComputedAlarm_CarriesUnifiedConditionState`.
- T18: Proto regen done via the documented macOS manual flow (uncomment `<Protobuf>` → delete vendored → build → copy `obj/Debug/net10.0/Protos/*.cs` → re-comment). csproj nets to no change. Only `Sitestream.cs` changed (service `SitestreamGrpc.cs` untouched; message-only change). `confirmed` is proto `bool` per plan (null→false fidelity loss accepted). New fields 8–21 on `AlarmStateUpdate`.
Decisions / deviations made during execution — Batch 4 (T13–15)
- T15: `NativeAlarmActor` ctor has an optional trailing `AlarmKind nativeKind = AlarmKind.NativeOpcUa` (additive; keeps the 7-arg call working). T16 will pass `NativeMxAccess` when the connection protocol is MxGateway. Persistence is fire-and-forget (`ContinueWith` `OnlyOnFaulted` logs) and never blocks the actor. State keyed by `SourceReference`; `AlarmName` on the emitted `AlarmStateChanged` is set to the `SourceReference`. Snapshot path: `Snapshot` buffers, `SnapshotComplete` atomic-swaps (dropped → emit `Active=false`). Live path ignores older `TransitionTime`; retention drops a condition once `!Active && Acknowledged`. `NativeAlarmSourceUnavailable` = log + retain (no emit). Subscribe retry via `ScheduleTellOnceCancelable` at `NativeAlarmRetryIntervalMs`.
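
The T15 live-path and retention rules can be sketched as a small state store. This is an illustration under assumptions, not the actor's code: `NativeCondition` and `NativeConditionStore` are hypothetical names, only the member names `SourceReference`, `Active`, `Acknowledged`, and `TransitionTime` come from the notes, and treating an equal timestamp as "not older" and returning `true` on a retention drop are guesses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape; only the member names come from the notes.
public sealed record NativeCondition(
    string SourceReference, bool Active, bool Acknowledged, DateTimeOffset TransitionTime);

public sealed class NativeConditionStore
{
    // State is keyed by SourceReference, as in the T15 note.
    private readonly Dictionary<string, NativeCondition> _bySource = new();

    public int Count => _bySource.Count;

    // Returns true when the transition was applied (and would be emitted),
    // false when it was ignored as stale.
    public bool ApplyLive(NativeCondition incoming)
    {
        // Live path: ignore a transition older than the one already held.
        if (_bySource.TryGetValue(incoming.SourceReference, out var current)
            && incoming.TransitionTime < current.TransitionTime)
        {
            return false;
        }

        // Retention: drop the condition once it is inactive and acknowledged.
        if (!incoming.Active && incoming.Acknowledged)
            _bySource.Remove(incoming.SourceReference);
        else
            _bySource[incoming.SourceReference] = incoming;

        return true;
    }
}
```

Keeping this logic in a plain class (rather than inside the actor) would make the ordering and retention rules testable without an actor system; whether the real code is factored that way is not stated in the notes.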
Known-flaky baseline (NOT my regressions)
- 5 `StaleTagMonitor*Tests` in `ZB.MOM.WW.ScadaBridge.Commons.Tests` are timing-flaky under load. User approved treating as known-flaky; do not "fix". Watch only for NEW failures.
Decisions / deviations made during execution (carry forward)
- T2: `AlarmStateChanged.Condition` is a computed-default property (getter falls back to `AlarmConditionStateFactory.ForComputed(State, Priority)`); additions are init-props (additive). `AlarmConditionStateFactory` lives in `Commons/Types/Alarms`.
- T8: `ResolvedNativeAlarmSource` has no `IsLocked` field (per plan). Inheritance lock is enforced via a local `lockedNames` HashSet inside `ResolveInheritedNativeAlarmSources`. Override-lock is NOT enforced at flatten (matches plan; UI/validation layer handles it).
- T9: `SemanticValidator.Validate` gained an optional 3rd param `IReadOnlySet<string>? alarmCapableConnectionNames = null`. Connection-existence check only runs when callers pass it; empty source-ref / empty connection-name always checked. `ValidationCategory.NativeAlarmSourceInvalid` added. (Wiring real callers to pass the connection set is not yet done; fine for now.)
- T10: `DataConnectionActor` routes alarm transitions by source-ref prefix (`transition.SourceObjectReference`/`SourceReference` StartsWith bound key), dedup per transition. One feed per source-ref, ref-counted. Internal records `AlarmTransitionReceived`, `AlarmSubscribeCompleted`. `NativeAlarmSourceUnavailable` pushed on entering Reconnecting; `ReSubscribeAllAlarms` on reconnect.
- T11 (OPC UA): `OpcUaAlarmMapper` is pure/tested. `RealOpcUaClient.CreateAlarmSubscriptionAsync` does event MonitoredItem + `EventFilter` (select clauses indexed 0–12) + `ConditionRefresh` via `CallAsync` (the sync `Call` is obsolete→error). `AlarmConditionState` collides with `Opc.Ua.AlarmConditionState`; fully-qualified as `Commons.Types.Alarms.AlarmConditionState` at the one `new` site. Behavior unverified until Task 28 (live A&C server).
- T12 (MxGateway): `MxGatewayAlarmMapper` is pure/tested. Gateway proto enums `AlarmConditionState`/`AlarmTransitionKind` collide with Commons enums → aliased (`ProtoConditionState`/`ProtoTransitionKind` for proto; explicit `using X = Commons…` for the Commons ones). `MxGatewayClient.StreamAlarmsAsync(StreamAlarmsRequest, ct) → IAsyncEnumerable<AlarmFeedMessage>` confirmed present in pkg v0.1.0. Adapter opens one shared session-less feed (gateway-wide, null prefix), ref-counted, first-callback drives it (the actor routes). `RealMxGatewayClient.RunAlarmStreamAsync` reconnects internally (5s) and does NOT use `RaiseDisconnected`. Reference: OtOpcUa…Driver.Galaxy/Runtime/GatewayGalaxyAlarmFeed.cs. Behavior unverified until Task 28 (live gateway).
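
For the T2 item above, a computed-default property with an init accessor might look roughly like the following. All type shapes here are trimmed-down assumptions (the real `AlarmStateChanged`, `AlarmConditionState`, and `AlarmState` are not shown in these notes), and mapping `Priority` onto a `Severity` field is a guess; only the names `Condition`, `State`, `Priority`, and `AlarmConditionStateFactory.ForComputed(State, Priority)` come from the notes.

```csharp
using System;

// Hypothetical, trimmed-down shapes; the real types carry more members.
public enum AlarmState { Normal, Active }

public sealed record AlarmConditionState(bool Active, int Severity);

public static class AlarmConditionStateFactory
{
    // Assumed projection of a computed alarm onto the unified condition.
    public static AlarmConditionState ForComputed(AlarmState state, int priority) =>
        new(state == AlarmState.Active, priority);
}

public sealed record AlarmStateChanged(string AlarmName, AlarmState State, int Priority)
{
    // Unset for computed alarms; native alarms supply it via the initializer.
    private readonly AlarmConditionState? _condition;

    public AlarmConditionState Condition
    {
        // Computed default: fall back to the factory when nothing was set.
        get => _condition ?? AlarmConditionStateFactory.ForComputed(State, Priority);
        init => _condition = value;
    }
}
```

Because the property is init-only with a fallback, existing call sites that construct the event without `Condition` keep compiling, which is the "additive" property the T2 and T17 notes rely on.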
Execution cadence
- Per-task TDD: write test → confirm RED → implement → GREEN → commit. Update native task status + this `.tasks.json` each task; report at each batch boundary and wait for "start"/feedback.
- Batches so far: B1 = T1–4, B2 = T5–8, B3 = T9–12. Next proposed: B4 = T13–17.
- Native task IDs map plan Task N → native id (N+6), but on resume the native list is rebuilt from `.tasks.json` (Step 0).
Watch items for remaining tasks
- T18 (proto): `sitestream.proto` is not auto-compiled: the `<Protobuf>` include is commented out, generated `.cs` vendored in `SiteStreamGrpc/`. Manual macOS regen only (toggle include → `dotnet build` → copy generated files → re-comment). Do NOT auto-compile on Linux.
- T28: OPC UA A&C live smoke (SkippableFact) + confirm infra OPC UA server exposes A&C; manual deploy check via `bash docker/deploy.sh` / `docker-env2/deploy.sh`.