diff --git a/docs/plans/2026-06-13-alarm-subtag-fallback.md b/docs/plans/2026-06-13-alarm-subtag-fallback.md new file mode 100644 index 0000000..09af7b6 --- /dev/null +++ b/docs/plans/2026-06-13-alarm-subtag-fallback.md @@ -0,0 +1,858 @@ +# Alarm Subtag-Monitoring Fallback — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task. + +**Goal:** Add a second alarm source — direct MXAccess subtag monitoring — that the gateway auto-fails-over to when the wnwrap alarmmgr provider breaks, auto-fails-back to when it recovers, and can be forced on by config. + +**Architecture:** Worker-side synthesis (parity rule preserved). A new `SubtagAlarmConsumer` (own `LMXProxyServerClass`, `AddItem`/`Advise` on alarm subtags) and a `FailoverAlarmConsumer` composite (state machine over the wnwrap primary + subtag standby) both implement the existing `IMxAccessAlarmConsumer` seam. The gateway resolves the subtag watch-list (Galaxy Repository SQL + config override), arms the worker at subscribe time, and reflects the live provider mode into the gRPC alarm feed, the dashboard hub, and metrics. + +**Tech Stack:** .NET 10 (gateway, x64) + .NET Framework 4.8 (worker, x86, STA), protobuf/gRPC, `Microsoft.Data.SqlClient` (Galaxy Repository), SignalR (dashboard), `System.Diagnostics.Metrics`, xUnit (plain `Assert`, no FluentAssertions). + +**Design source:** `docs/plans/2026-06-13-alarm-subtag-fallback-design.md` + +**Branch:** `feat/alarm-subtag-fallback` (already created) + +--- + +## Conventions for every task + +- **TDD:** write the failing test, run it red, implement, run it green, commit. +- **xUnit, plain `Assert.*`**, naming `Subject_Condition_Expected`. Worker fakes are sealed private nested classes that raise events. 
+- **Build/test commands:** + - Contracts regen: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj` + - Gateway: `dotnet build src/ZB.MOM.WW.MxGateway.Server` ; `dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj` + - Worker (x86): `dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86` ; `dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86` + - Single test: append `--filter FullyQualifiedName~` +- **Build is strict:** `TreatWarningsAsErrors=true`, nullable enabled. Add XML doc comments on public members (the repo runs a doc checker). +- **Generated code** under `Generated/` is never hand-edited — rebuild the contracts project to regenerate. +- **Namespaces:** worker MxAccess types live in `ZB.MOM.WW.MxGateway.Worker.MxAccess`; proto C# types in `ZB.MOM.WW.MxGateway.Contracts.Proto`. + +--- + +## Phase 0 — Contracts + +### Task 1: Worker proto — subtag watch-list, failover config, provider-mode enum + +**Classification:** high-risk +**Estimated implement time:** ~4 min +**Parallelizable with:** none (Task 2 imports these types) + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_worker.proto` (alarm command block, ~lines 318-346) + +**Step 1: Add the enum and messages.** In `mxaccess_worker.proto`, replace the `SubscribeAlarmsCommand` message and add the new types after it: + +```protobuf +// Provider selection / current provider for the alarm feed. Defined here in +// the worker contract because the worker SubscribeAlarmsCommand references it; +// mxaccess_gateway.proto imports this file and reuses the same enum. 
+enum AlarmProviderMode { + ALARM_PROVIDER_MODE_UNSPECIFIED = 0; // auto: alarmmgr primary, subtag fallback + ALARM_PROVIDER_MODE_ALARMMGR = 1; + ALARM_PROVIDER_MODE_SUBTAG = 2; +} + +message SubscribeAlarmsCommand { + string subscription_expression = 1; + // UNSPECIFIED = auto-failover/failback. ALARMMGR/SUBTAG force one provider. + AlarmProviderMode forced_mode = 2; + // Subtag watch-list resolved by the gateway (GR SQL + config). Empty in pure + // alarmmgr mode; in subtag mode it bounds what the consumer can observe. + repeated AlarmSubtagTarget watch_list = 3; + AlarmFailoverConfig failover = 4; +} + +// One alarm attribute the subtag consumer advises. Addresses are full MXAccess +// item references the worker passes straight to AddItem. +message AlarmSubtagTarget { + string alarm_full_reference = 1; // e.g. "Galaxy!Area.Tank01.Level.HiHi" + string source_object_reference = 2; // e.g. "Tank01" + string active_subtag = 3; // item address of the in-alarm boolean + string acked_subtag = 4; // item address of the acknowledged boolean + string ack_comment_subtag = 5; // writable ack-comment attribute (ack write target) + string priority_subtag = 6; // optional severity source; empty if absent +} + +message AlarmFailoverConfig { + int32 consecutive_failure_threshold = 1; // wnwrap COM failures before switching (>=1) + int32 failback_probe_interval_seconds = 2; // probe cadence while degraded (>=1) + int32 failback_stable_probes = 3; // clean probes before switching back (>=1) +} +``` + +`UnsubscribeAlarmsCommand` and `AcknowledgeAlarmCommand` are unchanged. + +**Step 2: Regenerate & verify it compiles.** +Run: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj` +Expected: build succeeds; generated `AlarmProviderMode`, `AlarmSubtagTarget`, `AlarmFailoverConfig` types appear. 
+ +**Step 3: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_worker.proto +git commit -m "contracts(worker): subtag watch-list + failover config + AlarmProviderMode" +``` + +--- + +### Task 2: Gateway proto — provider status on the feed, degraded provenance, mode-changed event + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Task 1; Task 3 tests both) + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto` (`OnAlarmTransitionEvent` ~719-771, `ActiveAlarmSnapshot` ~783-803, `AlarmFeedMessage` ~860-870, `MxEvent` family enum + body oneof, `MxEventFamily` enum) + +**Step 1: Add degraded provenance to the two alarm payloads.** Append to `OnAlarmTransitionEvent` (next free field 14): + +```protobuf + // True when this transition came from the subtag-monitoring fallback rather + // than the native alarmmgr provider — i.e. it was synthesized from data + // changes and carries reduced fidelity (synthetic GUID, no native raise time). + bool degraded = 14; + // Which provider produced this transition. + AlarmProviderMode source_provider = 15; +``` + +Append the identical two fields to `ActiveAlarmSnapshot` (next free field 14): +```protobuf + bool degraded = 14; + AlarmProviderMode source_provider = 15; +``` + +**Step 2: Add provider status to the feed oneof.** Add a new oneof case to `AlarmFeedMessage` (next free field 4) and a new message: + +```protobuf +message AlarmFeedMessage { + oneof payload { + ActiveAlarmSnapshot active_alarm = 1; + bool snapshot_complete = 2; + OnAlarmTransitionEvent transition = 3; + // Provider-mode status. Emitted once on stream open and again on every + // failover/failback so late joiners learn the current mode immediately. 
+ AlarmProviderStatus provider_status = 4; + } +} + +message AlarmProviderStatus { + AlarmProviderMode mode = 1; + bool degraded = 2; // true whenever mode == SUBTAG + string reason = 3; // human-readable switch reason + google.protobuf.Timestamp since = 4; +} +``` + +**Step 3: Add the worker→gateway mode-changed event to `MxEvent`.** Find the `MxEventFamily` enum and the `MxEvent` body oneof. Add a family member and a body message + oneof case (use the next free family value and the next free `MxEvent` body field number; check the file): + +```protobuf +// in MxEventFamily enum: + MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED = <next free family value>; + +// new message near OnAlarmTransitionEvent: +message OnAlarmProviderModeChangedEvent { + AlarmProviderMode mode = 1; + string reason = 2; + int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback + google.protobuf.Timestamp at = 4; +} + +// in MxEvent body oneof: + OnAlarmProviderModeChangedEvent on_alarm_provider_mode_changed = <next free body field number>; +``` + +`AlarmProviderMode` is defined in `mxaccess_worker.proto`; confirm `mxaccess_gateway.proto` already has `import "mxaccess_worker.proto";` (it references `SubscribeAlarmsCommand`, so it does) and reference the enum unqualified or via its package as the existing references do. + +**Step 4: Regenerate & verify.** +Run: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj` +Expected: build succeeds. 
+ +**Step 5: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto +git commit -m "contracts(gateway): AlarmProviderStatus feed case, degraded provenance, mode-changed event" +``` + +--- + +### Task 3: Proto round-trip tests for the new alarm fields + +**Classification:** small +**Estimated implement time:** ~3 min +**Parallelizable with:** none (depends on Tasks 1-2) + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` + +**Step 1: Add tests** mirroring the existing `Event_RoundTripsOnAlarmTransitionWithFullPayload` style: + +```csharp +[Fact] +public void Feed_RoundTripsProviderStatus() +{ + var since = Timestamp.FromDateTime(new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc)); + var original = new AlarmFeedMessage + { + ProviderStatus = new AlarmProviderStatus + { + Mode = AlarmProviderMode.Subtag, + Degraded = true, + Reason = "wnwrap poll failed 3x (HRESULT 0x80004005)", + Since = since, + }, + }; + + var parsed = AlarmFeedMessage.Parser.ParseFrom(original.ToByteArray()); + + Assert.Equal(original, parsed); + Assert.Equal(AlarmFeedMessage.PayloadOneofCase.ProviderStatus, parsed.PayloadCase); + Assert.True(parsed.ProviderStatus.Degraded); + Assert.Equal(AlarmProviderMode.Subtag, parsed.ProviderStatus.Mode); +} + +[Fact] +public void Transition_RoundTripsDegradedProvenance() +{ + var t = new OnAlarmTransitionEvent + { + AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi", + TransitionKind = AlarmTransitionKind.Raise, + Degraded = true, + SourceProvider = AlarmProviderMode.Subtag, + }; + + var parsed = OnAlarmTransitionEvent.Parser.ParseFrom(t.ToByteArray()); + + Assert.True(parsed.Degraded); + Assert.Equal(AlarmProviderMode.Subtag, parsed.SourceProvider); +} +``` + +**Step 2: Run red→green.** +Run: `dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ProtobufContractRoundTripTests` +Expected: PASS. 
+ +**Step 3: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs +git commit -m "test(contracts): round-trip provider status + degraded provenance" +``` + +--- + +## Phase 1 — Worker: subtag consumer + failover + +### Task 4: Subtag value-source abstraction + synthesis state holder + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** none (Task 5 builds on it) + +This task adds a testable seam so the synthesis logic can be unit-tested without COM. The COM wiring lands in Task 6. + +**Files:** +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs` +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs` +- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs` + +**Step 1: Define the source abstraction.** `ISubtagAlarmSource` advises subtag addresses and raises a normalized value-change callback on the STA: + +```csharp +using System; +using System.Collections.Generic; + +namespace ZB.MOM.WW.MxGateway.Worker.MxAccess; + +/// <summary>A change in one advised subtag value, normalized off the COM boundary.</summary> +public sealed class SubtagValueChange +{ + /// <summary>The full item address that changed (matches an AlarmSubtagTarget subtag).</summary> + public string ItemAddress { get; init; } = string.Empty; + /// <summary>The new value (boolean for .active/.acked, numeric for priority).</summary> + public object? Value { get; init; } + /// <summary>The change timestamp in UTC.</summary> + public DateTime TimestampUtc { get; init; } +} + +/// <summary> +/// Advises a set of MXAccess subtag addresses and surfaces value changes. +/// The production implementation (Task 6) owns its own LMXProxyServerClass; +/// tests substitute a fake that pushes <see cref="SubtagValueChange"/>s. +/// </summary> +public interface ISubtagAlarmSource : IDisposable +{ + /// <summary>Raised on the STA when an advised subtag's value changes.</summary> + event EventHandler<SubtagValueChange>? ValueChanged; + + /// <summary>Advises every subtag in the supplied addresses; idempotent per address.</summary> 
+ void Advise(IReadOnlyCollection<string> itemAddresses); + + /// <summary>Writes a value to an item address (used for the ack-comment write).</summary> + void Write(string itemAddress, object? value); +} +``` + +**Step 2: Write the state-machine tests first.** `SubtagAlarmStateMachine` maps `(active, acked)` changes per target to `MxAlarmTransitionEvent`s. Test the four core transitions: + +```csharp +namespace ZB.MOM.WW.MxGateway.Worker.Tests.MxAccess; + +public sealed class SubtagAlarmStateMachineTests +{ + private static AlarmSubtagTarget Target() => new() + { + AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi", + SourceObjectReference = "Tank01", + ActiveSubtag = "Tank01.Level.HiHi.active", + AckedSubtag = "Tank01.Level.HiHi.acked", + AckCommentSubtag = "Tank01.Level.HiHi.ackmsg", + }; + + [Fact] + public void ActiveFalseToTrue_EmitsRaise_FlaggedDegraded() + { + var sm = new SubtagAlarmStateMachine(new[] { Target() }); + var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc); + + var events = sm.Apply("Tank01.Level.HiHi.active", true, ts); + + var e = Assert.Single(events); + Assert.Equal(MxAlarmStateKind.UnackAlm, e.Record.State); + Assert.Equal(MxAlarmStateKind.Unspecified, e.PreviousState); + Assert.Equal("Tank01.Level.HiHi", e.Record.TagName); // reference minus provider/area + } + + [Fact] + public void AckedTrueWhileActive_EmitsAckTransition() + { + var sm = new SubtagAlarmStateMachine(new[] { Target() }); + var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc); + sm.Apply("Tank01.Level.HiHi.active", true, ts); + + var events = sm.Apply("Tank01.Level.HiHi.acked", true, ts.AddSeconds(5)); + + var e = Assert.Single(events); + Assert.Equal(MxAlarmStateKind.AckAlm, e.Record.State); + Assert.Equal(MxAlarmStateKind.UnackAlm, e.PreviousState); + } + + [Fact] + public void ActiveTrueToFalse_WhileUnacked_EmitsUnackRtn() + { + var sm = new SubtagAlarmStateMachine(new[] { Target() }); + var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc); + 
sm.Apply("Tank01.Level.HiHi.active", true, ts); + + var events = sm.Apply("Tank01.Level.HiHi.active", false, ts.AddSeconds(10)); + + var e = Assert.Single(events); + Assert.Equal(MxAlarmStateKind.UnackRtn, e.Record.State); + } + + [Fact] + public void Snapshot_ReflectsActiveAndAckedState() + { + var sm = new SubtagAlarmStateMachine(new[] { Target() }); + var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc); + sm.Apply("Tank01.Level.HiHi.active", true, ts); + sm.Apply("Tank01.Level.HiHi.acked", true, ts); + + var snap = Assert.Single(sm.SnapshotActive()); + Assert.Equal(MxAlarmStateKind.AckAlm, snap.State); + } +} +``` + +Run: `dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmStateMachineTests` → FAIL (type missing). + +**Step 3: Implement `SubtagAlarmStateMachine`.** Build an address→target index (active/acked/priority/comment addresses), hold per-reference `(bool active, bool acked, DateTime firstRaiseUtc, int priority)`, and emit on change: +- active `false→true` ⇒ `UnackAlm`, set `firstRaiseUtc`, `PreviousState` from prior state. +- acked `false→true` while active ⇒ `AckAlm`. +- active `true→false` ⇒ `AckRtn` if currently acked else `UnackRtn`; then reset acked. +- priority change ⇒ update stored priority, no transition. +- `TagName` = `alarm_full_reference` with any `Provider!Area.` prefix stripped (match `WnWrapAlarmConsumer`'s reference shape so `GatewayAlarmMonitor` keys align). Set `ProviderName`, `Group`, `Priority`, `AlarmComment` from the target/last values. Mark a `Degraded`/source flag (carried via a new field — see Task 5 wiring). +- `SnapshotActive()` returns `MxAlarmSnapshotRecord` for references whose active is true. + +**Step 4: Run green.** Expected: PASS. 
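
The transition rules in Step 3 can be sketched for a single alarm as follows. This is a minimal illustration under stated assumptions, not the planned `SubtagAlarmStateMachine`: `AlarmState` and `AlarmCell` are hypothetical stand-ins for `MxAlarmStateKind` and the per-reference state, and the address index, priority, and timestamps are left out.

```csharp
using System;

// Hypothetical stand-in for MxAlarmStateKind; the real enum lives in the worker.
public enum AlarmState { Unspecified, UnackAlm, AckAlm, UnackRtn, AckRtn }

// Hypothetical per-alarm cell tracking the two booleans the subtags report.
public sealed class AlarmCell
{
    private bool _active;
    private bool _acked;

    // Last state emitted for this alarm; Unspecified until the first raise.
    public AlarmState State { get; private set; } = AlarmState.Unspecified;

    // Applies a change to the "active" subtag. Returns the new state, or
    // null when the value is unchanged and nothing should be emitted.
    public AlarmState? ApplyActive(bool value)
    {
        if (value == _active) return null;
        _active = value;
        if (value)
        {
            State = AlarmState.UnackAlm;          // false -> true: raise
        }
        else
        {
            // true -> false: return-to-normal, acked or not.
            State = _acked ? AlarmState.AckRtn : AlarmState.UnackRtn;
            _acked = false;                       // then reset acked
        }
        return State;
    }

    // Applies a change to the "acked" subtag. Only false -> true while the
    // alarm is active produces a transition; other changes are just stored.
    public AlarmState? ApplyAcked(bool value)
    {
        if (value == _acked) return null;
        _acked = value;
        if (!value || !_active) return null;
        State = AlarmState.AckAlm;
        return State;
    }
}
```

In this sketch an acked change that arrives while the alarm is inactive is stored but emits nothing; whether the real state machine should behave the same way is for the Step 2 tests to pin down.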
+ +**Step 5: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs \ + src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs \ + src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs +git commit -m "worker(alarms): subtag value-source seam + synthesis state machine" +``` + +--- + +### Task 5: `SubtagAlarmConsumer` over the source seam (no COM yet) + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Task 4) + +**Files:** +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs` +- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs` + +**Step 1: Test with a fake `ISubtagAlarmSource`.** Drive value changes through the source, assert `AlarmTransitionEmitted` fires with synthesized records and that ack writes the comment to the ack-comment subtag: + +```csharp +public sealed class SubtagAlarmConsumerTests +{ + private sealed class FakeSource : ISubtagAlarmSource + { + public event EventHandler<SubtagValueChange>? ValueChanged; + public List<string> Advised { get; } = new(); + public (string Address, object? Value)? LastWrite { get; private set; } + public void Advise(IReadOnlyCollection<string> a) => Advised.AddRange(a); + public void Write(string a, object? v) => LastWrite = (a, v); + public void Raise(string addr, object? 
val, DateTime ts) => + ValueChanged?.Invoke(this, new SubtagValueChange { ItemAddress = addr, Value = val, TimestampUtc = ts }); + public void Dispose() { } + } + + private static AlarmSubtagTarget Target() => new() + { + AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi", + ActiveSubtag = "Tank01.Level.HiHi.active", + AckedSubtag = "Tank01.Level.HiHi.acked", + AckCommentSubtag = "Tank01.Level.HiHi.ackmsg", + }; + + [Fact] + public void Subscribe_AdvisesAllSubtags() + { + var src = new FakeSource(); + using var c = new SubtagAlarmConsumer(src, new[] { Target() }); + c.Subscribe("ignored-in-subtag-mode"); + Assert.Contains("Tank01.Level.HiHi.active", src.Advised); + Assert.Contains("Tank01.Level.HiHi.acked", src.Advised); + } + + [Fact] + public void ValueChange_RaisesSynthesizedTransition() + { + var src = new FakeSource(); + using var c = new SubtagAlarmConsumer(src, new[] { Target() }); + c.Subscribe("x"); + MxAlarmTransitionEvent? seen = null; + c.AlarmTransitionEmitted += (_, e) => seen = e; + + src.Raise("Tank01.Level.HiHi.active", true, new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc)); + + Assert.NotNull(seen); + Assert.Equal(MxAlarmStateKind.UnackAlm, seen!.Record.State); + } + + [Fact] + public void AcknowledgeByName_WritesCommentToAckCommentSubtag() + { + var src = new FakeSource(); + using var c = new SubtagAlarmConsumer(src, new[] { Target() }); + c.Subscribe("x"); + + int rc = c.AcknowledgeByName("Tank01.Level.HiHi", "Galaxy", "Area", + "ack from HMI", "op1", "node", "dom", "Op One"); + + Assert.Equal(0, rc); + Assert.Equal(("Tank01.Level.HiHi.ackmsg", (object?)"ack from HMI"), src.LastWrite); + } +} +``` + +**Step 2: Implement `SubtagAlarmConsumer : IMxAccessAlarmConsumer`.** +- Constructor `(ISubtagAlarmSource source, IReadOnlyList<AlarmSubtagTarget> watchList)`; build a `SubtagAlarmStateMachine`; index `alarm_full_reference`→target for ack routing. 
+- `Subscribe(_)`: call `source.Advise()`; subscribe to `source.ValueChanged`, feed each into the state machine, and re-raise each produced `MxAlarmTransitionEvent` via `AlarmTransitionEmitted` (mark degraded). +- `AcknowledgeByName(alarmName, …, comment, …)`: resolve the target by reference; if no `AckCommentSubtag`, return a non-zero failure code; else `source.Write(target.AckCommentSubtag, comment)` and return 0. +- `AcknowledgeByGuid(guid, …)`: map the synthetic GUID (deterministic hash of reference — see Task 8 helper, or a local copy) back to a reference, then delegate to the name path; unknown GUID ⇒ non-zero. +- `SnapshotActiveAlarms()`: from the state machine. +- `PollOnce()`: no-op. +- `Dispose()`: unsubscribe + dispose source. + +**Step 3: Run green.** `dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmConsumerTests`. + +**Step 4: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs \ + src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs +git commit -m "worker(alarms): SubtagAlarmConsumer synthesizing transitions over the source seam" +``` + +--- + +### Task 6: COM-backed `LmxSubtagAlarmSource` (own LMXProxyServerClass) + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none + +The only piece that touches live COM. Like `WnWrapAlarmConsumer`, it owns its own MXAccess server object so the subtag source is self-contained and isolated from the session's item pipeline. Logic stays thin (advise/write/marshal); real verification is the live smoke test in Task 17. 
+ +**Files:** +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs` +- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs` (constructor/guard tests only; COM path is live-gated) + +**Step 1: Implement `LmxSubtagAlarmSource : ISubtagAlarmSource`.** +- Own an `LMXProxyServerClass` (reuse the worker's `IMxAccessServer`/`MxAccessComServer` wrapper + `IMxAccessComObjectFactory` so it is fakeable; constructor takes the factory). +- `Advise(addresses)`: `RegisterServer` (topic) once; per address `AddItem`→`itemHandle`, `Advise`, and record `itemHandle→address`. Subscribe to the proxy's `OnDataChange`; in the handler, look up the address by `phItemHandle`, normalize `pvItemValue` (VARIANT→bool/double) and `pftItemTimeStamp`→UTC, and raise `ValueChanged`. All calls run on the STA (the worker STA pumps messages, so `OnDataChange` delivers). +- `Write(address, value)`: resolve/create the item handle, `server.Write(serverHandle, itemHandle, value, userId: 0)`. +- `Dispose()`: `UnAdvise`/`RemoveItem`/`Unregister`/release COM. + +**Step 2: Tests** — only the non-COM guards (null factory throws; `Write` before `Advise` resolves a handle or throws a clear error). Mark the COM round-trip `[LiveMxAccessFact]` and `Skip` per the `AlarmsLiveSmokeTests` precedent. + +**Step 3: Build x86 + run unit tests.** +`dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86` +`dotnet test ...Worker.Tests... 
-p:Platform=x86 --filter FullyQualifiedName~LmxSubtagAlarmSourceTests` + +**Step 4: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs \ + src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs +git commit -m "worker(alarms): COM-backed LmxSubtagAlarmSource advising alarm subtags" +``` + +--- + +### Task 7: `FailoverAlarmConsumer` state machine + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Task 5) + +**Files:** +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs` +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs` (small EventArgs) +- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs` + +**Step 1: Test the switch/failback with a fake primary that throws.** + +```csharp +public sealed class FailoverAlarmConsumerTests +{ + private sealed class FlakyPrimary : IMxAccessAlarmConsumer + { + public event EventHandler<MxAlarmTransitionEvent>? 
AlarmTransitionEmitted; + public int PollsUntilHeal = int.MaxValue; // becomes healthy after N polls while degraded + public bool ThrowOnPoll = true; + private int _polls; + public void Subscribe(string s) { if (ThrowOnPoll) throw new COMException("boom", unchecked((int)0x80004005)); } + public void PollOnce() + { + _polls++; + if (ThrowOnPoll && _polls < PollsUntilHeal) throw new COMException("boom", unchecked((int)0x80004005)); + } + public int AcknowledgeByGuid(Guid g, string c, string a, string b, string d, string e) => 0; + public int AcknowledgeByName(string n, string p, string gr, string c, string a, string b, string d, string e) => 0; + public IReadOnlyList<MxAlarmSnapshotRecord> SnapshotActiveAlarms() => Array.Empty<MxAlarmSnapshotRecord>(); + public void Dispose() { } + } + + private sealed class StubStandby : IMxAccessAlarmConsumer { /* records Subscribe, no-op rest */ } + + [Fact] + public void Primary_FailsThresholdTimes_SwitchesToSubtagAndEmitsModeChange() + { + var primary = new FlakyPrimary(); + var standby = new StubStandby(); + using var c = new FailoverAlarmConsumer(primary, standby, + new FailoverSettings(threshold: 3, probeIntervalSeconds: 30, stableProbes: 3)); + AlarmProviderModeChange? 
change = null; + c.ProviderModeChanged += (_, e) => change = e; + + c.Subscribe("\\\\host\\Galaxy!Area"); // primary.Subscribe throws -> counts as failure 1 + c.PollOnce(); // failure 2 + c.PollOnce(); // failure 3 -> switch + + Assert.NotNull(change); + Assert.Equal(AlarmProviderMode.Subtag, change!.Mode); + } + + [Fact] + public void WhileDegraded_PrimaryHeals_FailsBackAfterStableProbes() + { + var primary = new FlakyPrimary { PollsUntilHeal = 0 }; // will heal once we stop throwing + var standby = new StubStandby(); + using var c = new FailoverAlarmConsumer(primary, standby, + new FailoverSettings(threshold: 1, probeIntervalSeconds: 0, stableProbes: 2)); + var modes = new List<AlarmProviderMode>(); + c.ProviderModeChanged += (_, e) => modes.Add(e.Mode); + + c.Subscribe("x"); // failure -> switch to subtag + primary.ThrowOnPoll = false; + c.ProbeOnce(); // clean probe 1 + c.ProbeOnce(); // clean probe 2 -> failback + + Assert.Equal(AlarmProviderMode.Subtag, modes[0]); + Assert.Equal(AlarmProviderMode.Alarmmgr, modes[modes.Count - 1]); + } +} +``` + +**Step 2: Implement.** +- `record FailoverSettings(int threshold, int probeIntervalSeconds, int stableProbes)`; `AlarmProviderModeChange : EventArgs { AlarmProviderMode Mode; string Reason; int HResult; DateTime AtUtc; }`. +- Constructor `(IMxAccessAlarmConsumer primary, IMxAccessAlarmConsumer standby, FailoverSettings settings)`; forced-mode variants handled in Task 9 wiring (forced ⇒ skip the other consumer). +- Forward `AlarmTransitionEmitted` from the **active** child only (swap the subscription on switch). +- Wrap `Subscribe`/`PollOnce` on the primary: on `COMException` (or a failure HRESULT) while `PrimaryActive`, increment a counter; at `threshold`, ensure standby `Subscribe`d, set active=standby, snapshot standby for hand-off, raise `ProviderModeChanged(Subtag, reason, hresult, now)`. Reset counter on any clean primary poll. 
+- `ProbeOnce()` (driven by the poll loop while degraded, gated by `probeIntervalSeconds`): try primary `Subscribe`+`PollOnce`; count consecutive clean probes; at `stableProbes`, set active=primary, return standby to standby, raise `ProviderModeChanged(Alarmmgr, "recovered", 0, now)`. +- `Acknowledge*` / `SnapshotActiveAlarms` delegate to the **active** child. +- `PollOnce()` drives the active child's poll, and—while degraded—also drives the failback probe cadence. + +**Step 3: Run green** (x86 filter `FailoverAlarmConsumerTests`). + +**Step 4: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs \ + src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs \ + src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs +git commit -m "worker(alarms): FailoverAlarmConsumer auto-failover/failback state machine" +``` + +--- + +### Task 8: Synthetic-GUID helper + degraded flag on the event sink path + +**Classification:** standard +**Estimated implement time:** ~4 min +**Parallelizable with:** Task 9 + +Carry `degraded` + `source_provider` from the worker synthesis into the emitted `OnAlarmTransitionEvent`. + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs` (add `bool Degraded`) +- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs` (`EnqueueTransition` carries degraded) +- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs` (`CreateOnAlarmTransition` sets `Degraded`/`SourceProvider`) +- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs` +- Test: add cases to `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmDispatcherTests.cs` and a new `SyntheticAlarmGuidTests.cs` + +**Step 1: `SyntheticAlarmGuid.ForReference(string reference)`** — deterministic GUID from a stable hash (e.g. MD5 of the UTF-8 reference → `new Guid(bytes)`), so subtag-mode acks resolve by GUID. 
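
One possible shape for the helper, assuming the MD5 option named above. This is a sketch rather than the final implementation: it uses `MD5.Create()` so it compiles on both .NET Framework 4.8 and newer runtimes, it hashes the reference exactly as given (no case or whitespace normalization, which is an assumption), and the result is not an RFC 4122 name-based UUID because the version bits are left untouched.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SyntheticAlarmGuid
{
    // Derives a stable GUID from an alarm reference. MD5 is used purely as
    // a 16-byte fingerprint here, not for any security purpose.
    public static Guid ForReference(string reference)
    {
        if (reference is null) throw new ArgumentNullException(nameof(reference));

        using (var md5 = MD5.Create())
        {
            // MD5 always yields 16 bytes, which is what Guid(byte[]) requires.
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(reference));
            return new Guid(hash);
        }
    }
}
```

If the repo's analyzers flag MD5 as a weak algorithm, the strict build may need a targeted suppression or a different hash truncated to 16 bytes; either choice keeps the GUID deterministic.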
Test determinism + difference: + +```csharp +[Fact] public void SameReference_SameGuid() => + Assert.Equal(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.C")); +[Fact] public void DifferentReference_DifferentGuid() => + Assert.NotEqual(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.D")); +``` + +**Step 2: Thread `degraded`** through `MxAlarmSnapshotRecord.Degraded`, `EnqueueTransition(... bool degraded)`, and `CreateOnAlarmTransition(... bool degraded, AlarmProviderMode sourceProvider)`. Default `degraded=false`, `sourceProvider=Alarmmgr` so the wnwrap path is unchanged (regression: existing `AlarmDispatcherTests` still pass with `Degraded=false`). + +**Step 3: Tests** — extend `AlarmDispatcherTests` with a subtag-style transition asserting `body.Degraded == true` and `SourceProvider == Subtag`. + +**Step 4: Build x86 + run** worker tests for `AlarmDispatcherTests`, `SyntheticAlarmGuidTests`. + +**Step 5: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs \ + src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs \ + src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs \ + src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs \ + src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/ +git commit -m "worker(alarms): synthetic GUID + degraded provenance on emitted transitions" +``` + +--- + +### Task 9: Wire watch-list + failover config through `AlarmCommandHandler`; emit mode-changed event + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Tasks 5, 7, 8) + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmCommandHandler.cs` +- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs` +- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs` (`ExecuteSubscribeAlarms`, ~lines 588-616) +- Modify: 
`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessStaSession.cs` (consumer factory wiring; mode-change → event queue) +- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs` (extend or create) + +**Step 1: Carry the subscribe payload.** Change the alarm subscribe entry point from `Subscribe(string subscription)` to `Subscribe(SubscribeAlarmsCommand command)` (the command now has `ForcedMode`, `WatchList`, `Failover`). In `AlarmCommandHandler.Subscribe`: +- Build the active provider per `ForcedMode`: + - `ALARMMGR` ⇒ `WnWrapAlarmConsumer` only. + - `SUBTAG` ⇒ `SubtagAlarmConsumer(new LmxSubtagAlarmSource(factory), watchList)` only. + - `UNSPECIFIED` ⇒ `FailoverAlarmConsumer(primary: wnwrap, standby: subtag, settings-from-Failover)`. +- Use the existing `consumerFactory` seam but widen it to `Func` so tests inject fakes and production builds the failover composite. Subscribe to `FailoverAlarmConsumer.ProviderModeChanged` and enqueue an `OnAlarmProviderModeChangedEvent` MxEvent via the event queue (new mapper method `CreateOnAlarmProviderModeChanged`). + +**Step 2: Executor + STA wiring.** `ExecuteSubscribeAlarms` passes the full `SubscribeAlarmsCommand` (not just the expression). In `MxAccessStaSession`, the `alarmCommandHandlerFactory` must give the handler access to the `IMxAccessComObjectFactory` so the subtag source can create its own proxy server on the STA; keep the `EnsureOnAlarmConsumerThread` affinity guard on every path. + +**Step 3: Test** — fake consumer factory; assert that a `SUBTAG` forced command builds the subtag consumer and advises; that an auto command building a fake failover composite, when it raises `ProviderModeChanged`, enqueues an `OnAlarmProviderModeChangedEvent` on the queue. 
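
The provider selection in Step 1 can be sketched as a small switch over the forced mode. Everything named here is a hypothetical stand-in so the block is self-contained: `ProviderMode` mirrors `AlarmProviderMode`, `IAlarmConsumer` mirrors `IMxAccessAlarmConsumer`, and the three factory delegates are one guess at how the widened consumer-factory seam might look.

```csharp
using System;

// Hypothetical stand-ins for AlarmProviderMode and IMxAccessAlarmConsumer.
public enum ProviderMode { Unspecified = 0, Alarmmgr = 1, Subtag = 2 }

public interface IAlarmConsumer { string Name { get; } }

public sealed class NamedConsumer : IAlarmConsumer
{
    public NamedConsumer(string name) { Name = name; }
    public string Name { get; }
}

public static class AlarmConsumerSelector
{
    // Factories are invoked lazily so a forced mode never constructs the
    // provider it does not use (and never creates that provider's COM objects).
    public static IAlarmConsumer Build(
        ProviderMode forcedMode,
        Func<IAlarmConsumer> createWnWrap,
        Func<IAlarmConsumer> createSubtag,
        Func<IAlarmConsumer, IAlarmConsumer, IAlarmConsumer> createFailover)
    {
        switch (forcedMode)
        {
            case ProviderMode.Alarmmgr:
                return createWnWrap();
            case ProviderMode.Subtag:
                return createSubtag();
            case ProviderMode.Unspecified:
                // Auto mode: wnwrap is the primary, subtag is the standby.
                return createFailover(createWnWrap(), createSubtag());
            default:
                // proto3 enums can carry numeric values this build does not know.
                throw new ArgumentOutOfRangeException(
                    nameof(forcedMode), forcedMode, "Unknown alarm provider mode.");
        }
    }
}
```

Rejecting unknown values is a design choice for the sketch; falling back to auto mode would be the more forgiving alternative if older workers must tolerate newer gateways.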
+ +**Step 4: Build x86 + worker tests.** + +**Step 5: Commit.** +```bash +git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/ +git commit -m "worker(alarms): route watch-list/failover config; emit provider-mode-changed event" +``` + +--- + +## Phase 2 — Gateway: discovery, options, monitor, metrics, dashboard + +### Task 10: `AlarmsOptions.Fallback` + validation + +**Classification:** standard +**Estimated implement time:** ~4 min +**Parallelizable with:** Task 11, Task 13 + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmsOptions.cs` +- Create: `src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmFallbackOptions.cs` +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs` (`ValidateAlarms`, ~lines 234-258) +- Test: `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` (extend) + +**Step 1:** Add `AlarmFallbackOptions Fallback { get; init; } = new();` to `AlarmsOptions`. `AlarmFallbackOptions`: `string Mode = "Auto"` (`Auto|ForceAlarmManager|ForceSubtag`), `int ConsecutiveFailureThreshold = 3`, `int FailbackProbeIntervalSeconds = 30`, `int FailbackStableProbes = 3`, a `Discovery` sub-object (`bool UseGalaxyRepository = true`, `string Area = ""`, `string[] IncludeAttributes = []`, `string[] ExcludeAttributes = []`), and a `Subtags` sub-object (`Active="active"`, `Acked="acked"`, `AckComment=""`, `Priority="priority"`). + +**Step 2:** In `ValidateAlarms`, when `Enabled` and `Mode == "ForceSubtag"` and `Discovery.UseGalaxyRepository == false` and `IncludeAttributes` empty ⇒ add a validation error ("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list"). Floor the three numeric values at 1. Validate `Mode` is one of the three literals. + +**Step 3-5:** Test the new validation cases (red→green), build the server, commit. 
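
The validation rules in Step 2 could look roughly like the following. This is a sketch under stated assumptions: `AlarmFallbackOptionsSketch` flattens the `Discovery` sub-object into two properties for brevity, the `Enabled` check is left to the caller, mode comparison is assumed to be case-sensitive, and "floor at 1" is read as clamping rather than rejecting. The type and method names are hypothetical; the defaults and the error text come from the plan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, flattened stand-in for AlarmFallbackOptions.
public sealed class AlarmFallbackOptionsSketch
{
    public string Mode { get; init; } = "Auto";
    public int ConsecutiveFailureThreshold { get; init; } = 3;
    public int FailbackProbeIntervalSeconds { get; init; } = 30;
    public int FailbackStableProbes { get; init; } = 3;
    public bool UseGalaxyRepository { get; init; } = true;
    public string[] IncludeAttributes { get; init; } = Array.Empty<string>();
}

public static class AlarmFallbackValidation
{
    private static readonly string[] Modes = { "Auto", "ForceAlarmManager", "ForceSubtag" };

    // Returns one message per violated rule; an empty list means valid.
    public static IReadOnlyList<string> Validate(AlarmFallbackOptionsSketch options)
    {
        var errors = new List<string>();

        if (Array.IndexOf(Modes, options.Mode) < 0)
        {
            errors.Add("Fallback.Mode must be one of: " + string.Join(", ", Modes));
        }

        // ForceSubtag with no discovery source would arm an empty watch-list.
        if (options.Mode == "ForceSubtag"
            && !options.UseGalaxyRepository
            && options.IncludeAttributes.Length == 0)
        {
            errors.Add("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list");
        }

        return errors;
    }

    // Assumed reading of "floor at 1": clamp instead of failing validation.
    public static int FloorAtOne(int value) => Math.Max(1, value);
}
```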
+ +--- + +### Task 11: Galaxy Repository "alarm attributes" discovery query + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 10, Task 13 + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyRepository.cs` (add `GetAlarmAttributesAsync` + SQL constant, following `GetAttributesAsync` ~lines 86-115 and `AttributesSql` ~line 176) +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/IGalaxyRepository.cs` +- Create: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyAlarmAttributeRow.cs` +- Test: `src/ZB.MOM.WW.MxGateway.Tests/Galaxy/` (projection unit test; live SQL gated) + +**Step 1:** `GalaxyAlarmAttributeRow { string FullTagReference; string SourceObjectReference; string AckCommentSubtag; }` (and any priority subtag). `GetAlarmAttributesAsync` reuses the existing `is_alarm` detection (the `AlarmExtension` primitive join already in `AttributesSql`) filtered to `is_alarm = 1`, projecting the alarm reference + its ack-comment attribute. Follow the exact `SqlConnection`/`SqlCommand`/`SqlDataReader` pattern from `GetAttributesAsync`. + +**Step 2:** Unit-test the row→`AlarmSubtagTarget` mapping (a pure mapper function); gate any live-DB test like the existing Galaxy live tests (or `Skip` with a note, matching `AlarmsLiveSmokeTests`). + +**Step 3-5:** red→green, build server, commit. 
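
The pure row-to-target mapper that Step 2 unit-tests might be shaped like this. It is a hedged sketch: the real `AlarmSubtagTarget` is a proto type whose fields are not shown in this plan, so `AlarmSubtagTargetSketch` and its property names are invented for illustration. Composing an address as `<reference>.<subtag>` and treating `AckCommentSubtag` as a subtag name (rather than a full reference) are both assumptions.

```csharp
using System;

// Hypothetical mirror of the row described in Step 1.
public sealed record GalaxyAlarmAttributeRowSketch(
    string FullTagReference,
    string SourceObjectReference,
    string AckCommentSubtag);

// Hypothetical target shape; the real proto message may differ.
public sealed record AlarmSubtagTargetSketch(
    string AlarmReference,
    string ActiveAddress,
    string AckedAddress,
    string AckCommentAddress);

public static class AlarmSubtagTargetMapper
{
    // Subtag names default to the plan's configured defaults.
    public static AlarmSubtagTargetSketch ToTarget(
        GalaxyAlarmAttributeRowSketch row,
        string activeSubtag = "active",
        string ackedSubtag = "acked")
    {
        ArgumentNullException.ThrowIfNull(row);

        // An empty ack-comment subtag means no ack-comment address is known.
        string ackComment = string.IsNullOrWhiteSpace(row.AckCommentSubtag)
            ? string.Empty
            : row.FullTagReference + "." + row.AckCommentSubtag;

        return new AlarmSubtagTargetSketch(
            row.FullTagReference,
            row.FullTagReference + "." + activeSubtag,
            row.FullTagReference + "." + ackedSubtag,
            ackComment);
    }
}
```

Keeping the mapper free of SQL types means the unit test needs no database, which leaves only the query itself behind the live-test gate.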
+ +--- + +### Task 12: Watch-list resolver (GR SQL + config override → `AlarmSubtagTarget[]`) + +**Classification:** standard +**Estimated implement time:** ~4 min +**Parallelizable with:** none (depends on Tasks 10, 11) + +**Files:** +- Create: `src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmWatchListResolver.cs` +- Create: `src/ZB.MOM.WW.MxGateway.Server/Alarms/IAlarmWatchListResolver.cs` +- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmWatchListResolverTests.cs` + +**Step 1: Test the merge** with a fake `IGalaxyRepository`: +- discovery rows + `IncludeAttributes` are unioned; `ExcludeAttributes` removed; each becomes an `AlarmSubtagTarget` with `.active`/`.acked`/`.ackmsg` addresses composed from the configured `Subtags` names (`.`, etc.); empty config subtag names fall back to defaults; GR unavailable + no includes ⇒ empty list + a logged warning flag. + +**Step 2: Implement** `ResolveAsync(AlarmsOptions, CancellationToken) → IReadOnlyList`. + +**Step 3-5:** red→green, build, commit. + +--- + +### Task 13: Gateway metrics — provider-mode gauge + switch counter + +**Classification:** small +**Estimated implement time:** ~3 min +**Parallelizable with:** Task 10, Task 11 + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs` (ctor ~lines 55-79; add counter + observable gauge following the existing pattern) +- Test: `src/ZB.MOM.WW.MxGateway.Tests/Metrics/GatewayMetricsTests.cs` (if present; else assert via a `MeterListener`) + +**Step 1:** Add `mxgateway.alarms.provider_switches` counter (tagged `from`,`to`,`reason`) and `mxgateway.alarms.provider_mode` observable gauge (1=alarmmgr, 2=subtag), plus `AlarmProviderSwitched(int from, int to, string reason)` and a private `GetAlarmProviderMode()` (lock on `_syncRoot` like the others). + +**Step 2-4:** test, build, commit. 
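
For Task 13, the counter and observable gauge can be sketched with `System.Diagnostics.Metrics` as below. The instrument names, the tag keys, and the `AlarmProviderSwitched` signature come from the plan. The class name, the meter name, and the use of `Volatile` in place of the repo's `_syncRoot` lock are assumptions made to keep the example self-contained.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

// Hypothetical standalone version of the additions to GatewayMetrics.
public sealed class AlarmProviderMetricsSketch : IDisposable
{
    private readonly Meter _meter;
    private readonly Counter<long> _switches;
    private int _mode = 1; // 1 = alarmmgr, 2 = subtag

    public AlarmProviderMetricsSketch(string meterName = "MxGateway.Sketch")
    {
        _meter = new Meter(meterName);
        _switches = _meter.CreateCounter<long>("mxgateway.alarms.provider_switches");

        // The gauge callback is polled by listeners; it only reads state.
        _meter.CreateObservableGauge<int>(
            "mxgateway.alarms.provider_mode",
            () => Volatile.Read(ref _mode));
    }

    public int CurrentMode => Volatile.Read(ref _mode);

    // Records one switch and updates the value the gauge reports.
    public void AlarmProviderSwitched(int from, int to, string reason)
    {
        Volatile.Write(ref _mode, to);
        _switches.Add(
            1,
            new KeyValuePair<string, object?>("from", from),
            new KeyValuePair<string, object?>("to", to),
            new KeyValuePair<string, object?>("reason", reason));
    }

    public void Dispose() => _meter.Dispose();
}
```

A `MeterListener` subscribed to the meter can assert on the counter measurement and its tags if `GatewayMetricsTests` does not already provide a helper.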
+ +--- + +### Task 14: `GatewayAlarmMonitor` — arm watch-list, reflect provider mode, reconcile on switch + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Tasks 9, 12, 13) + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs` (ctor ~41-49; `SubscribeAlarmsAsync` ~210-233; event-drain loop; `StreamAsync` ~386-434) +- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/GatewayAlarmMonitorProviderModeTests.cs` (new, using `FakeWorkerHarness`) + +**Step 1:** Inject `IAlarmWatchListResolver` and `GatewayMetrics`. In `SubscribeAlarmsAsync`, resolve the watch-list and build the `SubscribeAlarmsCommand` with `ForcedMode` (from `Fallback.Mode`), `WatchList`, and `Failover` populated from options — instead of the bare `{ SubscriptionExpression }`. + +**Step 2:** In the worker-event drain path, handle `OnAlarmProviderModeChangedEvent`: update a `_providerStatus` field (mode/degraded/reason/since), `Broadcast(new AlarmFeedMessage { ProviderStatus = … })` to every subscriber, call `metrics.AlarmProviderSwitched(...)`, and force a `ReconcileAsync` so the cache re-seeds from the now-active provider (avoids raise/clear storms). + +**Step 3:** In `StreamAsync`, emit the current `provider_status` as the **first** message (before the snapshot) so a late joiner immediately knows the mode. + +**Step 4: Test** — stand up the monitor with `FakeWorkerHarness`; emit an `OnAlarmProviderModeChangedEvent(Subtag)`; assert a `StreamAsync` subscriber receives a `ProviderStatus{ Mode=Subtag, Degraded=true }` and that the switch counter incremented. Also assert a transition emitted in subtag mode flows through with `Degraded=true`. + +**Step 5:** build server, run the new test, commit. 
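
The ordering guarantee in Task 14 Step 3 (provider status first, then the snapshot) can be illustrated with a small in-memory feed. This is only a sketch of the ordering and the status update, not of `GatewayAlarmMonitor` itself: every type here is hypothetical, alarms are reduced to strings, `ReseedSnapshot` stands in for `ReconcileAsync`, and treating `Degraded` as true exactly when the mode is `Subtag` is an assumption drawn from the Step 4 assertion.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum ProviderModeSketch { Alarmmgr = 1, Subtag = 2 }

public sealed record ProviderStatusSketch(
    ProviderModeSketch Mode, bool Degraded, string Reason, DateTimeOffset Since);

// Exactly one of the two properties is set, mimicking a oneof feed message.
public sealed record FeedMessageSketch(
    ProviderStatusSketch? ProviderStatus = null, string? SnapshotAlarm = null);

public sealed class AlarmFeedSketch
{
    private readonly object _gate = new();
    private readonly List<string> _activeAlarms = new();
    private ProviderStatusSketch _providerStatus =
        new(ProviderModeSketch.Alarmmgr, false, "initial", DateTimeOffset.UtcNow);

    // Handles the mode-changed event: replace the status atomically.
    public void OnProviderModeChanged(ProviderModeSketch mode, string reason)
    {
        lock (_gate)
        {
            bool degraded = mode == ProviderModeSketch.Subtag; // assumed rule
            _providerStatus = new(mode, degraded, reason, DateTimeOffset.UtcNow);
        }
    }

    // Stand-in for reconcile: replace the cache with the active provider's view.
    public void ReseedSnapshot(IEnumerable<string> alarms)
    {
        lock (_gate)
        {
            _activeAlarms.Clear();
            _activeAlarms.AddRange(alarms);
        }
    }

    // A late joiner receives the provider status before any snapshot entry.
    public IReadOnlyList<FeedMessageSketch> OpenStream()
    {
        lock (_gate)
        {
            var messages = new List<FeedMessageSketch>
            {
                new(ProviderStatus: _providerStatus),
            };
            foreach (string alarm in _activeAlarms)
            {
                messages.Add(new(SnapshotAlarm: alarm));
            }
            return messages;
        }
    }
}
```

Taking the status and the snapshot under the same lock means a subscriber never sees a snapshot from one provider paired with the status of the other.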
+ +--- + +### Task 15: Dashboard — push provider status to `/hubs/alarms` + UI indicator + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Task 14) + +**Files:** +- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/AlarmsHubPublisher.cs` (forward `ProviderStatus` messages — they already flow through `StreamAsync`, so confirm the existing `SendAsync(AlarmMessage, message)` carries them; add a dedicated `"ProviderModeChanged"` client method if the dashboard needs a distinct channel) +- Modify: the alarms dashboard page/component (Bootstrap-only badge: green "alarmmgr" / amber "degraded — subtag") — find under `src/ZB.MOM.WW.MxGateway.Server/Dashboard/` +- Test: `src/ZB.MOM.WW.MxGateway.Tests/` dashboard model test (e.g. a `DashboardAlarmProviderStatus.FromFeed` mapper, mirroring `DashboardActiveAlarm.FromSnapshot`) + +**Constraint:** Bootstrap CSS/JS only — no MudBlazor/Radzen/FluentUI. + +**Steps:** TDD the model mapper, wire the publisher + badge, build, commit. + +--- + +## Phase 3 — Integration, docs, live smoke + +### Task 16: End-to-end fake-worker failover test + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 18 + +**Files:** +- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmFailoverEndToEndTests.cs` + +Drive the full gateway path with `FakeWorkerHarness`: subscribe (assert the `SubscribeAlarmsCommand` carries a watch-list), emit a wnwrap-style transition (assert `Degraded=false`), emit `OnAlarmProviderModeChangedEvent(Subtag)`, emit a synthesized transition (assert `Degraded=true`, `SourceProvider=Subtag`), then `OnAlarmProviderModeChangedEvent(Alarmmgr)` and assert the feed reports recovery. Build, run, commit. 
+ +--- + +### Task 17: Live subtag smoke test (opt-in) + +**Classification:** small +**Estimated implement time:** ~4 min +**Parallelizable with:** Task 18 + +**Files:** +- Test: `src/ZB.MOM.WW.MxGateway.IntegrationTests/...AlarmSubtagLiveSmokeTests.cs` (or the worker live suite) + +A `[LiveMxAccessFact]`, `Skip`-by-default test (per `AlarmsLiveSmokeTests` precedent) that, against a live Galaxy + alarm flip script: advises the real `.active`/`.acked` subtags via `LmxSubtagAlarmSource`, asserts a synthesized raise/clear, and performs an ack via the ack-comment write. Document the exact subtag names discovered (resolves the design's open item). Commit. + +--- + +### Task 18: Documentation + +**Classification:** trivial +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 16, Task 17 + +**Files:** +- Modify: `gateway.md` (alarm provider section: dual provider + auto-failover/failback) +- Modify: `docs/DesignDecisions.md` (record the fallback decision + parity rationale) +- Modify: `docs/GatewayConfiguration.md` (the `MxGateway:Alarms:Fallback` block) +- Modify: `docs/AlarmClientDiscovery.md` (subtag provider, synthesis rules, ack-comment write) +- Modify: `docs/Grpc.md` (new `provider_status` feed case + `degraded`/`source_provider` fields) + +Follow `StyleGuide.md` (PascalCase filenames, present tense, explain *why*). No code; commit. + +--- + +## Execution order & parallelism summary + +- **Serial spine:** 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8/9 → 10/11 → 12 → 13 → 14 → 15 → 16 → 17/18. +- **Parallelizable clusters:** {8, 9 partially}, {10, 11, 13}, {16, 17, 18}. +- **High-risk tasks** (full review chain): 1, 2, 6, 7, 9, 14. **Standard:** 4, 5, 8, 10, 11, 12, 15, 16. **Small/trivial:** 3, 13, 17, 18. + +## Risk notes for the executor + +- **Field-number collisions:** Task 2 must read the live `MxEvent`/`MxEventFamily` numbers before adding — the agent map gave alarm-payload maxima but not `MxEvent`'s. Verify before editing. 
+- **STA discipline:** every COM call in `LmxSubtagAlarmSource` and every consumer swap runs on the worker STA; keep the `EnsureOnAlarmConsumerThread` guard. The worker STA already pumps Windows messages, which is required for the subtag `OnDataChange` to deliver. +- **Parity regression:** alarmmgr-mode output must be byte-for-byte unchanged. Existing `AlarmDispatcherTests` and `ProtobufContractRoundTripTests` are the guardrail — they must stay green with `Degraded=false` defaults. +- **Subtag names unverified:** the design leaves exact AVEVA subtag names (`.active`, `.acked`, ack-comment) to confirm against `C:\Users\dohertj2\Desktop\mxaccess` + a live Galaxy (Task 17). The config `Subtags` block exists so names are not hard-coded. diff --git a/docs/plans/2026-06-13-alarm-subtag-fallback.md.tasks.json b/docs/plans/2026-06-13-alarm-subtag-fallback.md.tasks.json new file mode 100644 index 0000000..c1378c9 --- /dev/null +++ b/docs/plans/2026-06-13-alarm-subtag-fallback.md.tasks.json @@ -0,0 +1,24 @@ +{ + "planPath": "docs/plans/2026-06-13-alarm-subtag-fallback.md", + "tasks": [ + {"id": 54, "subject": "Task 1: Worker proto — watch-list, failover config, AlarmProviderMode", "status": "pending"}, + {"id": 55, "subject": "Task 2: Gateway proto — provider status, degraded provenance, mode-changed event", "status": "pending", "blockedBy": [54]}, + {"id": 56, "subject": "Task 3: Proto round-trip tests for new alarm fields", "status": "pending", "blockedBy": [54, 55]}, + {"id": 57, "subject": "Task 4: Subtag value-source abstraction + synthesis state machine", "status": "pending", "blockedBy": [54]}, + {"id": 58, "subject": "Task 5: SubtagAlarmConsumer over the source seam", "status": "pending", "blockedBy": [57]}, + {"id": 59, "subject": "Task 6: COM-backed LmxSubtagAlarmSource", "status": "pending", "blockedBy": [57]}, + {"id": 60, "subject": "Task 7: FailoverAlarmConsumer state machine", "status": "pending", "blockedBy": [58]}, + {"id": 61, "subject": "Task 8: Synthetic 
GUID + degraded flag on event sink path", "status": "pending", "blockedBy": [55]}, + {"id": 62, "subject": "Task 9: Wire watch-list/failover through AlarmCommandHandler; emit mode-changed", "status": "pending", "blockedBy": [58, 60, 61]}, + {"id": 63, "subject": "Task 10: AlarmsOptions.Fallback + validation", "status": "pending"}, + {"id": 64, "subject": "Task 11: Galaxy Repository alarm-attributes discovery query", "status": "pending"}, + {"id": 65, "subject": "Task 12: Watch-list resolver (GR SQL + config override)", "status": "pending", "blockedBy": [54, 63, 64]}, + {"id": 66, "subject": "Task 13: Metrics — provider-mode gauge + switch counter", "status": "pending"}, + {"id": 67, "subject": "Task 14: GatewayAlarmMonitor — arm watch-list, reflect mode, reconcile on switch", "status": "pending", "blockedBy": [55, 62, 65, 66]}, + {"id": 68, "subject": "Task 15: Dashboard — push provider status + UI badge", "status": "pending", "blockedBy": [67]}, + {"id": 69, "subject": "Task 16: End-to-end fake-worker failover test", "status": "pending", "blockedBy": [67]}, + {"id": 70, "subject": "Task 17: Live subtag smoke test (opt-in)", "status": "pending", "blockedBy": [59, 62]}, + {"id": 71, "subject": "Task 18: Documentation", "status": "pending", "blockedBy": [67]} + ], + "lastUpdated": "2026-06-13T12:40:00Z" +}