# Alarm Subtag-Monitoring Fallback — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.

**Goal:** Add a second alarm source — direct MXAccess subtag monitoring — that the gateway auto-fails-over to when the wnwrap alarmmgr provider breaks, auto-fails-back to when it recovers, and can be forced on by config.

**Architecture:** Worker-side synthesis (parity rule preserved). A new `SubtagAlarmConsumer` (own `LMXProxyServerClass`, `AddItem`/`Advise` on alarm subtags) and a `FailoverAlarmConsumer` composite (state machine over the wnwrap primary + subtag standby) both implement the existing `IMxAccessAlarmConsumer` seam. The gateway resolves the subtag watch-list (Galaxy Repository SQL + config override), arms the worker at subscribe time, and reflects the live provider mode into the gRPC alarm feed, the dashboard hub, and metrics.

**Tech Stack:** .NET 10 (gateway, x64) + .NET Framework 4.8 (worker, x86, STA), protobuf/gRPC, `Microsoft.Data.SqlClient` (Galaxy Repository), SignalR (dashboard), `System.Diagnostics.Metrics`, xUnit (plain `Assert`, no FluentAssertions).

**Design source:** `docs/plans/2026-06-13-alarm-subtag-fallback-design.md`

**Branch:** `feat/alarm-subtag-fallback` (already created)

---

## Conventions for every task

- **TDD:** write the failing test, run it red, implement, run it green, commit.
- **xUnit, plain `Assert.*`**, naming `Subject_Condition_Expected`. Worker fakes are sealed private nested classes that raise events.
- **Build/test commands:**
  - Contracts regen: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
  - Gateway: `dotnet build src/ZB.MOM.WW.MxGateway.Server` ; `dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj`
  - Worker (x86): `dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86` ; `dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86`
  - Single test: append `--filter FullyQualifiedName~` followed by the test class or method name
- **Build is strict:** `TreatWarningsAsErrors=true`, nullable enabled. Add XML doc comments on public members (the repo runs a doc checker).
- **Generated code** under `Generated/` is never hand-edited — rebuild the contracts project to regenerate.
- **Namespaces:** worker MxAccess types live in `ZB.MOM.WW.MxGateway.Worker.MxAccess`; proto C# types in `ZB.MOM.WW.MxGateway.Contracts.Proto`.

---

## Phase 0 — Contracts

### Task 1: Worker proto — subtag watch-list, failover config, provider-mode enum

**Classification:** high-risk
**Estimated implement time:** ~4 min
**Parallelizable with:** none (Task 2 imports these types)

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto` (real `SubscribeAlarmsCommand` at ~line 324; `MxCommand` references it at 123-125)

> **CORRECTION (execution):** The alarm command messages and `MxCommand` live in **`mxaccess_gateway.proto`**, not the worker proto. `mxaccess_worker.proto` *imports* the gateway proto (`WorkerCommand.command` is `mxaccess_gateway.v1.MxCommand`), so the gateway proto is the base and the worker proto needs **no** change. `AlarmProviderMode` and the new types are added to the gateway proto and are visible to worker code as `mxaccess_gateway.v1` types. Tasks 1 and 2 are executed as a single combined edit on this one file.
**Step 1: Add the enum and messages.** In `mxaccess_gateway.proto`, extend the existing `SubscribeAlarmsCommand` message (line 324) and add the new types after it:

```protobuf
// Provider selection / current provider for the alarm feed. Defined in the
// gateway contract next to SubscribeAlarmsCommand, which references it;
// mxaccess_worker.proto imports this file, so worker code sees the same enum.
enum AlarmProviderMode {
  ALARM_PROVIDER_MODE_UNSPECIFIED = 0; // auto: alarmmgr primary, subtag fallback
  ALARM_PROVIDER_MODE_ALARMMGR = 1;
  ALARM_PROVIDER_MODE_SUBTAG = 2;
}

message SubscribeAlarmsCommand {
  string subscription_expression = 1; // existing field — keep
  // UNSPECIFIED = auto-failover/failback. ALARMMGR/SUBTAG force one provider.
  AlarmProviderMode forced_mode = 2;
  // Subtag watch-list resolved by the gateway (GR SQL + config). Empty in pure
  // alarmmgr mode; in subtag mode it bounds what the consumer can observe.
  repeated AlarmSubtagTarget watch_list = 3;
  AlarmFailoverConfig failover = 4;
}

// One alarm attribute the subtag consumer advises. Addresses are full MXAccess
// item references the worker passes straight to AddItem.
message AlarmSubtagTarget {
  string alarm_full_reference = 1;    // e.g. "Galaxy!Area.Tank01.Level.HiHi"
  string source_object_reference = 2; // e.g. "Tank01"
  string active_subtag = 3;           // item address of the in-alarm boolean
  string acked_subtag = 4;            // item address of the acknowledged boolean
  string ack_comment_subtag = 5;      // writable ack-comment attribute (ack write target)
  string priority_subtag = 6;         // optional severity source; empty if absent
}

message AlarmFailoverConfig {
  int32 consecutive_failure_threshold = 1;   // wnwrap COM failures before switching (>=1)
  int32 failback_probe_interval_seconds = 2; // probe cadence while degraded (>=1)
  int32 failback_stable_probes = 3;          // clean probes before switching back (>=1)
}
```

`UnsubscribeAlarmsCommand` and `AcknowledgeAlarmCommand` are unchanged.
**Step 2: Regenerate & verify it compiles.**

Run: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`

Expected: build succeeds; generated `AlarmProviderMode`, `AlarmSubtagTarget`, `AlarmFailoverConfig` types appear.

**Step 3: Commit.** Per the correction above, the edited file is the gateway proto:

```bash
git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto
git commit -m "contracts(worker): subtag watch-list + failover config + AlarmProviderMode"
```

---

### Task 2: Gateway proto — provider status on the feed, degraded provenance, mode-changed event

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 1; Task 3 tests both)

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto` (`OnAlarmTransitionEvent` ~719-771, `ActiveAlarmSnapshot` ~783-803, `AlarmFeedMessage` ~860-870, `MxEvent` family enum + body oneof, `MxEventFamily` enum)

**Step 1: Add degraded provenance to the two alarm payloads.** Append to `OnAlarmTransitionEvent` (next free field 14):

```protobuf
// True when this transition came from the subtag-monitoring fallback rather
// than the native alarmmgr provider — i.e. it was synthesized from data
// changes and carries reduced fidelity (synthetic GUID, no native raise time).
bool degraded = 14;
// Which provider produced this transition.
AlarmProviderMode source_provider = 15;
```

Append the identical two fields to `ActiveAlarmSnapshot` (next free field 14):

```protobuf
bool degraded = 14;
AlarmProviderMode source_provider = 15;
```

**Step 2: Add provider status to the feed oneof.** Add a new oneof case to `AlarmFeedMessage` (next free field 4) and a new message:

```protobuf
message AlarmFeedMessage {
  oneof payload {
    ActiveAlarmSnapshot active_alarm = 1;
    bool snapshot_complete = 2;
    OnAlarmTransitionEvent transition = 3;
    // Provider-mode status. Emitted once on stream open and again on every
    // failover/failback so late joiners learn the current mode immediately.
    AlarmProviderStatus provider_status = 4;
  }
}

message AlarmProviderStatus {
  AlarmProviderMode mode = 1;
  bool degraded = 2;   // true whenever mode == SUBTAG
  string reason = 3;   // human-readable switch reason
  google.protobuf.Timestamp since = 4;
}
```

**Step 3: Add the worker→gateway mode-changed event to `MxEvent`.** Find the `MxEventFamily` enum and the `MxEvent` body oneof. Add a family member and a body message + oneof case (use the next free family value and the next free `MxEvent` body field number — check the file):

```protobuf
// in MxEventFamily enum:
MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED = ; // fill in the next free family value

// new message near OnAlarmTransitionEvent:
message OnAlarmProviderModeChangedEvent {
  AlarmProviderMode mode = 1;
  string reason = 2;
  int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
  google.protobuf.Timestamp at = 4;
}

// in MxEvent body oneof:
OnAlarmProviderModeChangedEvent on_alarm_provider_mode_changed = ; // fill in the next free body field number
```

Per the Task 1 correction, `AlarmProviderMode` is defined in `mxaccess_gateway.proto` itself, so no extra import is needed here; reference the enum unqualified, the same way other same-file types are referenced.

**Step 4: Regenerate & verify.**

Run: `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`

Expected: build succeeds.
**Step 5: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto
git commit -m "contracts(gateway): AlarmProviderStatus feed case, degraded provenance, mode-changed event"
```

---

### Task 3: Proto round-trip tests for the new alarm fields

**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (depends on Tasks 1-2)

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs`

**Step 1: Add tests** mirroring the existing `Event_RoundTripsOnAlarmTransitionWithFullPayload` style:

```csharp
[Fact]
public void Feed_RoundTripsProviderStatus()
{
    var since = Timestamp.FromDateTime(new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc));
    var original = new AlarmFeedMessage
    {
        ProviderStatus = new AlarmProviderStatus
        {
            Mode = AlarmProviderMode.Subtag,
            Degraded = true,
            Reason = "wnwrap poll failed 3x (HRESULT 0x80004005)",
            Since = since,
        },
    };

    var parsed = AlarmFeedMessage.Parser.ParseFrom(original.ToByteArray());

    Assert.Equal(original, parsed);
    Assert.Equal(AlarmFeedMessage.PayloadOneofCase.ProviderStatus, parsed.PayloadCase);
    Assert.True(parsed.ProviderStatus.Degraded);
    Assert.Equal(AlarmProviderMode.Subtag, parsed.ProviderStatus.Mode);
}

[Fact]
public void Transition_RoundTripsDegradedProvenance()
{
    var t = new OnAlarmTransitionEvent
    {
        AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
        TransitionKind = AlarmTransitionKind.Raise,
        Degraded = true,
        SourceProvider = AlarmProviderMode.Subtag,
    };

    var parsed = OnAlarmTransitionEvent.Parser.ParseFrom(t.ToByteArray());

    Assert.True(parsed.Degraded);
    Assert.Equal(AlarmProviderMode.Subtag, parsed.SourceProvider);
}
```

**Step 2: Run red→green.**

Run: `dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ProtobufContractRoundTripTests`

Expected: PASS.
**Step 3: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs
git commit -m "test(contracts): round-trip provider status + degraded provenance"
```

---

## Phase 1 — Worker: subtag consumer + failover

### Task 4: Subtag value-source abstraction + synthesis state holder

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (Task 5 builds on it)

A testable seam so synthesis logic is unit-tested without COM. The COM wiring lands in Task 6.

**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs`

**Step 1: Define the source abstraction.** `ISubtagAlarmSource` advises subtag addresses and raises a normalized value-change callback on the STA:

```csharp
using System;
using System.Collections.Generic;

namespace ZB.MOM.WW.MxGateway.Worker.MxAccess;

/// <summary>A change in one advised subtag value, normalized off the COM boundary.</summary>
public sealed class SubtagValueChange
{
    /// <summary>The full item address that changed (matches an AlarmSubtagTarget subtag).</summary>
    public string ItemAddress { get; init; } = string.Empty;

    /// <summary>The new value (boolean for .active/.acked, numeric for priority).</summary>
    public object? Value { get; init; }

    /// <summary>The change timestamp in UTC.</summary>
    public DateTime TimestampUtc { get; init; }
}

/// <summary>
/// Advises a set of MXAccess subtag addresses and surfaces value changes.
/// The production implementation (Task 6) owns its own LMXProxyServerClass;
/// tests substitute a fake that pushes <see cref="SubtagValueChange"/>s.
/// </summary>
public interface ISubtagAlarmSource : IDisposable
{
    /// <summary>Raised on the STA when an advised subtag's value changes.</summary>
    event EventHandler<SubtagValueChange>? ValueChanged;

    /// <summary>Advises every subtag in the supplied addresses; idempotent per address.</summary>
    void Advise(IReadOnlyCollection<string> itemAddresses);

    /// <summary>Writes a value to an item address (used for the ack-comment write).</summary>
    void Write(string itemAddress, object? value);
}
```

**Step 2: Write the state-machine tests first.** `SubtagAlarmStateMachine` maps `(active, acked)` changes per target to `MxAlarmTransitionEvent`s. Test the four core transitions:

```csharp
namespace ZB.MOM.WW.MxGateway.Worker.Tests.MxAccess;

public sealed class SubtagAlarmStateMachineTests
{
    private static AlarmSubtagTarget Target() => new()
    {
        AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
        SourceObjectReference = "Tank01",
        ActiveSubtag = "Tank01.Level.HiHi.active",
        AckedSubtag = "Tank01.Level.HiHi.acked",
        AckCommentSubtag = "Tank01.Level.HiHi.ackmsg",
    };

    [Fact]
    public void ActiveFalseToTrue_EmitsRaise_FlaggedDegraded()
    {
        var sm = new SubtagAlarmStateMachine(new[] { Target() });
        var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);

        var events = sm.Apply("Tank01.Level.HiHi.active", true, ts);

        var e = Assert.Single(events);
        Assert.Equal(MxAlarmStateKind.UnackAlm, e.Record.State);
        Assert.Equal(MxAlarmStateKind.Unspecified, e.PreviousState);
        Assert.Equal("Tank01.Level.HiHi", e.Record.TagName); // reference minus provider/area
    }

    [Fact]
    public void AckedTrueWhileActive_EmitsAckTransition()
    {
        var sm = new SubtagAlarmStateMachine(new[] { Target() });
        var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
        sm.Apply("Tank01.Level.HiHi.active", true, ts);

        var events = sm.Apply("Tank01.Level.HiHi.acked", true, ts.AddSeconds(5));

        var e = Assert.Single(events);
        Assert.Equal(MxAlarmStateKind.AckAlm, e.Record.State);
        Assert.Equal(MxAlarmStateKind.UnackAlm, e.PreviousState);
    }

    [Fact]
    public void ActiveTrueToFalse_WhileUnacked_EmitsUnackRtn()
    {
        var sm = new SubtagAlarmStateMachine(new[] { Target() });
        var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
        sm.Apply("Tank01.Level.HiHi.active", true, ts);

        var events = sm.Apply("Tank01.Level.HiHi.active", false, ts.AddSeconds(10));

        var e = Assert.Single(events);
        Assert.Equal(MxAlarmStateKind.UnackRtn, e.Record.State);
    }

    [Fact]
    public void
    Snapshot_ReflectsActiveAndAckedState()
    {
        var sm = new SubtagAlarmStateMachine(new[] { Target() });
        var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
        sm.Apply("Tank01.Level.HiHi.active", true, ts);
        sm.Apply("Tank01.Level.HiHi.acked", true, ts);

        var snap = Assert.Single(sm.SnapshotActive());
        Assert.Equal(MxAlarmStateKind.AckAlm, snap.State);
    }
}
```

Run: `dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmStateMachineTests` → FAIL (type missing).

**Step 3: Implement `SubtagAlarmStateMachine`.** Build an address→target index (active/acked/priority/comment addresses), hold per-reference `(bool active, bool acked, DateTime firstRaiseUtc, int priority)`, and emit on change:

- active `false→true` ⇒ `UnackAlm`, set `firstRaiseUtc`, `PreviousState` from prior state.
- acked `false→true` while active ⇒ `AckAlm`.
- active `true→false` ⇒ `AckRtn` if currently acked else `UnackRtn`; then reset acked.
- priority change ⇒ update stored priority, no transition.
- `TagName` = `alarm_full_reference` with any `Provider!Area.` prefix stripped (match `WnWrapAlarmConsumer`'s reference shape so `GatewayAlarmMonitor` keys align). Set `ProviderName`, `Group`, `Priority`, `AlarmComment` from the target/last values. Mark a `Degraded`/source flag (carried via a new field — see Task 5 wiring).
- `SnapshotActive()` returns `MxAlarmSnapshotRecord` for references whose active is true.

**Step 4: Run green.** Expected: PASS.
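
The transition rules in Step 3 can be sketched for a single alarm reference as below. This is a minimal illustration, not the plan's `SubtagAlarmStateMachine`: `SketchAlarmState` and `SubtagTransitionSketch` are hypothetical names, the real type works on `MxAlarmStateKind` and emits `MxAlarmTransitionEvent`s, and ignoring an acked change while the alarm is inactive is an assumption the plan does not spell out.

```csharp
// Hypothetical stand-in for MxAlarmStateKind (names assumed for illustration).
public enum SketchAlarmState { Unspecified, UnackAlm, AckAlm, UnackRtn, AckRtn }

// Tracks (active, acked) for ONE alarm reference and reports the state produced
// by each subtag change, or null when the change causes no transition.
public sealed class SubtagTransitionSketch
{
    private bool _active;
    private bool _acked;

    public SketchAlarmState Current { get; private set; } = SketchAlarmState.Unspecified;

    public SketchAlarmState? ApplyActive(bool value)
    {
        if (value == _active) return null;            // no edge, no transition
        _active = value;
        if (value)
        {
            Current = SketchAlarmState.UnackAlm;      // false -> true: raise
        }
        else
        {
            // true -> false: return-to-normal, acked or not; then reset acked.
            Current = _acked ? SketchAlarmState.AckRtn : SketchAlarmState.UnackRtn;
            _acked = false;
        }
        return Current;
    }

    public SketchAlarmState? ApplyAcked(bool value)
    {
        if (!value) { _acked = false; return null; }  // acked cleared: remember, no transition
        if (!_active || _acked) return null;          // assumption: ignore acks when not in alarm
        _acked = true;
        Current = SketchAlarmState.AckAlm;            // false -> true while active: ack
        return Current;
    }
}
```

Keeping the edge detection in a pure class like this is what lets the Step 2 tests run without COM; the production type would additionally carry timestamps, priority, and the previous state.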
**Step 5: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs \
        src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs \
        src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs
git commit -m "worker(alarms): subtag value-source seam + synthesis state machine"
```

---

### Task 5: `SubtagAlarmConsumer` over the source seam (no COM yet)

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 4)

**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs`

**Step 1: Test with a fake `ISubtagAlarmSource`.** Drive value changes through the source, assert `AlarmTransitionEmitted` fires with synthesized records and that ack writes the comment to the ack-comment subtag:

```csharp
public sealed class SubtagAlarmConsumerTests
{
    private sealed class FakeSource : ISubtagAlarmSource
    {
        public event EventHandler<SubtagValueChange>? ValueChanged;
        public List<string> Advised { get; } = new();
        public (string Address, object? Value)? LastWrite { get; private set; }
        public void Advise(IReadOnlyCollection<string> a) => Advised.AddRange(a);
        public void Write(string a, object? v) => LastWrite = (a, v);
        public void Raise(string addr, object?
            val, DateTime ts) =>
            ValueChanged?.Invoke(this, new SubtagValueChange { ItemAddress = addr, Value = val, TimestampUtc = ts });
        public void Dispose() { }
    }

    private static AlarmSubtagTarget Target() => new()
    {
        AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
        ActiveSubtag = "Tank01.Level.HiHi.active",
        AckedSubtag = "Tank01.Level.HiHi.acked",
        AckCommentSubtag = "Tank01.Level.HiHi.ackmsg",
    };

    [Fact]
    public void Subscribe_AdvisesAllSubtags()
    {
        var src = new FakeSource();
        using var c = new SubtagAlarmConsumer(src, new[] { Target() });

        c.Subscribe("ignored-in-subtag-mode");

        Assert.Contains("Tank01.Level.HiHi.active", src.Advised);
        Assert.Contains("Tank01.Level.HiHi.acked", src.Advised);
    }

    [Fact]
    public void ValueChange_RaisesSynthesizedTransition()
    {
        var src = new FakeSource();
        using var c = new SubtagAlarmConsumer(src, new[] { Target() });
        c.Subscribe("x");
        MxAlarmTransitionEvent? seen = null;
        c.AlarmTransitionEmitted += (_, e) => seen = e;

        src.Raise("Tank01.Level.HiHi.active", true, new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc));

        Assert.NotNull(seen);
        Assert.Equal(MxAlarmStateKind.UnackAlm, seen!.Record.State);
    }

    [Fact]
    public void AcknowledgeByName_WritesCommentToAckCommentSubtag()
    {
        var src = new FakeSource();
        using var c = new SubtagAlarmConsumer(src, new[] { Target() });
        c.Subscribe("x");

        int rc = c.AcknowledgeByName("Tank01.Level.HiHi", "Galaxy", "Area", "ack from HMI", "op1", "node", "dom", "Op One");

        Assert.Equal(0, rc);
        Assert.Equal(("Tank01.Level.HiHi.ackmsg", (object?)"ack from HMI"), src.LastWrite);
    }
}
```

**Step 2: Implement `SubtagAlarmConsumer : IMxAccessAlarmConsumer`.**

- Constructor `(ISubtagAlarmSource source, IReadOnlyList<AlarmSubtagTarget> watchList)`; build a `SubtagAlarmStateMachine`; index `alarm_full_reference`→target for ack routing.
- `Subscribe(_)`: call `source.Advise(...)` with every subtag address in the watch list; subscribe to `source.ValueChanged`, feed each change into the state machine, and re-raise each produced `MxAlarmTransitionEvent` via `AlarmTransitionEmitted` (mark degraded).
- `AcknowledgeByName(alarmName, …, comment, …)`: resolve the target by reference; if no `AckCommentSubtag`, return a non-zero failure code; else `source.Write(target.AckCommentSubtag, comment)` and return 0.
- `AcknowledgeByGuid(guid, …)`: map the synthetic GUID (deterministic hash of reference — see Task 8 helper, or a local copy) back to a reference, then delegate to the name path; unknown GUID ⇒ non-zero.
- `SnapshotActiveAlarms()`: from the state machine.
- `PollOnce()`: no-op.
- `Dispose()`: unsubscribe + dispose source.

**Step 3: Run green.** `dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmConsumerTests`.

**Step 4: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs \
        src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs
git commit -m "worker(alarms): SubtagAlarmConsumer synthesizing transitions over the source seam"
```

---

### Task 6: COM-backed `LmxSubtagAlarmSource` (own LMXProxyServerClass)

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none

The only piece that touches live COM. Like `WnWrapAlarmConsumer`, it owns its own MXAccess server object so the subtag source is self-contained and isolated from the session's item pipeline. Logic stays thin (advise/write/marshal); real verification is the live smoke test in Task 17.

**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs` (constructor/guard tests only; COM path is live-gated)

**Step 1: Implement `LmxSubtagAlarmSource : ISubtagAlarmSource`.**

- Own an `LMXProxyServerClass` (reuse the worker's `IMxAccessServer`/`MxAccessComServer` wrapper + `IMxAccessComObjectFactory` so it is fakeable; constructor takes the factory).
- `Advise(addresses)`: `RegisterServer` (topic) once; per address `AddItem`→`itemHandle`, `Advise`, and record `itemHandle→address`.
  Subscribe to the proxy's `OnDataChange`; in the handler, look up the address by `phItemHandle`, normalize `pvItemValue` (VARIANT→bool/double) and `pftItemTimeStamp`→UTC, and raise `ValueChanged`. All calls run on the STA (the worker STA pumps messages, so `OnDataChange` delivers).
- `Write(address, value)`: resolve/create the item handle, `server.Write(serverHandle, itemHandle, value, userId: 0)`.
- `Dispose()`: `UnAdvise`/`RemoveItem`/`Unregister`/release COM.

**Step 2: Tests** — only the non-COM guards (null factory throws; `Write` before `Advise` resolves a handle or throws a clear error). Mark the COM round-trip `[LiveMxAccessFact]` and `Skip` per the `AlarmsLiveSmokeTests` precedent.

**Step 3: Build x86 + run unit tests.**

`dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86`

`dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~LmxSubtagAlarmSourceTests`

**Step 4: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs \
        src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs
git commit -m "worker(alarms): COM-backed LmxSubtagAlarmSource advising alarm subtags"
```

---

### Task 7: `FailoverAlarmConsumer` state machine

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 5)

**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs` (small EventArgs)
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs`

**Step 1: Test the switch/failback with a fake primary that throws.**

```csharp
public sealed class FailoverAlarmConsumerTests
{
    private sealed class FlakyPrimary : IMxAccessAlarmConsumer
    {
        public event EventHandler<MxAlarmTransitionEvent>?
            AlarmTransitionEmitted;
        public int PollsUntilHeal = int.MaxValue; // becomes healthy after N polls while degraded
        public bool ThrowOnPoll = true;
        private int _polls;

        public void Subscribe(string s)
        {
            if (ThrowOnPoll) throw new COMException("boom", unchecked((int)0x80004005));
        }

        public void PollOnce()
        {
            _polls++;
            if (ThrowOnPoll && _polls < PollsUntilHeal) throw new COMException("boom", unchecked((int)0x80004005));
        }

        public int AcknowledgeByGuid(Guid g, string c, string a, string b, string d, string e) => 0;
        public int AcknowledgeByName(string n, string p, string gr, string c, string a, string b, string d, string e) => 0;
        public IReadOnlyList<MxAlarmSnapshotRecord> SnapshotActiveAlarms() => Array.Empty<MxAlarmSnapshotRecord>();
        public void Dispose() { }
    }

    private sealed class StubStandby : IMxAccessAlarmConsumer { /* records Subscribe, no-op rest */ }

    [Fact]
    public void Primary_FailsThresholdTimes_SwitchesToSubtagAndEmitsModeChange()
    {
        var primary = new FlakyPrimary();
        var standby = new StubStandby();
        using var c = new FailoverAlarmConsumer(primary, standby,
            new FailoverSettings(threshold: 3, probeIntervalSeconds: 30, stableProbes: 3));
        AlarmProviderModeChange?
            change = null;
        c.ProviderModeChanged += (_, e) => change = e;

        c.Subscribe("\\\\host\\Galaxy!Area"); // primary.Subscribe throws -> counts as failure 1
        c.PollOnce();                          // failure 2
        c.PollOnce();                          // failure 3 -> switch

        Assert.NotNull(change);
        Assert.Equal(AlarmProviderMode.Subtag, change!.Mode);
    }

    [Fact]
    public void WhileDegraded_PrimaryHeals_FailsBackAfterStableProbes()
    {
        var primary = new FlakyPrimary { PollsUntilHeal = 0 }; // will heal once we stop throwing
        var standby = new StubStandby();
        using var c = new FailoverAlarmConsumer(primary, standby,
            new FailoverSettings(threshold: 1, probeIntervalSeconds: 0, stableProbes: 2));
        var modes = new List<AlarmProviderMode>();
        c.ProviderModeChanged += (_, e) => modes.Add(e.Mode);

        c.Subscribe("x"); // failure -> switch to subtag
        primary.ThrowOnPoll = false;
        c.ProbeOnce(); // clean probe 1
        c.ProbeOnce(); // clean probe 2 -> failback

        Assert.Equal(AlarmProviderMode.Subtag, modes[0]);
        Assert.Equal(AlarmProviderMode.Alarmmgr, modes[^1]);
    }
}
```

**Step 2: Implement.**

- `record FailoverSettings(int threshold, int probeIntervalSeconds, int stableProbes)`; `AlarmProviderModeChange : EventArgs { AlarmProviderMode Mode; string Reason; int HResult; DateTime AtUtc; }`.
- Constructor `(IMxAccessAlarmConsumer primary, IMxAccessAlarmConsumer standby, FailoverSettings settings)`; forced-mode variants handled in Task 9 wiring (forced ⇒ skip the other consumer).
- Forward `AlarmTransitionEmitted` from the **active** child only (swap the subscription on switch).
- Wrap `Subscribe`/`PollOnce` on the primary: on `COMException` (or a failure HRESULT) while `PrimaryActive`, increment a counter; at `threshold`, ensure standby `Subscribe`d, set active=standby, snapshot standby for hand-off, raise `ProviderModeChanged(Subtag, reason, hresult, now)`. Reset counter on any clean primary poll.
- `ProbeOnce()` (driven by the poll loop while degraded, gated by `probeIntervalSeconds`): try primary `Subscribe`+`PollOnce`; count consecutive clean probes; at `stableProbes`, set active=primary, return standby to standby, raise `ProviderModeChanged(Alarmmgr, "recovered", 0, now)`.
- `Acknowledge*` / `SnapshotActiveAlarms` delegate to the **active** child.
- `PollOnce()` drives the active child's poll, and, while degraded, also drives the failback probe cadence.

**Step 3: Run green** (x86 filter `FailoverAlarmConsumerTests`).

**Step 4: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs \
        src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs \
        src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs
git commit -m "worker(alarms): FailoverAlarmConsumer auto-failover/failback state machine"
```

---

### Task 8: Synthetic-GUID helper + degraded flag on the event sink path

**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 9

Carry `degraded` + `source_provider` from the worker synthesis into the emitted `OnAlarmTransitionEvent`.

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs` (add `bool Degraded`)
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs` (`EnqueueTransition` carries degraded)
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs` (`CreateOnAlarmTransition` sets `Degraded`/`SourceProvider`)
- Create: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs`
- Test: add cases to `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmDispatcherTests.cs` and a new `SyntheticAlarmGuidTests.cs`

**Step 1: `SyntheticAlarmGuid.ForReference(string reference)`** — deterministic GUID from a stable hash (e.g. MD5 of the UTF-8 reference → `new Guid(bytes)`), so subtag-mode acks resolve by GUID.
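
A minimal sketch of the helper Step 1 describes, assuming the MD5-of-UTF-8 option named there is the one chosen. MD5 produces exactly 16 bytes, which is what the `Guid(byte[])` constructor requires. Treating references as case-sensitive and rejecting `null` are assumptions of this sketch, not decisions recorded in the plan.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

/// <summary>Derives a stable GUID from an alarm reference (illustrative sketch).</summary>
public static class SyntheticAlarmGuid
{
    /// <summary>Returns the same GUID for the same reference string on every call.</summary>
    public static Guid ForReference(string reference)
    {
        if (reference is null) throw new ArgumentNullException(nameof(reference));

        // MD5 is used only as a stable 128-bit fingerprint here, not for security.
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(reference));
            return new Guid(hash); // 16 bytes in, one GUID out
        }
    }
}
```

Because the mapping is one-way, the consumer still needs a GUID→reference index built from the watch list to route `AcknowledgeByGuid` calls.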
Test determinism + difference:

```csharp
[Fact]
public void SameReference_SameGuid() =>
    Assert.Equal(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.C"));

[Fact]
public void DifferentReference_DifferentGuid() =>
    Assert.NotEqual(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.D"));
```

**Step 2: Thread `degraded`** through `MxAlarmSnapshotRecord.Degraded`, `EnqueueTransition(... bool degraded)`, and `CreateOnAlarmTransition(... bool degraded, AlarmProviderMode sourceProvider)`. Default `degraded=false`, `sourceProvider=Alarmmgr` so the wnwrap path is unchanged (regression: existing `AlarmDispatcherTests` still pass with `Degraded=false`).

**Step 3: Tests** — extend `AlarmDispatcherTests` with a subtag-style transition asserting `body.Degraded == true` and `SourceProvider == Subtag`.

**Step 4: Build x86 + run** worker tests for `AlarmDispatcherTests`, `SyntheticAlarmGuidTests`.

**Step 5: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs \
        src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs \
        src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs \
        src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs \
        src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/
git commit -m "worker(alarms): synthetic GUID + degraded provenance on emitted transitions"
```

---

### Task 9: Wire watch-list + failover config through `AlarmCommandHandler`; emit mode-changed event

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Tasks 5, 7, 8)

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmCommandHandler.cs`
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs`
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs` (`ExecuteSubscribeAlarms`, ~lines 588-616)
- Modify: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessStaSession.cs` (consumer factory
  wiring; mode-change → event queue)
- Test: `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs` (extend or create)

**Step 1: Carry the subscribe payload.** Change the alarm subscribe entry point from `Subscribe(string subscription)` to `Subscribe(SubscribeAlarmsCommand command)` (the command now has `ForcedMode`, `WatchList`, `Failover`). In `AlarmCommandHandler.Subscribe`:

- Build the active provider per `ForcedMode`:
  - `ALARMMGR` ⇒ `WnWrapAlarmConsumer` only.
  - `SUBTAG` ⇒ `SubtagAlarmConsumer(new LmxSubtagAlarmSource(factory), watchList)` only.
  - `UNSPECIFIED` ⇒ `FailoverAlarmConsumer(primary: wnwrap, standby: subtag, settings-from-Failover)`.
- Use the existing `consumerFactory` seam but widen its `Func` signature so tests inject fakes and production builds the failover composite. Subscribe to `FailoverAlarmConsumer.ProviderModeChanged` and enqueue an `OnAlarmProviderModeChangedEvent` MxEvent via the event queue (new mapper method `CreateOnAlarmProviderModeChanged`).

**Step 2: Executor + STA wiring.** `ExecuteSubscribeAlarms` passes the full `SubscribeAlarmsCommand` (not just the expression). In `MxAccessStaSession`, the `alarmCommandHandlerFactory` must give the handler access to the `IMxAccessComObjectFactory` so the subtag source can create its own proxy server on the STA; keep the `EnsureOnAlarmConsumerThread` affinity guard on every path.

**Step 3: Test** — use a fake consumer factory. Assert that a `SUBTAG` forced command builds the subtag consumer and advises, and that an auto command, which builds a fake failover composite, enqueues an `OnAlarmProviderModeChangedEvent` on the queue when the composite raises `ProviderModeChanged`.
**Step 4: Build x86 + worker tests.**

**Step 5: Commit.**

```bash
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/
git commit -m "worker(alarms): route watch-list/failover config; emit provider-mode-changed event"
```

---

## Phase 2 — Gateway: discovery, options, monitor, metrics, dashboard

### Task 10: `AlarmsOptions.Fallback` + validation

**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 11, Task 13

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmsOptions.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmFallbackOptions.cs`
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs` (`ValidateAlarms`, ~lines 234-258)
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` (extend)

**Step 1:** Add `AlarmFallbackOptions Fallback { get; init; } = new();` to `AlarmsOptions`. `AlarmFallbackOptions` has:

- `string Mode = "Auto"` (`Auto|ForceAlarmManager|ForceSubtag`)
- `int ConsecutiveFailureThreshold = 3`
- `int FailbackProbeIntervalSeconds = 30`
- `int FailbackStableProbes = 3`
- a `Discovery` sub-object (`bool UseGalaxyRepository = true`, `string Area = ""`, `string[] IncludeAttributes = []`, `string[] ExcludeAttributes = []`)
- a `Subtags` sub-object (`Active="active"`, `Acked="acked"`, `AckComment=""`, `Priority="priority"`)

**Step 2:** In `ValidateAlarms`, when `Enabled` and `Mode == "ForceSubtag"` and `Discovery.UseGalaxyRepository == false` and `IncludeAttributes` empty ⇒ add a validation error ("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list"). Floor the three numeric values at 1. Validate `Mode` is one of the three literals.

**Step 3-5:** Test the new validation cases (red→green), build the server, commit.
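
The Step 2 rules could look roughly like the sketch below. `FallbackSettingsSketch` and `FallbackValidationSketch` are hypothetical stand-ins for `AlarmFallbackOptions` and `ValidateAlarms` (the real options use nested `Discovery` sub-objects and `init` setters), the wording of the mode error is invented for illustration, and "floor at 1" is read here as clamping the value rather than reporting an error.

```csharp
using System;
using System.Collections.Generic;

// Flattened, mutable stand-in for AlarmFallbackOptions (shape assumed for illustration).
public sealed class FallbackSettingsSketch
{
    public string Mode { get; set; } = "Auto";
    public int ConsecutiveFailureThreshold { get; set; } = 3;
    public int FailbackProbeIntervalSeconds { get; set; } = 30;
    public int FailbackStableProbes { get; set; } = 3;
    public bool UseGalaxyRepository { get; set; } = true;
    public string[] IncludeAttributes { get; set; } = Array.Empty<string>();
}

public static class FallbackValidationSketch
{
    private static readonly HashSet<string> KnownModes =
        new HashSet<string>(StringComparer.Ordinal) { "Auto", "ForceAlarmManager", "ForceSubtag" };

    // Returns validation errors; clamps the three numeric settings to at least 1.
    public static List<string> Validate(FallbackSettingsSketch options, bool alarmsEnabled = true)
    {
        var errors = new List<string>();

        if (!KnownModes.Contains(options.Mode))
        {
            errors.Add("Mode must be one of Auto, ForceAlarmManager, ForceSubtag");
        }

        // Forced subtag mode needs some way to learn which attributes to watch.
        if (alarmsEnabled
            && options.Mode == "ForceSubtag"
            && !options.UseGalaxyRepository
            && options.IncludeAttributes.Length == 0)
        {
            errors.Add("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list");
        }

        options.ConsecutiveFailureThreshold = Math.Max(1, options.ConsecutiveFailureThreshold);
        options.FailbackProbeIntervalSeconds = Math.Max(1, options.FailbackProbeIntervalSeconds);
        options.FailbackStableProbes = Math.Max(1, options.FailbackStableProbes);

        return errors;
    }
}
```

If the repo's validator treats out-of-range numbers as errors instead of clamping, the three `Math.Max` lines would become `errors.Add(...)` calls; the plan's wording allows either reading.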

---

### Task 11: Galaxy Repository "alarm attributes" discovery query

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 10, Task 13

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyRepository.cs` (add `GetAlarmAttributesAsync` + SQL constant, following `GetAttributesAsync` ~lines 86-115 and `AttributesSql` ~line 176)
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/IGalaxyRepository.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyAlarmAttributeRow.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Galaxy/` (projection unit test; live SQL gated)

**Step 1:** `GalaxyAlarmAttributeRow { string FullTagReference; string SourceObjectReference; string AckCommentSubtag; }` (and any priority subtag). `GetAlarmAttributesAsync` reuses the existing `is_alarm` detection (the `AlarmExtension` primitive join already in `AttributesSql`) filtered to `is_alarm = 1`, projecting the alarm reference + its ack-comment attribute. Follow the exact `SqlConnection`/`SqlCommand`/`SqlDataReader` pattern from `GetAttributesAsync`.

**Step 2:** Unit-test the row→`AlarmSubtagTarget` mapping (a pure mapper function); gate any live-DB test like the existing Galaxy live tests (or `Skip` with a note, matching `AlarmsLiveSmokeTests`).

**Step 3-5:** red→green, build server, commit.
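
A pure row→target mapper of the kind Step 2 asks for might look like the sketch below. Several things here are assumptions rather than facts from the plan: the shape of `AlarmSubtagTarget`, the `AlarmSubtagNames` record, the `<alarm reference>.<subtag>` address composition, and the rule that a blank configured ack-comment name defers to the name discovered in the row. The real subtag names remain unverified until the live smoke test.

```csharp
using System;

// Row shape as listed in Step 1.
public sealed record GalaxyAlarmAttributeRow(
    string FullTagReference,
    string SourceObjectReference,
    string AckCommentSubtag);

// Hypothetical carrier for the configured Subtags names.
public sealed record AlarmSubtagNames(string Active, string Acked, string AckComment);

// Hypothetical target shape; the real one is a generated proto type.
public sealed record AlarmSubtagTarget(
    string AlarmReference,
    string SourceObjectReference,
    string ActiveAddress,
    string AckedAddress,
    string AckCommentAddress);

public static class AlarmSubtagTargetMapper
{
    public static AlarmSubtagTarget FromRow(GalaxyAlarmAttributeRow row, AlarmSubtagNames names)
    {
        ArgumentNullException.ThrowIfNull(row);
        ArgumentNullException.ThrowIfNull(names);

        // Blank configured names fall back to defaults; for the ack comment
        // the assumed default is whatever the Galaxy row discovered.
        string active = FirstNonEmpty(names.Active, "active");
        string acked = FirstNonEmpty(names.Acked, "acked");
        string ackComment = FirstNonEmpty(names.AckComment, row.AckCommentSubtag);

        return new AlarmSubtagTarget(
            row.FullTagReference,
            row.SourceObjectReference,
            Compose(row.FullTagReference, active),
            Compose(row.FullTagReference, acked),
            ackComment.Length == 0 ? "" : Compose(row.FullTagReference, ackComment));
    }

    // Assumed address form: "<alarm reference>.<subtag>".
    private static string Compose(string reference, string subtag) => reference + "." + subtag;

    private static string FirstNonEmpty(string preferred, string fallback) =>
        string.IsNullOrWhiteSpace(preferred) ? (fallback ?? "") : preferred;
}
```

Because the mapper touches neither SQL nor MXAccess, its unit test needs no gating and can run with the rest of the gateway suite.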

---

### Task 12: Watch-list resolver (GR SQL + config override → `AlarmSubtagTarget[]`)

**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on Tasks 10, 11)

**Files:**
- Create: `src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmWatchListResolver.cs`
- Create: `src/ZB.MOM.WW.MxGateway.Server/Alarms/IAlarmWatchListResolver.cs`
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmWatchListResolverTests.cs`

**Step 1: Test the merge** with a fake `IGalaxyRepository`:
- discovery rows + `IncludeAttributes` are unioned; `ExcludeAttributes` are removed;
- each entry becomes an `AlarmSubtagTarget` with `.active`/`.acked`/`.ackmsg` addresses composed from the configured `Subtags` names (`.`, etc.);
- empty config subtag names fall back to defaults;
- GR unavailable + no includes ⇒ empty list + a logged warning flag.

**Step 2: Implement** `ResolveAsync(AlarmsOptions, CancellationToken) → IReadOnlyList<AlarmSubtagTarget>`.

**Step 3-5:** red→green, build, commit.

---

### Task 13: Gateway metrics — provider-mode gauge + switch counter

**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 10, Task 11

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs` (ctor ~lines 55-79; add counter + observable gauge following the existing pattern)
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Metrics/GatewayMetricsTests.cs` (if present; else assert via a `MeterListener`)

**Step 1:** Add `mxgateway.alarms.provider_switches` counter (tagged `from`,`to`,`reason`) and `mxgateway.alarms.provider_mode` observable gauge (1=alarmmgr, 2=subtag), plus `AlarmProviderSwitched(int from, int to, string reason)` and a private `GetAlarmProviderMode()` (lock on `_syncRoot` like the others).

**Step 2-4:** test, build, commit.
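
The two instruments from Task 13 can be sketched with `System.Diagnostics.Metrics` as follows. `AlarmProviderMetricsSketch` is a hypothetical standalone class used only for illustration; in the plan these members are added to the existing `GatewayMetrics`. Starting the gauge at 1 (alarmmgr) before any switch is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical standalone version of the members Task 13 adds to GatewayMetrics.
public sealed class AlarmProviderMetricsSketch : IDisposable
{
    private readonly object _syncRoot = new();
    private readonly Meter _meter;
    private readonly Counter<long> _providerSwitches;

    // 1 = alarmmgr, 2 = subtag. Assumption: starts in alarmmgr mode.
    private int _providerMode = 1;

    public AlarmProviderMetricsSketch(string meterName)
    {
        _meter = new Meter(meterName);
        _providerSwitches = _meter.CreateCounter<long>("mxgateway.alarms.provider_switches");

        // The gauge is pulled: the callback runs whenever a listener collects.
        _meter.CreateObservableGauge<int>("mxgateway.alarms.provider_mode", GetAlarmProviderMode);
    }

    // Records one switch and remembers the new mode for the gauge.
    public void AlarmProviderSwitched(int from, int to, string reason)
    {
        lock (_syncRoot)
        {
            _providerMode = to;
        }

        _providerSwitches.Add(
            1,
            new KeyValuePair<string, object?>("from", from),
            new KeyValuePair<string, object?>("to", to),
            new KeyValuePair<string, object?>("reason", reason));
    }

    private int GetAlarmProviderMode()
    {
        lock (_syncRoot)
        {
            return _providerMode;
        }
    }

    public void Dispose() => _meter.Dispose();
}
```

In a test, a `MeterListener` that enables this meter's instruments sees the counter increment as soon as `AlarmProviderSwitched` runs, while the gauge value only arrives after `RecordObservableInstruments()` is called.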

---

### Task 14: `GatewayAlarmMonitor` — arm watch-list, reflect provider mode, reconcile on switch

**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Tasks 9, 12, 13)

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs` (ctor ~41-49; `SubscribeAlarmsAsync` ~210-233; event-drain loop; `StreamAsync` ~386-434)
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/GatewayAlarmMonitorProviderModeTests.cs` (new, using `FakeWorkerHarness`)

**Step 1:** Inject `IAlarmWatchListResolver` and `GatewayMetrics`. In `SubscribeAlarmsAsync`, resolve the watch-list and build the `SubscribeAlarmsCommand` with `ForcedMode` (from `Fallback.Mode`), `WatchList`, and `Failover` populated from options, instead of the bare `{ SubscriptionExpression }`.

**Step 2:** In the worker-event drain path, handle `OnAlarmProviderModeChangedEvent`:
- update a `_providerStatus` field (mode/degraded/reason/since);
- `Broadcast(new AlarmFeedMessage { ProviderStatus = … })` to every subscriber;
- call `metrics.AlarmProviderSwitched(...)`;
- force a `ReconcileAsync` so the cache re-seeds from the now-active provider (avoids raise/clear storms).

**Step 3:** In `StreamAsync`, emit the current `provider_status` as the **first** message (before the snapshot) so a late joiner immediately knows the mode.

**Step 4: Test.** Stand up the monitor with `FakeWorkerHarness`; emit an `OnAlarmProviderModeChangedEvent(Subtag)`; assert a `StreamAsync` subscriber receives a `ProviderStatus{ Mode=Subtag, Degraded=true }` and that the switch counter incremented. Also assert a transition emitted in subtag mode flows through with `Degraded=true`.

**Step 5:** build server, run the new test, commit.
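
The ordering guarantee in Step 3 (provider status, then snapshot, then live transitions) can be illustrated with an async iterator. This is a sketch under assumptions: `FeedMessage` is a hypothetical stand-in for the generated `AlarmFeedMessage`, and the real `StreamAsync` has its own signature and subscriber plumbing.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the generated AlarmFeedMessage cases.
public sealed record FeedMessage(string Kind, string Detail);

public static class ProviderStatusFirstStream
{
    // Yields messages in the order a late joiner needs them:
    // current provider status, then the snapshot, then live messages.
    public static async IAsyncEnumerable<FeedMessage> StreamAsync(
        FeedMessage currentProviderStatus,
        IReadOnlyList<FeedMessage> snapshot,
        IAsyncEnumerable<FeedMessage> live,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // 1. Provider status first, so the client knows whether the
        //    snapshot that follows comes from a degraded provider.
        yield return currentProviderStatus;

        // 2. Snapshot of currently active alarms.
        foreach (FeedMessage message in snapshot)
        {
            cancellationToken.ThrowIfCancellationRequested();
            yield return message;
        }

        // 3. Live feed until the caller cancels or the source completes.
        await foreach (FeedMessage message in live.WithCancellation(cancellationToken))
        {
            yield return message;
        }
    }
}
```

A test can enumerate the stream and assert only on the first item's kind, which keeps the late-joiner guarantee independent of snapshot contents.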

---

### Task 15: Dashboard — push provider status to `/hubs/alarms` + UI indicator

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 14)

**Files:**
- Modify: `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/AlarmsHubPublisher.cs` (forward `ProviderStatus` messages; they already flow through `StreamAsync`, so confirm the existing `SendAsync(AlarmMessage, message)` carries them; add a dedicated `"ProviderModeChanged"` client method if the dashboard needs a distinct channel)
- Modify: the alarms dashboard page/component (Bootstrap-only badge: green "alarmmgr" / amber "degraded — subtag"); find it under `src/ZB.MOM.WW.MxGateway.Server/Dashboard/`
- Test: `src/ZB.MOM.WW.MxGateway.Tests/` dashboard model test (e.g. a `DashboardAlarmProviderStatus.FromFeed` mapper, mirroring `DashboardActiveAlarm.FromSnapshot`)

**Constraint:** Bootstrap CSS/JS only; no MudBlazor/Radzen/FluentUI.

**Steps:** TDD the model mapper, wire the publisher + badge, build, commit.

---

## Phase 3 — Integration, docs, live smoke

### Task 16: End-to-end fake-worker failover test

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 18

**Files:**
- Test: `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmFailoverEndToEndTests.cs`

Drive the full gateway path with `FakeWorkerHarness`:
1. Subscribe (assert the `SubscribeAlarmsCommand` carries a watch-list).
2. Emit a wnwrap-style transition (assert `Degraded=false`).
3. Emit `OnAlarmProviderModeChangedEvent(Subtag)`.
4. Emit a synthesized transition (assert `Degraded=true`, `SourceProvider=Subtag`).
5. Emit `OnAlarmProviderModeChangedEvent(Alarmmgr)` and assert the feed reports recovery.

Build, run, commit.

---

### Task 17: Live subtag smoke test (opt-in)

**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 18

**Files:**
- Test: `src/ZB.MOM.WW.MxGateway.IntegrationTests/...AlarmSubtagLiveSmokeTests.cs` (or the worker live suite)

A `[LiveMxAccessFact]`, `Skip`-by-default test (per `AlarmsLiveSmokeTests` precedent) that, against a live Galaxy + alarm flip script: advises the real `.active`/`.acked` subtags via `LmxSubtagAlarmSource`, asserts a synthesized raise/clear, and performs an ack via the ack-comment write. Document the exact subtag names discovered (resolves the design's open item). Commit.

---

### Task 18: Documentation

**Classification:** trivial
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 16, Task 17

**Files:**
- Modify: `gateway.md` (alarm provider section: dual provider + auto-failover/failback)
- Modify: `docs/DesignDecisions.md` (record the fallback decision + parity rationale)
- Modify: `docs/GatewayConfiguration.md` (the `MxGateway:Alarms:Fallback` block)
- Modify: `docs/AlarmClientDiscovery.md` (subtag provider, synthesis rules, ack-comment write)
- Modify: `docs/Grpc.md` (new `provider_status` feed case + `degraded`/`source_provider` fields)

Follow `StyleGuide.md` (PascalCase filenames, present tense, explain *why*). No code; commit.

---

## Execution order & parallelism summary

- **Serial spine:** 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8/9 → 10/11 → 12 → 13 → 14 → 15 → 16 → 17/18.
- **Parallelizable clusters:** {8, 9 partially}, {10, 11, 13}, {16, 17, 18}.
- **High-risk tasks** (full review chain): 1, 2, 6, 7, 9, 14. **Standard:** 4, 5, 8, 10, 11, 12, 15, 16. **Small/trivial:** 3, 13, 17, 18.

## Risk notes for the executor

- **Field-number collisions:** Task 2 must read the live `MxEvent`/`MxEventFamily` numbers before adding; the agent map gave alarm-payload maxima but not `MxEvent`'s. Verify before editing.
- **STA discipline:** every COM call in `LmxSubtagAlarmSource` and every consumer swap runs on the worker STA; keep the `EnsureOnAlarmConsumerThread` guard. The worker STA already pumps Windows messages, which is required for the subtag `OnDataChange` to deliver.
- **Parity regression:** alarmmgr-mode output must be byte-for-byte unchanged. Existing `AlarmDispatcherTests` and `ProtobufContractRoundTripTests` are the guardrail: they must stay green with `Degraded=false` defaults.
- **Subtag names unverified:** the design leaves exact AVEVA subtag names (`.active`, `.acked`, ack-comment) to confirm against `C:\Users\dohertj2\Desktop\mxaccess` + a live Galaxy (Task 17). The config `Subtags` block exists so names are not hard-coded.