Alarm Subtag-Monitoring Fallback — Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
Goal: Add a second alarm source — direct MXAccess subtag monitoring — that the gateway auto-fails-over to when the wnwrap alarmmgr provider breaks, auto-fails-back to when it recovers, and can be forced on by config.
Architecture: Worker-side synthesis (parity rule preserved). A new SubtagAlarmConsumer (own LMXProxyServerClass, AddItem/Advise on alarm subtags) and a FailoverAlarmConsumer composite (state machine over the wnwrap primary + subtag standby) both implement the existing IMxAccessAlarmConsumer seam. The gateway resolves the subtag watch-list (Galaxy Repository SQL + config override), arms the worker at subscribe time, and reflects the live provider mode into the gRPC alarm feed, the dashboard hub, and metrics.
Tech Stack: .NET 10 (gateway, x64) + .NET Framework 4.8 (worker, x86, STA), protobuf/gRPC, Microsoft.Data.SqlClient (Galaxy Repository), SignalR (dashboard), System.Diagnostics.Metrics, xUnit (plain Assert, no FluentAssertions).
Design source: docs/plans/2026-06-13-alarm-subtag-fallback-design.md
Branch: feat/alarm-subtag-fallback (already created)
Conventions for every task
- TDD: write the failing test, run it red, implement, run it green, commit.
- xUnit, plain Assert.*, naming Subject_Condition_Expected. Worker fakes are sealed private nested classes that raise events.
- Build/test commands:
  - Contracts regen: dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj
  - Gateway: dotnet build src/ZB.MOM.WW.MxGateway.Server; dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj
  - Worker (x86): dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86; dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86
  - Single test: append --filter FullyQualifiedName~<ClassOrMethod>
- Build is strict: TreatWarningsAsErrors=true, nullable enabled. Add XML doc comments on public members (the repo runs a doc checker).
- Generated code under Generated/ is never hand-edited; rebuild the contracts project to regenerate.
- Namespaces: worker MxAccess types live in ZB.MOM.WW.MxGateway.Worker.MxAccess; proto C# types in ZB.MOM.WW.MxGateway.Contracts.Proto.
Phase 0 — Contracts
Task 1: Worker proto — subtag watch-list, failover config, provider-mode enum
Classification: high-risk Estimated implement time: ~4 min Parallelizable with: none (Task 2 imports these types)
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto (real SubscribeAlarmsCommand at ~line 324; MxCommand references it at 123-125)
CORRECTION (execution): The alarm command messages and MxCommand live in mxaccess_gateway.proto, not the worker proto. mxaccess_worker.proto imports the gateway proto (WorkerCommand.command is mxaccess_gateway.v1.MxCommand), so the gateway proto is the base and the worker proto needs no change. AlarmProviderMode and the new types are added to the gateway proto and are visible to worker code as mxaccess_gateway.v1 types. Tasks 1 and 2 are executed as a single combined edit on this one file.
Step 1: Add the enum and messages. In mxaccess_gateway.proto, extend the existing SubscribeAlarmsCommand message (line 324) and add the new types after it:
// Provider selection / current provider for the alarm feed. Defined here in
// the gateway contract next to SubscribeAlarmsCommand, which references it;
// mxaccess_worker.proto imports this file, so worker code sees the same enum.
enum AlarmProviderMode {
ALARM_PROVIDER_MODE_UNSPECIFIED = 0; // auto: alarmmgr primary, subtag fallback
ALARM_PROVIDER_MODE_ALARMMGR = 1;
ALARM_PROVIDER_MODE_SUBTAG = 2;
}
message SubscribeAlarmsCommand {
string subscription_expression = 1; // existing field — keep
// UNSPECIFIED = auto-failover/failback. ALARMMGR/SUBTAG force one provider.
AlarmProviderMode forced_mode = 2;
// Subtag watch-list resolved by the gateway (GR SQL + config). Empty in pure
// alarmmgr mode; in subtag mode it bounds what the consumer can observe.
repeated AlarmSubtagTarget watch_list = 3;
AlarmFailoverConfig failover = 4;
}
// One alarm attribute the subtag consumer advises. Addresses are full MXAccess
// item references the worker passes straight to AddItem.
message AlarmSubtagTarget {
string alarm_full_reference = 1; // e.g. "Galaxy!Area.Tank01.Level.HiHi"
string source_object_reference = 2; // e.g. "Tank01"
string active_subtag = 3; // item address of the in-alarm boolean
string acked_subtag = 4; // item address of the acknowledged boolean
string ack_comment_subtag = 5; // writable ack-comment attribute (ack write target)
string priority_subtag = 6; // optional severity source; empty if absent
}
message AlarmFailoverConfig {
int32 consecutive_failure_threshold = 1; // wnwrap COM failures before switching (>=1)
int32 failback_probe_interval_seconds = 2; // probe cadence while degraded (>=1)
int32 failback_stable_probes = 3; // clean probes before switching back (>=1)
}
UnsubscribeAlarmsCommand and AcknowledgeAlarmCommand are unchanged.
Step 2: Regenerate & verify it compiles.
Run: dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj
Expected: build succeeds; generated AlarmProviderMode, AlarmSubtagTarget, AlarmFailoverConfig types appear.
Step 3: Commit.
git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto
git commit -m "contracts(worker): subtag watch-list + failover config + AlarmProviderMode"
Task 2: Gateway proto — provider status on the feed, degraded provenance, mode-changed event
Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none (depends on Task 1; Task 3 tests both)
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto (OnAlarmTransitionEvent ~719-771, ActiveAlarmSnapshot ~783-803, AlarmFeedMessage ~860-870, MxEvent family enum + body oneof, MxEventFamily enum)
Step 1: Add degraded provenance to the two alarm payloads. Append to OnAlarmTransitionEvent (next free field 14):
// True when this transition came from the subtag-monitoring fallback rather
// than the native alarmmgr provider — i.e. it was synthesized from data
// changes and carries reduced fidelity (synthetic GUID, no native raise time).
bool degraded = 14;
// Which provider produced this transition.
AlarmProviderMode source_provider = 15;
Append the identical two fields to ActiveAlarmSnapshot (next free field 14):
bool degraded = 14;
AlarmProviderMode source_provider = 15;
Step 2: Add provider status to the feed oneof. Add a new oneof case to AlarmFeedMessage (next free field 4) and a new message:
message AlarmFeedMessage {
oneof payload {
ActiveAlarmSnapshot active_alarm = 1;
bool snapshot_complete = 2;
OnAlarmTransitionEvent transition = 3;
// Provider-mode status. Emitted once on stream open and again on every
// failover/failback so late joiners learn the current mode immediately.
AlarmProviderStatus provider_status = 4;
}
}
message AlarmProviderStatus {
AlarmProviderMode mode = 1;
bool degraded = 2; // true whenever mode == SUBTAG
string reason = 3; // human-readable switch reason
google.protobuf.Timestamp since = 4;
}
Step 3: Add the worker→gateway mode-changed event to MxEvent. Find the MxEventFamily enum and the MxEvent body oneof. Add a family member and a body message + oneof case (use the next free family value and the next free MxEvent body field number — check the file):
// in MxEventFamily enum:
MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED = <next>;
// new message near OnAlarmTransitionEvent:
message OnAlarmProviderModeChangedEvent {
AlarmProviderMode mode = 1;
string reason = 2;
int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
google.protobuf.Timestamp at = 4;
}
// in MxEvent body oneof:
OnAlarmProviderModeChangedEvent on_alarm_provider_mode_changed = <next>;
Per the Task 1 correction, AlarmProviderMode is defined in mxaccess_gateway.proto itself, so no extra import is needed here; reference the enum unqualified, as other same-file type references do.
Step 4: Regenerate & verify.
Run: dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj
Expected: build succeeds.
Step 5: Commit.
git add src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto
git commit -m "contracts(gateway): AlarmProviderStatus feed case, degraded provenance, mode-changed event"
Task 3: Proto round-trip tests for the new alarm fields
Classification: small Estimated implement time: ~3 min Parallelizable with: none (depends on Tasks 1-2)
Files:
- Modify:
src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs
Step 1: Add tests mirroring the existing Event_RoundTripsOnAlarmTransitionWithFullPayload style:
[Fact]
public void Feed_RoundTripsProviderStatus()
{
var since = Timestamp.FromDateTime(new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc));
var original = new AlarmFeedMessage
{
ProviderStatus = new AlarmProviderStatus
{
Mode = AlarmProviderMode.Subtag,
Degraded = true,
Reason = "wnwrap poll failed 3x (HRESULT 0x80004005)",
Since = since,
},
};
var parsed = AlarmFeedMessage.Parser.ParseFrom(original.ToByteArray());
Assert.Equal(original, parsed);
Assert.Equal(AlarmFeedMessage.PayloadOneofCase.ProviderStatus, parsed.PayloadCase);
Assert.True(parsed.ProviderStatus.Degraded);
Assert.Equal(AlarmProviderMode.Subtag, parsed.ProviderStatus.Mode);
}
[Fact]
public void Transition_RoundTripsDegradedProvenance()
{
var t = new OnAlarmTransitionEvent
{
AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
TransitionKind = AlarmTransitionKind.Raise,
Degraded = true,
SourceProvider = AlarmProviderMode.Subtag,
};
var parsed = OnAlarmTransitionEvent.Parser.ParseFrom(t.ToByteArray());
Assert.True(parsed.Degraded);
Assert.Equal(AlarmProviderMode.Subtag, parsed.SourceProvider);
}
Step 2: Run red→green.
Run: dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ProtobufContractRoundTripTests
Expected: PASS.
Step 3: Commit.
git add src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs
git commit -m "test(contracts): round-trip provider status + degraded provenance"
Phase 1 — Worker: subtag consumer + failover
Task 4: Subtag value-source abstraction + synthesis state holder
Classification: standard Estimated implement time: ~5 min Parallelizable with: none (Task 5 builds on it)
A testable seam so synthesis logic is unit-tested without COM. The COM wiring lands in Task 6.
Files:
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs
- Test: src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs
Step 1: Define the source abstraction. ISubtagAlarmSource advises subtag addresses and raises a normalized value-change callback on the STA:
namespace ZB.MOM.WW.MxGateway.Worker.MxAccess;
/// <summary>A change in one advised subtag value, normalized off the COM boundary.</summary>
public sealed class SubtagValueChange
{
/// <summary>The full item address that changed (matches an AlarmSubtagTarget subtag).</summary>
public string ItemAddress { get; init; } = string.Empty;
/// <summary>The new value (boolean for .active/.acked, numeric for priority).</summary>
public object? Value { get; init; }
/// <summary>The change timestamp in UTC.</summary>
public DateTime TimestampUtc { get; init; }
}
/// <summary>
/// Advises a set of MXAccess subtag addresses and surfaces value changes.
/// The production implementation (Task 6) owns its own LMXProxyServerClass;
/// tests substitute a fake that pushes <see cref="SubtagValueChange"/>s.
/// </summary>
public interface ISubtagAlarmSource : IDisposable
{
/// <summary>Raised on the STA when an advised subtag's value changes.</summary>
event EventHandler<SubtagValueChange>? ValueChanged;
/// <summary>Advises every subtag in the supplied addresses; idempotent per address.</summary>
void Advise(IReadOnlyCollection<string> itemAddresses);
/// <summary>Writes a value to an item address (used for the ack-comment write).</summary>
void Write(string itemAddress, object? value);
}
Step 2: Write the state-machine tests first. SubtagAlarmStateMachine maps (active, acked) changes per target to MxAlarmTransitionEvents. Test the four core transitions:
namespace ZB.MOM.WW.MxGateway.Worker.Tests.MxAccess;
public sealed class SubtagAlarmStateMachineTests
{
private static AlarmSubtagTarget Target() => new()
{
AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
SourceObjectReference = "Tank01",
ActiveSubtag = "Tank01.Level.HiHi.active",
AckedSubtag = "Tank01.Level.HiHi.acked",
AckCommentSubtag = "Tank01.Level.HiHi.ackmsg",
};
[Fact]
public void ActiveFalseToTrue_EmitsRaise_FlaggedDegraded()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
var events = sm.Apply("Tank01.Level.HiHi.active", true, ts);
var e = Assert.Single(events);
Assert.Equal(MxAlarmStateKind.UnackAlm, e.Record.State);
Assert.Equal(MxAlarmStateKind.Unspecified, e.PreviousState);
Assert.Equal("Tank01.Level.HiHi", e.Record.TagName); // reference minus provider/area
}
[Fact]
public void AckedTrueWhileActive_EmitsAckTransition()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
sm.Apply("Tank01.Level.HiHi.active", true, ts);
var events = sm.Apply("Tank01.Level.HiHi.acked", true, ts.AddSeconds(5));
var e = Assert.Single(events);
Assert.Equal(MxAlarmStateKind.AckAlm, e.Record.State);
Assert.Equal(MxAlarmStateKind.UnackAlm, e.PreviousState);
}
[Fact]
public void ActiveTrueToFalse_WhileUnacked_EmitsUnackRtn()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
sm.Apply("Tank01.Level.HiHi.active", true, ts);
var events = sm.Apply("Tank01.Level.HiHi.active", false, ts.AddSeconds(10));
var e = Assert.Single(events);
Assert.Equal(MxAlarmStateKind.UnackRtn, e.Record.State);
}
[Fact]
public void Snapshot_ReflectsActiveAndAckedState()
{
var sm = new SubtagAlarmStateMachine(new[] { Target() });
var ts = new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc);
sm.Apply("Tank01.Level.HiHi.active", true, ts);
sm.Apply("Tank01.Level.HiHi.acked", true, ts);
var snap = Assert.Single(sm.SnapshotActive());
Assert.Equal(MxAlarmStateKind.AckAlm, snap.State);
}
}
Run: dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmStateMachineTests → FAIL (type missing).
Step 3: Implement SubtagAlarmStateMachine. Build an address→target index (active/acked/priority/comment addresses), hold per-reference (bool active, bool acked, DateTime firstRaiseUtc, int priority), and emit on change:
- active false→true ⇒ UnackAlm, set firstRaiseUtc, PreviousState from prior state.
- acked false→true while active ⇒ AckAlm.
- active true→false ⇒ AckRtn if currently acked else UnackRtn; then reset acked.
- priority change ⇒ update stored priority, no transition.
TagName = alarm_full_reference with any Provider!Area. prefix stripped (match WnWrapAlarmConsumer's reference shape so GatewayAlarmMonitor keys align). Set ProviderName, Group, Priority, AlarmComment from the target/last values. Mark a Degraded/source flag (carried via a new field; see Task 5 wiring). SnapshotActive() returns MxAlarmSnapshotRecord for references whose active is true.
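The transition rules listed in Step 3 can be sketched as a pure function, which keeps them testable without COM or the worker types. This is only an illustration under stated assumptions: AlarmState is a hypothetical stand-in for the worker's MxAlarmStateKind, and SubtagTransitionRules.Next is a made-up name, not something the plan defines.

```csharp
using System;

// Hypothetical stand-in for MxAlarmStateKind; only the members the plan names.
public enum AlarmState { Unspecified, UnackAlm, AckAlm, UnackRtn, AckRtn }

public static class SubtagTransitionRules
{
    /// <summary>
    /// Returns the state to emit for one (active, acked) change,
    /// or null when the change produces no transition.
    /// </summary>
    public static AlarmState? Next(bool wasActive, bool wasAcked, bool isActive, bool isAcked)
    {
        // active false -> true: raise, always unacknowledged.
        if (!wasActive && isActive) return AlarmState.UnackAlm;

        // active true -> false: return-to-normal, flavored by the prior ack state.
        if (wasActive && !isActive) return wasAcked ? AlarmState.AckRtn : AlarmState.UnackRtn;

        // acked false -> true while still active: acknowledgement.
        if (isActive && !wasAcked && isAcked) return AlarmState.AckAlm;

        // Anything else (ack while inactive, repeated values) emits nothing.
        return null;
    }
}
```

A caller would hold the previous (active, acked) pair per alarm reference, call Next on each subtag change, and reset the stored acked flag after a return-to-normal, as the list above requires.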
Step 4: Run green. Expected: PASS.
Step 5: Commit.
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ISubtagAlarmSource.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmStateMachine.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmStateMachineTests.cs
git commit -m "worker(alarms): subtag value-source seam + synthesis state machine"
Task 5: SubtagAlarmConsumer over the source seam (no COM yet)
Classification: standard Estimated implement time: ~5 min Parallelizable with: none (depends on Task 4)
Files:
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs
- Test: src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs
Step 1: Test with a fake ISubtagAlarmSource. Drive value changes through the source, assert AlarmTransitionEmitted fires with synthesized records and that ack writes the comment to the ack-comment subtag:
public sealed class SubtagAlarmConsumerTests
{
private sealed class FakeSource : ISubtagAlarmSource
{
public event EventHandler<SubtagValueChange>? ValueChanged;
public List<string> Advised { get; } = new();
public (string Address, object? Value)? LastWrite { get; private set; }
public void Advise(IReadOnlyCollection<string> a) => Advised.AddRange(a);
public void Write(string a, object? v) => LastWrite = (a, v);
public void Raise(string addr, object? val, DateTime ts) =>
ValueChanged?.Invoke(this, new SubtagValueChange { ItemAddress = addr, Value = val, TimestampUtc = ts });
public void Dispose() { }
}
private static AlarmSubtagTarget Target() => new()
{
AlarmFullReference = "Galaxy!Area.Tank01.Level.HiHi",
ActiveSubtag = "Tank01.Level.HiHi.active",
AckedSubtag = "Tank01.Level.HiHi.acked",
AckCommentSubtag = "Tank01.Level.HiHi.ackmsg",
};
[Fact]
public void Subscribe_AdvisesAllSubtags()
{
var src = new FakeSource();
using var c = new SubtagAlarmConsumer(src, new[] { Target() });
c.Subscribe("ignored-in-subtag-mode");
Assert.Contains("Tank01.Level.HiHi.active", src.Advised);
Assert.Contains("Tank01.Level.HiHi.acked", src.Advised);
}
[Fact]
public void ValueChange_RaisesSynthesizedTransition()
{
var src = new FakeSource();
using var c = new SubtagAlarmConsumer(src, new[] { Target() });
c.Subscribe("x");
MxAlarmTransitionEvent? seen = null;
c.AlarmTransitionEmitted += (_, e) => seen = e;
src.Raise("Tank01.Level.HiHi.active", true, new DateTime(2026, 6, 13, 9, 0, 0, DateTimeKind.Utc));
Assert.NotNull(seen);
Assert.Equal(MxAlarmStateKind.UnackAlm, seen!.Record.State);
}
[Fact]
public void AcknowledgeByName_WritesCommentToAckCommentSubtag()
{
var src = new FakeSource();
using var c = new SubtagAlarmConsumer(src, new[] { Target() });
c.Subscribe("x");
int rc = c.AcknowledgeByName("Tank01.Level.HiHi", "Galaxy", "Area",
"ack from HMI", "op1", "node", "dom", "Op One");
Assert.Equal(0, rc);
Assert.Equal(("Tank01.Level.HiHi.ackmsg", (object?)"ack from HMI"), src.LastWrite);
}
}
Step 2: Implement SubtagAlarmConsumer : IMxAccessAlarmConsumer.
- Constructor (ISubtagAlarmSource source, IReadOnlyList<AlarmSubtagTarget> watchList); build a SubtagAlarmStateMachine; index alarm_full_reference→target for ack routing.
- Subscribe(_): call source.Advise(<all subtag addresses>); subscribe to source.ValueChanged, feed each into the state machine, and re-raise each produced MxAlarmTransitionEvent via AlarmTransitionEmitted (mark degraded).
- AcknowledgeByName(alarmName, …, comment, …): resolve the target by reference; if no AckCommentSubtag, return a non-zero failure code; else source.Write(target.AckCommentSubtag, comment) and return 0.
- AcknowledgeByGuid(guid, …): map the synthetic GUID (deterministic hash of reference; see Task 8 helper, or a local copy) back to a reference, then delegate to the name path; unknown GUID ⇒ non-zero.
- SnapshotActiveAlarms(): from the state machine.
- PollOnce(): no-op.
- Dispose(): unsubscribe + dispose source.
Step 3: Run green. dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~SubtagAlarmConsumerTests.
Step 4: Commit.
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SubtagAlarmConsumer.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/SubtagAlarmConsumerTests.cs
git commit -m "worker(alarms): SubtagAlarmConsumer synthesizing transitions over the source seam"
Task 6: COM-backed LmxSubtagAlarmSource (own LMXProxyServerClass)
Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none
The only piece that touches live COM. Like WnWrapAlarmConsumer, it owns its own MXAccess server object so the subtag source is self-contained and isolated from the session's item pipeline. Logic stays thin (advise/write/marshal); real verification is the live smoke test in Task 17.
Files:
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs
- Test: src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs (constructor/guard tests only; COM path is live-gated)
Step 1: Implement LmxSubtagAlarmSource : ISubtagAlarmSource.
- Own an LMXProxyServerClass (reuse the worker's IMxAccessServer/MxAccessComServer wrapper + IMxAccessComObjectFactory so it is fakeable; constructor takes the factory).
- Advise(addresses): RegisterServer(topic) once; per address AddItem→itemHandle, Advise, and record itemHandle→address. Subscribe to the proxy's OnDataChange; in the handler, look up the address by phItemHandle, normalize pvItemValue (VARIANT→bool/double) and pftItemTimeStamp→UTC, and raise ValueChanged. All calls run on the STA (the worker STA pumps messages, so OnDataChange delivers).
- Write(address, value): resolve/create the item handle, server.Write(serverHandle, itemHandle, value, userId: 0).
- Dispose(): UnAdvise/RemoveItem/Unregister/release COM.
Step 2: Tests — only the non-COM guards (null factory throws; Write before Advise resolves a handle or throws a clear error). Mark the COM round-trip [LiveMxAccessFact] and Skip per the AlarmsLiveSmokeTests precedent.
Step 3: Build x86 + run unit tests.
dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86
dotnet test ...Worker.Tests... -p:Platform=x86 --filter FullyQualifiedName~LmxSubtagAlarmSourceTests
Step 4: Commit.
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/LmxSubtagAlarmSource.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/LmxSubtagAlarmSourceTests.cs
git commit -m "worker(alarms): COM-backed LmxSubtagAlarmSource advising alarm subtags"
Task 7: FailoverAlarmConsumer state machine
Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none (depends on Task 5)
Files:
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs (small EventArgs)
- Test: src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs
Step 1: Test the switch/failback with a fake primary that throws.
public sealed class FailoverAlarmConsumerTests
{
private sealed class FlakyPrimary : IMxAccessAlarmConsumer
{
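// Explicit empty accessors: this fake never raises transitions, and a field-like
// event that is never used trips CS0067 under TreatWarningsAsErrors.
public event EventHandler<MxAlarmTransitionEvent>? AlarmTransitionEmitted { add { } remove { } }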
public int PollsUntilHeal = int.MaxValue; // becomes healthy after N polls while degraded
public bool ThrowOnPoll = true;
private int _polls;
public void Subscribe(string s) { if (ThrowOnPoll) throw new COMException("boom", unchecked((int)0x80004005)); }
public void PollOnce()
{
_polls++;
if (ThrowOnPoll && _polls < PollsUntilHeal) throw new COMException("boom", unchecked((int)0x80004005));
}
public int AcknowledgeByGuid(Guid g, string c, string a, string b, string d, string e) => 0;
public int AcknowledgeByName(string n, string p, string gr, string c, string a, string b, string d, string e) => 0;
public IReadOnlyList<MxAlarmSnapshotRecord> SnapshotActiveAlarms() => Array.Empty<MxAlarmSnapshotRecord>();
public void Dispose() { }
}
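private sealed class StubStandby : IMxAccessAlarmConsumer
{
// Records Subscribe calls; every other member is a no-op.
public event EventHandler<MxAlarmTransitionEvent>? AlarmTransitionEmitted { add { } remove { } }
public int SubscribeCalls { get; private set; }
public void Subscribe(string s) => SubscribeCalls++;
public void PollOnce() { }
public int AcknowledgeByGuid(Guid g, string c, string a, string b, string d, string e) => 0;
public int AcknowledgeByName(string n, string p, string gr, string c, string a, string b, string d, string e) => 0;
public IReadOnlyList<MxAlarmSnapshotRecord> SnapshotActiveAlarms() => Array.Empty<MxAlarmSnapshotRecord>();
public void Dispose() { }
}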
[Fact]
public void Primary_FailsThresholdTimes_SwitchesToSubtagAndEmitsModeChange()
{
var primary = new FlakyPrimary();
var standby = new StubStandby();
using var c = new FailoverAlarmConsumer(primary, standby,
new FailoverSettings(threshold: 3, probeIntervalSeconds: 30, stableProbes: 3));
AlarmProviderModeChange? change = null;
c.ProviderModeChanged += (_, e) => change = e;
c.Subscribe("\\\\host\\Galaxy!Area"); // primary.Subscribe throws -> counts as failure 1
c.PollOnce(); // failure 2
c.PollOnce(); // failure 3 -> switch
Assert.NotNull(change);
Assert.Equal(AlarmProviderMode.Subtag, change!.Mode);
}
[Fact]
public void WhileDegraded_PrimaryHeals_FailsBackAfterStableProbes()
{
var primary = new FlakyPrimary { PollsUntilHeal = 0 }; // will heal once we stop throwing
var standby = new StubStandby();
using var c = new FailoverAlarmConsumer(primary, standby,
new FailoverSettings(threshold: 1, probeIntervalSeconds: 0, stableProbes: 2));
var modes = new List<AlarmProviderMode>();
c.ProviderModeChanged += (_, e) => modes.Add(e.Mode);
c.Subscribe("x"); // failure -> switch to subtag
primary.ThrowOnPoll = false;
c.ProbeOnce(); // clean probe 1
c.ProbeOnce(); // clean probe 2 -> failback
Assert.Equal(AlarmProviderMode.Subtag, modes[0]);
Assert.Equal(AlarmProviderMode.Alarmmgr, modes[^1]);
}
}
Step 2: Implement.
- record FailoverSettings(int threshold, int probeIntervalSeconds, int stableProbes); AlarmProviderModeChange : EventArgs { AlarmProviderMode Mode; string Reason; int HResult; DateTime AtUtc; }.
- Constructor (IMxAccessAlarmConsumer primary, IMxAccessAlarmConsumer standby, FailoverSettings settings); forced-mode variants handled in Task 9 wiring (forced ⇒ skip the other consumer).
- Forward AlarmTransitionEmitted from the active child only (swap the subscription on switch).
- Wrap Subscribe/PollOnce on the primary: on COMException (or a failure HRESULT) while PrimaryActive, increment a counter; at threshold, ensure standby Subscribed, set active=standby, snapshot standby for hand-off, raise ProviderModeChanged(Subtag, reason, hresult, now). Reset counter on any clean primary poll.
- ProbeOnce() (driven by the poll loop while degraded, gated by probeIntervalSeconds): try primary Subscribe+PollOnce; count consecutive clean probes; at stableProbes, set active=primary, return standby to standby, raise ProviderModeChanged(Alarmmgr, "recovered", 0, now).
- Acknowledge*/SnapshotActiveAlarms delegate to the active child.
- PollOnce() drives the active child's poll and, while degraded, also drives the failback probe cadence.
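The counting logic behind the failover and failback rules above can be isolated from the consumers themselves. The following is a minimal sketch, assuming the counters live in a small helper; FailoverCounters and its member names are hypothetical and do not appear in the plan, and event raising, subscription swapping, and the probe-interval gate are deliberately left out.

```csharp
using System;

// Hypothetical helper: tracks consecutive primary failures and clean probes.
public sealed class FailoverCounters
{
    private readonly int _threshold;
    private readonly int _stableProbes;
    private int _failures;
    private int _cleanProbes;

    public FailoverCounters(int threshold, int stableProbes)
    {
        // Floor both settings at 1, matching the ">=1" notes on AlarmFailoverConfig.
        _threshold = Math.Max(1, threshold);
        _stableProbes = Math.Max(1, stableProbes);
    }

    /// <summary>True while the standby (subtag) provider is active.</summary>
    public bool Degraded { get; private set; }

    /// <summary>Records one primary Subscribe/PollOnce outcome; true means "fail over now".</summary>
    public bool RecordPrimaryResult(bool succeeded)
    {
        if (Degraded) return false;
        if (succeeded) { _failures = 0; return false; }   // any clean poll resets the count
        if (++_failures < _threshold) return false;
        Degraded = true;
        _cleanProbes = 0;
        return true;
    }

    /// <summary>Records one failback probe; true means "fail back now".</summary>
    public bool RecordProbe(bool clean)
    {
        if (!Degraded) return false;
        if (!clean) { _cleanProbes = 0; return false; }   // probes must be consecutive
        if (++_cleanProbes < _stableProbes) return false;
        Degraded = false;
        _failures = 0;
        return true;
    }
}
```

With threshold 3, the third consecutive failed call returns true once, which is the point where the composite would subscribe the standby and raise ProviderModeChanged; a failed probe in the middle of a recovery restarts the clean-probe count.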
Step 3: Run green (x86 filter FailoverAlarmConsumerTests).
Step 4: Commit.
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/FailoverAlarmConsumer.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmProviderModeChange.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/FailoverAlarmConsumerTests.cs
git commit -m "worker(alarms): FailoverAlarmConsumer auto-failover/failback state machine"
Task 8: Synthetic-GUID helper + degraded flag on the event sink path
Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 9
Carry degraded + source_provider from the worker synthesis into the emitted OnAlarmTransitionEvent.
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs (add bool Degraded)
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs (EnqueueTransition carries degraded)
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs (CreateOnAlarmTransition sets Degraded/SourceProvider)
- Create: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs
- Test: add cases to src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmDispatcherTests.cs and a new SyntheticAlarmGuidTests.cs
Step 1: SyntheticAlarmGuid.ForReference(string reference) — deterministic GUID from a stable hash (e.g. MD5 of the UTF-8 reference → new Guid(bytes)), so subtag-mode acks resolve by GUID. Test determinism + difference:
[Fact] public void SameReference_SameGuid() =>
Assert.Equal(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.C"));
[Fact] public void DifferentReference_DifferentGuid() =>
Assert.NotEqual(SyntheticAlarmGuid.ForReference("A.B.C"), SyntheticAlarmGuid.ForReference("A.B.D"));
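One possible shape for the helper these tests exercise, following the MD5 suggestion in Step 1. Treat it as a sketch rather than the final implementation: the null guard and the ordinal, case-sensitive treatment of the reference are assumptions, and the resulting value is a plain 16-byte hash wrapped in a Guid, not an RFC 4122 name-based UUID.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

/// <summary>Derives a stable GUID from an alarm reference (sketch).</summary>
public static class SyntheticAlarmGuid
{
    /// <summary>Returns the same GUID for the same reference on every call and process.</summary>
    public static Guid ForReference(string reference)
    {
        if (reference is null) throw new ArgumentNullException(nameof(reference));

        // MD5 is used only as a stable 128-bit fingerprint here, not for security.
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(reference)); // always 16 bytes
            return new Guid(hash);
        }
    }
}
```

Because the mapping is one-way, resolving a GUID back to a reference (as AcknowledgeByGuid needs) would rely on a lookup built from the watch-list, for example a dictionary keyed by ForReference(target reference).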
Step 2: Thread degraded through MxAlarmSnapshotRecord.Degraded, EnqueueTransition(... bool degraded), and CreateOnAlarmTransition(... bool degraded, AlarmProviderMode sourceProvider). Default degraded=false, sourceProvider=Alarmmgr so the wnwrap path is unchanged (regression: existing AlarmDispatcherTests still pass with Degraded=false).
Step 3: Tests — extend AlarmDispatcherTests with a subtag-style transition asserting body.Degraded == true and SourceProvider == Subtag.
Step 4: Build x86 + run worker tests for AlarmDispatcherTests, SyntheticAlarmGuidTests.
Step 5: Commit.
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmSnapshot.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessEventMapper.cs \
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/SyntheticAlarmGuid.cs \
src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/
git commit -m "worker(alarms): synthetic GUID + degraded provenance on emitted transitions"
Task 9: Wire watch-list + failover config through AlarmCommandHandler; emit mode-changed event
Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none (depends on Tasks 5, 7, 8)
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmCommandHandler.cs
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs (ExecuteSubscribeAlarms, ~lines 588-616)
- Modify: src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessStaSession.cs (consumer factory wiring; mode-change → event queue)
- Test: src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs (extend or create)
Step 1: Carry the subscribe payload. Change the alarm subscribe entry point from Subscribe(string subscription) to Subscribe(SubscribeAlarmsCommand command) (the command now has ForcedMode, WatchList, Failover). In AlarmCommandHandler.Subscribe:
- Build the active provider per ForcedMode:
  - ALARMMGR ⇒ WnWrapAlarmConsumer only.
  - SUBTAG ⇒ SubtagAlarmConsumer(new LmxSubtagAlarmSource(factory), watchList) only.
  - UNSPECIFIED ⇒ FailoverAlarmConsumer(primary: wnwrap, standby: subtag, settings-from-Failover).
- Use the existing consumerFactory seam but widen it to Func<SubscribeAlarmsCommand, IMxAccessAlarmConsumer> so tests inject fakes and production builds the failover composite. Subscribe to FailoverAlarmConsumer.ProviderModeChanged and enqueue an OnAlarmProviderModeChangedEvent MxEvent via the event queue (new mapper method CreateOnAlarmProviderModeChanged).
Step 2: Executor + STA wiring. ExecuteSubscribeAlarms passes the full SubscribeAlarmsCommand (not just the expression). In MxAccessStaSession, the alarmCommandHandlerFactory must give the handler access to the IMxAccessComObjectFactory so the subtag source can create its own proxy server on the STA; keep the EnsureOnAlarmConsumerThread affinity guard on every path.
Step 3: Test with a fake consumer factory. Assert that a SUBTAG forced command builds the subtag consumer and advises, and that when an auto command builds a fake failover composite which then raises ProviderModeChanged, an OnAlarmProviderModeChangedEvent is enqueued on the event queue.
Step 4: Build x86 + worker tests.
Step 5: Commit.
git add src/ZB.MOM.WW.MxGateway.Worker/MxAccess/ src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/
git commit -m "worker(alarms): route watch-list/failover config; emit provider-mode-changed event"
Phase 2 — Gateway: discovery, options, monitor, metrics, dashboard
Task 10: AlarmsOptions.Fallback + validation
Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 11, Task 13
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmsOptions.cs
- Create: src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmFallbackOptions.cs
- Modify: src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs (ValidateAlarms, ~lines 234-258)
- Test: src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs (extend)
Step 1: Add AlarmFallbackOptions Fallback { get; init; } = new(); to AlarmsOptions. AlarmFallbackOptions: string Mode = "Auto" (Auto|ForceAlarmManager|ForceSubtag), int ConsecutiveFailureThreshold = 3, int FailbackProbeIntervalSeconds = 30, int FailbackStableProbes = 3, a Discovery sub-object (bool UseGalaxyRepository = true, string Area = "", string[] IncludeAttributes = [], string[] ExcludeAttributes = []), and a Subtags sub-object (Active="active", Acked="acked", AckComment="", Priority="priority").
Step 2: In ValidateAlarms, when Enabled and Mode == "ForceSubtag" and Discovery.UseGalaxyRepository == false and IncludeAttributes empty ⇒ add a validation error ("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list"). Floor the three numeric values at 1. Validate Mode is one of the three literals.
Step 3-5: Test the new validation cases (red→green), build the server, commit.
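The options shape and the ForceSubtag rule from Steps 1-2 can be sketched as below. This is a minimal illustration, not the final implementation: the `*Sketch` type names, the `Validate` signature, and the choice to clamp (rather than reject) values below 1 are assumptions, and the Subtags sub-object is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Discovery sub-object described in Step 1.
public sealed class AlarmDiscoveryOptionsSketch
{
    public bool UseGalaxyRepository { get; init; } = true;
    public string Area { get; init; } = "";
    public string[] IncludeAttributes { get; init; } = Array.Empty<string>();
    public string[] ExcludeAttributes { get; init; } = Array.Empty<string>();
}

// Hypothetical stand-in for AlarmFallbackOptions; defaults copied from Step 1.
public sealed class AlarmFallbackOptionsSketch
{
    public string Mode { get; init; } = "Auto";
    public int ConsecutiveFailureThreshold { get; init; } = 3;
    public int FailbackProbeIntervalSeconds { get; init; } = 30;
    public int FailbackStableProbes { get; init; } = 3;
    public AlarmDiscoveryOptionsSketch Discovery { get; init; } = new();
}

public static class AlarmFallbackValidationSketch
{
    private static readonly string[] Modes = { "Auto", "ForceAlarmManager", "ForceSubtag" };

    // Returns validation errors; an empty list means the block is acceptable.
    public static List<string> Validate(bool alarmsEnabled, AlarmFallbackOptionsSketch fallback)
    {
        var errors = new List<string>();

        // Assumption: the Mode literal is checked even when alarms are disabled.
        if (Array.IndexOf(Modes, fallback.Mode) < 0)
        {
            errors.Add($"Fallback.Mode '{fallback.Mode}' must be one of: {string.Join(", ", Modes)}");
        }

        // ForceSubtag needs some source for the watch-list.
        if (alarmsEnabled
            && fallback.Mode == "ForceSubtag"
            && !fallback.Discovery.UseGalaxyRepository
            && fallback.Discovery.IncludeAttributes.Length == 0)
        {
            errors.Add("ForceSubtag requires Galaxy Repository discovery or an explicit IncludeAttributes list");
        }

        return errors;
    }

    // Assumption: "floor at 1" is read as clamping, not as a validation error.
    public static int FloorAtOne(int value) => Math.Max(1, value);
}
```

A validator test would then cover at least three cases: defaults pass, an unknown Mode fails, and ForceSubtag with discovery disabled and no includes fails.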
Task 11: Galaxy Repository "alarm attributes" discovery query
Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 10, Task 13
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyRepository.cs (add GetAlarmAttributesAsync + SQL constant, following GetAttributesAsync ~lines 86-115 and AttributesSql ~line 176)
- Modify: src/ZB.MOM.WW.MxGateway.Server/Galaxy/IGalaxyRepository.cs
- Create: src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyAlarmAttributeRow.cs
- Test: src/ZB.MOM.WW.MxGateway.Tests/Galaxy/ (projection unit test; live SQL gated)
Step 1: GalaxyAlarmAttributeRow { string FullTagReference; string SourceObjectReference; string AckCommentSubtag; } (and any priority subtag). GetAlarmAttributesAsync reuses the existing is_alarm detection (the AlarmExtension primitive join already in AttributesSql) filtered to is_alarm = 1, projecting the alarm reference + its ack-comment attribute. Follow the exact SqlConnection/SqlCommand/SqlDataReader pattern from GetAttributesAsync.
Step 2: Unit-test the row→AlarmSubtagTarget mapping (a pure mapper function); gate any live-DB test like the existing Galaxy live tests (or Skip with a note, matching AlarmsLiveSmokeTests).
Step 3-5: red→green, build server, commit.
Task 12: Watch-list resolver (GR SQL + config override → AlarmSubtagTarget[])
Classification: standard Estimated implement time: ~4 min Parallelizable with: none (depends on Tasks 10, 11)
Files:
- Create: src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmWatchListResolver.cs
- Create: src/ZB.MOM.WW.MxGateway.Server/Alarms/IAlarmWatchListResolver.cs
- Test: src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmWatchListResolverTests.cs
Step 1: Test the merge with a fake IGalaxyRepository:
- discovery rows + IncludeAttributes are unioned; ExcludeAttributes removed; each becomes an AlarmSubtagTarget with .active/.acked/.ackmsg addresses composed from the configured Subtags names (<reference>.<Active>, etc.); empty config subtag names fall back to defaults; GR unavailable + no includes ⇒ empty list + a logged warning flag.
Step 2: Implement ResolveAsync(AlarmsOptions, CancellationToken) → IReadOnlyList<AlarmSubtagTarget>.
Step 3-5: red→green, build, commit.
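The merge rules in Step 1 can be sketched as a pure function, which keeps the resolver easy to test without a database. The type and member names below are hypothetical, the case-insensitive comparison is an assumption, and the default subtag names are taken from the plan text and remain unverified until Task 17.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real AlarmSubtagTarget may carry more fields (e.g. priority).
public sealed record AlarmSubtagTargetSketch(
    string Reference, string ActiveAddress, string AckedAddress, string AckCommentAddress);

public sealed record SubtagNamesSketch(string Active, string Acked, string AckComment);

public static class WatchListMergeSketch
{
    // Assumed defaults (.active/.acked/.ackmsg); not yet confirmed against a live Galaxy.
    private static readonly SubtagNamesSketch Defaults = new("active", "acked", "ackmsg");

    public static IReadOnlyList<AlarmSubtagTargetSketch> Merge(
        IEnumerable<string> discovered,
        IEnumerable<string> include,
        IEnumerable<string> exclude,
        SubtagNamesSketch names)
    {
        // Assumption: attribute references compare case-insensitively.
        var comparer = StringComparer.OrdinalIgnoreCase;
        var excluded = new HashSet<string>(exclude, comparer);

        // Empty configured names fall back to the defaults.
        string active = Pick(names.Active, Defaults.Active);
        string acked = Pick(names.Acked, Defaults.Acked);
        string ackComment = Pick(names.AckComment, Defaults.AckComment);

        return discovered
            .Concat(include)                       // union of discovery rows and includes
            .Distinct(comparer)                    // keeps the first spelling seen
            .Where(reference => !excluded.Contains(reference))
            .Select(reference => new AlarmSubtagTargetSketch(
                reference,
                $"{reference}.{active}",
                $"{reference}.{acked}",
                $"{reference}.{ackComment}"))
            .ToList();
    }

    private static string Pick(string configured, string fallback) =>
        string.IsNullOrWhiteSpace(configured) ? fallback : configured;
}
```

With this split, ResolveAsync only has to fetch the discovery rows (or log the warning when the Galaxy Repository is unavailable) and pass them to the merge.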
Task 13: Gateway metrics — provider-mode gauge + switch counter
Classification: small Estimated implement time: ~3 min Parallelizable with: Task 10, Task 11
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs (ctor ~lines 55-79; add counter + observable gauge following the existing pattern)
- Test: src/ZB.MOM.WW.MxGateway.Tests/Metrics/GatewayMetricsTests.cs (if present; else assert via a MeterListener)
Step 1: Add mxgateway.alarms.provider_switches counter (tagged from,to,reason) and mxgateway.alarms.provider_mode observable gauge (1=alarmmgr, 2=subtag), plus AlarmProviderSwitched(int from, int to, string reason) and a private GetAlarmProviderMode() (lock on _syncRoot like the others).
Step 2-4: test, build, commit.
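A minimal sketch of the two instruments, using System.Diagnostics.Metrics, might look like the following. The class name and the meter-name constructor parameter are hypothetical; the real change extends GatewayMetrics and should follow its existing pattern. Updating the gauge value inside AlarmProviderSwitched is also an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical stand-alone class; the plan adds these members to GatewayMetrics instead.
public sealed class AlarmProviderMetricsSketch : IDisposable
{
    private readonly object _syncRoot = new object();
    private readonly Meter _meter;
    private readonly Counter<long> _providerSwitches;
    private int _providerMode = 1; // 1 = alarmmgr, 2 = subtag

    public AlarmProviderMetricsSketch(string meterName)
    {
        _meter = new Meter(meterName);
        _providerSwitches = _meter.CreateCounter<long>("mxgateway.alarms.provider_switches");

        // The gauge is pulled by listeners; it reads the mode under the lock.
        _meter.CreateObservableGauge<int>(
            "mxgateway.alarms.provider_mode", () => GetAlarmProviderMode());
    }

    public void AlarmProviderSwitched(int from, int to, string reason)
    {
        lock (_syncRoot)
        {
            _providerMode = to;
        }

        var tags = new TagList { { "from", from }, { "to", to }, { "reason", reason } };
        _providerSwitches.Add(1, tags);
    }

    private int GetAlarmProviderMode()
    {
        lock (_syncRoot)
        {
            return _providerMode;
        }
    }

    public void Dispose() => _meter.Dispose();
}
```

If no GatewayMetricsTests file exists, a MeterListener that enables measurement events for this meter and calls RecordObservableInstruments can assert both the counter increment and the gauge value.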
Task 14: GatewayAlarmMonitor — arm watch-list, reflect provider mode, reconcile on switch
Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none (depends on Tasks 9, 12, 13)
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs (ctor ~41-49; SubscribeAlarmsAsync ~210-233; event-drain loop; StreamAsync ~386-434)
- Test: src/ZB.MOM.WW.MxGateway.Tests/Alarms/GatewayAlarmMonitorProviderModeTests.cs (new, using FakeWorkerHarness)
Step 1: Inject IAlarmWatchListResolver and GatewayMetrics. In SubscribeAlarmsAsync, resolve the watch-list and build the SubscribeAlarmsCommand with ForcedMode (from Fallback.Mode), WatchList, and Failover populated from options — instead of the bare { SubscriptionExpression }.
Step 2: In the worker-event drain path, handle OnAlarmProviderModeChangedEvent: update a _providerStatus field (mode/degraded/reason/since), Broadcast(new AlarmFeedMessage { ProviderStatus = … }) to every subscriber, call metrics.AlarmProviderSwitched(...), and force a ReconcileAsync so the cache re-seeds from the now-active provider (avoids raise/clear storms).
Step 3: In StreamAsync, emit the current provider_status as the first message (before the snapshot) so a late joiner immediately knows the mode.
Step 4: Test — stand up the monitor with FakeWorkerHarness; emit an OnAlarmProviderModeChangedEvent(Subtag); assert a StreamAsync subscriber receives a ProviderStatus{ Mode=Subtag, Degraded=true } and that the switch counter incremented. Also assert a transition emitted in subtag mode flows through with Degraded=true.
Step 5: build server, run the new test, commit.
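The ordering guarantees in Steps 2-3 (broadcast on switch, status before snapshot for late joiners) can be illustrated with a small synchronous model. Everything here is a simplified stand-in: the real monitor is asynchronous, and the type names, the callback-based subscription, the counter standing in for ReconcileAsync, and the rule that Degraded is true exactly in subtag mode are assumptions.

```csharp
using System;
using System.Collections.Generic;

public enum ProviderModeSketch { Alarmmgr = 1, Subtag = 2 }

// Hypothetical mirror of the _providerStatus field (mode/degraded/reason/since).
public sealed record ProviderStatusSketch(
    ProviderModeSketch Mode, bool Degraded, string Reason, DateTimeOffset Since);

public sealed class AlarmFeedSketch
{
    private readonly object _gate = new object();
    private readonly List<Action<object>> _subscribers = new List<Action<object>>();
    private ProviderStatusSketch _providerStatus =
        new ProviderStatusSketch(ProviderModeSketch.Alarmmgr, false, "", DateTimeOffset.UtcNow);

    // Stands in for the forced ReconcileAsync call after each switch.
    public int ReconcileCount { get; private set; }

    public void OnProviderModeChanged(ProviderModeSketch mode, string reason)
    {
        lock (_gate)
        {
            // Assumption: subtag mode is always reported as degraded.
            _providerStatus = new ProviderStatusSketch(
                mode, mode == ProviderModeSketch.Subtag, reason, DateTimeOffset.UtcNow);

            // Broadcast under the lock so no subscriber sees messages out of order.
            foreach (var subscriber in _subscribers)
            {
                subscriber(_providerStatus);
            }

            ReconcileCount++;
        }
    }

    public void Subscribe(Action<object> onMessage, IReadOnlyList<string> snapshot)
    {
        lock (_gate)
        {
            // Provider status first, then the snapshot, then live messages.
            onMessage(_providerStatus);
            foreach (var alarm in snapshot)
            {
                onMessage(alarm);
            }

            _subscribers.Add(onMessage);
        }
    }
}
```

The Step 4 test follows the same shape: switch the mode, subscribe, and check that the first message received is the provider status rather than a snapshot entry.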
Task 15: Dashboard — push provider status to /hubs/alarms + UI indicator
Classification: standard Estimated implement time: ~5 min Parallelizable with: none (depends on Task 14)
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/AlarmsHubPublisher.cs (forward ProviderStatus messages; they already flow through StreamAsync, so confirm the existing SendAsync(AlarmMessage, message) carries them; add a dedicated "ProviderModeChanged" client method if the dashboard needs a distinct channel)
- Modify: the alarms dashboard page/component (Bootstrap-only badge: green "alarmmgr" / amber "degraded — subtag"); find under src/ZB.MOM.WW.MxGateway.Server/Dashboard/
- Test: src/ZB.MOM.WW.MxGateway.Tests/ dashboard model test (e.g. a DashboardAlarmProviderStatus.FromFeed mapper, mirroring DashboardActiveAlarm.FromSnapshot)
Constraint: Bootstrap CSS/JS only — no MudBlazor/Radzen/FluentUI.
Steps: TDD the model mapper, wire the publisher + badge, build, commit.
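One possible shape for the model mapper is sketched below. The record name, the bool-only FromFeed signature, and the specific Bootstrap classes (text-bg-success / text-bg-warning, which assume Bootstrap 5.2 or later) are assumptions; the badge labels come from the task description.

```csharp
using System;

// Hypothetical dashboard model; the real mapper would take the feed's ProviderStatus message.
public sealed record DashboardAlarmProviderStatusSketch(string Label, string BadgeCssClass)
{
    public static DashboardAlarmProviderStatusSketch FromFeed(bool degraded) =>
        degraded
            // Amber badge while running on the subtag standby.
            ? new DashboardAlarmProviderStatusSketch("degraded — subtag", "badge text-bg-warning")
            // Green badge while the alarmmgr primary is active.
            : new DashboardAlarmProviderStatusSketch("alarmmgr", "badge text-bg-success");
}
```

Keeping the label and CSS class in the model means the page only renders a span with the given class, so the mapping stays unit-testable without a browser.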
Phase 3 — Integration, docs, live smoke
Task 16: End-to-end fake-worker failover test
Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 18
Files:
- Test: src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmFailoverEndToEndTests.cs
Drive the full gateway path with FakeWorkerHarness: subscribe (assert the SubscribeAlarmsCommand carries a watch-list), emit a wnwrap-style transition (assert Degraded=false), emit OnAlarmProviderModeChangedEvent(Subtag), emit a synthesized transition (assert Degraded=true, SourceProvider=Subtag), then OnAlarmProviderModeChangedEvent(Alarmmgr) and assert the feed reports recovery. Build, run, commit.
Task 17: Live subtag smoke test (opt-in)
Classification: small Estimated implement time: ~4 min Parallelizable with: Task 18
Files:
- Test: src/ZB.MOM.WW.MxGateway.IntegrationTests/...AlarmSubtagLiveSmokeTests.cs (or the worker live suite)
A [LiveMxAccessFact], Skip-by-default test (per AlarmsLiveSmokeTests precedent) that, against a live Galaxy + alarm flip script: advises the real .active/.acked subtags via LmxSubtagAlarmSource, asserts a synthesized raise/clear, and performs an ack via the ack-comment write. Document the exact subtag names discovered (resolves the design's open item). Commit.
Task 18: Documentation
Classification: trivial Estimated implement time: ~5 min Parallelizable with: Task 16, Task 17
Files:
- Modify: gateway.md (alarm provider section: dual provider + auto-failover/failback)
- Modify: docs/DesignDecisions.md (record the fallback decision + parity rationale)
- Modify: docs/GatewayConfiguration.md (the MxGateway:Alarms:Fallback block)
- Modify: docs/AlarmClientDiscovery.md (subtag provider, synthesis rules, ack-comment write)
- Modify: docs/Grpc.md (new provider_status feed case + degraded/source_provider fields)
Follow StyleGuide.md (PascalCase filenames, present tense, explain why). No code; commit.
Execution order & parallelism summary
- Serial spine: 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8/9 → 10/11 → 12 → 13 → 14 → 15 → 16 → 17/18.
- Parallelizable clusters: {8, 9 partially}, {10, 11, 13}, {16, 17, 18}.
- High-risk tasks (full review chain): 1, 2, 6, 7, 9, 14. Standard: 4, 5, 8, 10, 11, 12, 15, 16. Small/trivial: 3, 13, 17, 18.
Risk notes for the executor
- Field-number collisions: Task 2 must read the live MxEvent/MxEventFamily numbers before adding; the agent map gave alarm-payload maxima but not MxEvent's. Verify before editing.
- STA discipline: every COM call in LmxSubtagAlarmSource and every consumer swap runs on the worker STA; keep the EnsureOnAlarmConsumerThread guard. The worker STA already pumps Windows messages, which is required for the subtag OnDataChange to deliver.
- Parity regression: alarmmgr-mode output must be byte-for-byte unchanged. Existing AlarmDispatcherTests and ProtobufContractRoundTripTests are the guardrail; they must stay green with Degraded=false defaults.
- Subtag names unverified: the design leaves exact AVEVA subtag names (.active, .acked, ack-comment) to confirm against C:\Users\dohertj2\Desktop\mxaccess + a live Galaxy (Task 17). The config Subtags block exists so names are not hard-coded.