docs(alarms): clarify resolver cancellation contract; mark design doc superseded

C6b: the IAlarmWatchListResolver.ResolveAsync doc now notes that although unavailable
discovery never throws, a triggered cancellation token still propagates an
OperationCanceledException.
C7: annotate the original design doc where it drifted from the shipped code: metric
names, the unimplemented watch-list gauges, and the location of the proto types
(gateway proto, not worker proto).
Joseph Doherty
2026-06-14 02:33:14 -04:00
parent 5573f2a229
commit 37aadf72b3
2 changed files with 19 additions and 3 deletions
@@ -1,7 +1,10 @@
# Alarm Subtag-Monitoring Fallback — Design
**Date:** 2026-06-13
-**Status:** Approved (brainstorming), ready for implementation planning
+**Status:** Superseded by implementation (merged to `main`). This is the original
+brainstorming design; a few details below were refined during implementation —
+see the inline **Superseded** notes. The shipped behaviour is documented in
+`docs/AlarmClientDiscovery.md`, the client READMEs, and the contracts.
**Branch:** `feat/alarm-subtag-fallback`
## Problem
@@ -162,6 +165,11 @@ reconcile cadence and pushes an updated watch-list when the model changes.
**`mxaccess_worker.proto`:**
+> **Superseded:** these additions shipped in `mxaccess_gateway.proto`, not
+> `mxaccess_worker.proto` — the worker imports the gateway proto and the alarm
+> commands/events live there (`AlarmSubtagTarget`,
+> `OnAlarmProviderModeChangedEvent`, the extended subscribe command).
- Extend the alarm-subscribe command with: `AlarmProviderMode forced_mode`
(`UNSPECIFIED` = auto), `int32 consecutive_failure_threshold`,
`int32 failback_probe_interval_seconds`, `int32 failback_stable_probes`, and
@@ -240,6 +248,12 @@ to `/hubs/alarms`, (c) update metrics, (d) force a reconcile.
- `mxgateway_alarm_provider_switch_total{from,to,reason}` (counter)
- `mxgateway_alarm_fallback_watchlist_size` (gauge)
+> **Superseded:** the shipped meter names are `mxgateway.alarms.provider_mode`
+> (gauge) and `mxgateway.alarms.provider_switches{from,to,reason}` (counter,
+> `reason` bounded to `failover`/`failback`/`unknown`). The watch-list-size /
+> watch-list-empty gauges were not implemented; an empty watch-list is surfaced
+> via a warning log and the feed's degraded `ProviderStatus` instead.
## Configuration
```jsonc
@@ -19,8 +19,10 @@ public interface IAlarmWatchListResolver
/// <param name="cancellationToken">Token to cancel the asynchronous operation.</param>
/// <returns>
/// The resolved <see cref="AlarmSubtagTarget"/> watch-list, possibly empty.
-/// Discovery being unavailable never throws; the caller decides what to do
-/// with an empty list.
+/// Discovery being unavailable never throws — it yields an empty (or
+/// config-only) list and the caller decides what to do with it. Cancellation
+/// is the one exception: a triggered <paramref name="cancellationToken"/>
+/// still propagates an <see cref="OperationCanceledException"/>.
/// </returns>
Task<IReadOnlyList<AlarmSubtagTarget>> ResolveAsync(
AlarmsOptions options,