metrics(alarms): expose provider-switch count in snapshot, bound the reason tag

B1: add AlarmProviderSwitchCount to GatewayMetricsSnapshot so the switch total is
readable without scraping the OTEL counter.
B2: replace the free-text reason tag on mxgateway.alarms.provider_switches with a
bounded AlarmProviderSwitchReason enum (failover/failback/unknown); the human-readable
reason stays in the structured log.
Joseph Doherty
2026-06-14 02:33:02 -04:00
parent 5b31e99ab6
commit 56abd64c6c
5 changed files with 47 additions and 8 deletions
@@ -399,7 +399,13 @@ public sealed class GatewayAlarmMonitor : BackgroundService, IGatewayAlarmServic
BroadcastToAll(new AlarmFeedMessage { ProviderStatus = status });
}
-_metrics.AlarmProviderSwitched(fromModeInt, ModeToInt(toMode), reason);
+AlarmProviderSwitchReason switchReason = toMode switch
+{
+AlarmProviderMode.Subtag => AlarmProviderSwitchReason.Failover,
+AlarmProviderMode.Alarmmgr => AlarmProviderSwitchReason.Failback,
+_ => AlarmProviderSwitchReason.Unknown,
+};
+_metrics.AlarmProviderSwitched(fromModeInt, ModeToInt(toMode), switchReason);
_logger.LogInformation(
"Alarm provider mode changed to {Mode} (degraded={Degraded}): {Reason}",