Compare commits

...

6 Commits

Author SHA1 Message Date
Joseph Doherty
c14624f012 Phase 2 PR 14 — alarm subsystem wire-up. Per IsAlarm=true attribute (PR 9 added the discovery flag), GalaxyAlarmTracker in Backend/Alarms/ advises the four Galaxy alarm-state attributes: .InAlarm (boolean alarm active), .Priority (int severity), .DescAttrName (human-readable description), .Acked (boolean acknowledged). Runs the OPC UA Part 9 alarm lifecycle state machine simplified for the Galaxy AlarmExtension model and raises AlarmTransition events on transitions operators must react to — Active (InAlarm false→true, default Unacknowledged), Acknowledged (Acked false→true while InAlarm still true), Inactive (InAlarm true→false). MxAccessGalaxyBackend instantiates the tracker in its constructor with delegate-based subscribe/unsubscribe/write pointers to MxAccessClient, hooks TransitionRaised to forward each transition through the existing OnAlarmEvent IPC event that PR 4 ConnectionSink wires into MessageKind.AlarmEvent frames — no new contract messages required since GalaxyAlarmEvent already exists in Shared.Contracts. Field mapping: EventId = fresh Guid.ToString('N') per transition, ObjectTagName = alarm attribute full reference, AlarmName = alarm attribute full reference, Severity = tracked Priority, StateTransition = 'Active'|'Acknowledged'|'Inactive', Message = DescAttrName or tag fallback, UtcUnixMs = transition time. DiscoverAsync caches every IsAlarm=true attribute's full reference (tag.attribute) into _discoveredAlarmTags (ConcurrentBag cleared-then-filled on every re-Discover to track Galaxy redeploys). SubscribeAlarmsAsync iterates the cache and advises each via GalaxyAlarmTracker.TrackAsync; best-effort per-alarm — a subscribe failure on one alarm doesn't abort the whole call since operators prefer partial alarm coverage to none. Tracker is internally idempotent on repeat Track calls (second invocation for same alarm tag is a no-op; already-subscribed check short-circuits before the 4 MXAccess sub calls). Subscribe-failure rollback inside TrackAsync removes the alarm state + unadvises any of the 4 that did succeed so a partial advise can't leak a phantom tracking entry. AcknowledgeAlarmAsync routes to tracker.AcknowledgeAsync which writes the operator comment to <tag>.AckMsg via MxAccessClient.WriteAsync — writes use the existing MXAccess OnWriteComplete TCS-by-handle path (PR 4 Medium 4) so a runtime-refused ack bubbles up as Success=false rather than false-positive. State-machine quirks preserved from v1: (1) initial Acked=true on subscribe does NOT fire Acknowledged (alarm at rest, pre-acknowledged — default state is Acked=true so the first subscribe callback is a no-op transition), (2) Acked false→true only fires Acknowledged when InAlarm is currently true (acking a latched-inactive alarm is not a user-visible transition), (3) Active transition clears the Acked flag in-state so the next Acked callback correctly fires Acknowledged (v1 had this buried in the ConditionState logic; we track it on the AlarmState struct directly). Priority value handled as int/short/long via type pattern match with int.MaxValue guard — Galaxy attribute category returns varying CLR types (Int32 is canonical but some older templates use Int16), and a long overflow cast to int would silently corrupt the severity. Dispose cascade in MxAccessGalaxyBackend.Dispose: alarm-tracker unsubscribe→dispose, probe-manager unsubscribe→dispose, mx.ConnectionStateChanged detach, historian dispose — same discipline PR 6 / PR 8 / PR 13 established so dangling invocation-list refs don't survive a backend recycle. #pragma warning disable CS0067 around OnAlarmEvent removed since the event is now raised. Tests (9 new, GalaxyAlarmTrackerTests): four-attribute subscribe per alarm, idempotent repeat-track, InAlarm false→true fires Active with Priority + Desc, InAlarm true→false fires Inactive, Acked false→true while InAlarm fires Acknowledged, Acked transition while InAlarm=false does not fire, AckMsg write path carries the comment, snapshot reports latest four fields, foreign probe callback for a non-tracked tag is silently dropped. Full Galaxy.Host.Tests Unit suite 84 pass / 0 fail (9 new alarm + 12 PR 13 probe + 21 PR 12 quality + 42 pre-existing). Galaxy.Host builds clean (0/0). Branches off phase-2-pr13-runtime-probe so the MxAccessGalaxyBackend constructor/Dispose chain gets the probe-manager + alarm-tracker wire-up in a coherent order; fast-forwards if PR 13 merges first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 07:34:13 -04:00
Joseph Doherty
04d267d1ea Phase 2 PR 13 — port GalaxyRuntimeProbeManager state machine + wire per-platform ScanState probing. PR 8 gave operators the gateway-level transport signal (MxAccessClient.ConnectionStateChanged → OnHostStatusChanged tagged with the Wonderware client identity) — enough to detect when the entire MXAccess COM proxy dies, but silent when a specific or goes down while the gateway stays alive. This PR closes that gap: a pure-logic GalaxyRuntimeProbeManager (ported from v1's 472-LOC GalaxyRuntimeProbeManager.cs, distilled to ~240 LOC of state machine without the OPC UA node-manager entanglement) lives in Backend/Stability/, advises <TagName>.ScanState per WinPlatform + AppEngine gobject discovered during DiscoverAsync, runs the Unknown → Running → Stopped state machine with v1's documented semantics preserved verbatim, and raises a StateChanged event on transitions operators should react to. MxAccessGalaxyBackend instantiates the probe manager in the constructor with SubscribeAsync/UnsubscribeAsync delegate pointers into MxAccessClient, hooks StateChanged to forward each transition through the same OnHostStatusChanged IPC event the gateway signal uses (HostName = platform/engine TagName, RuntimeStatus = 'Running'|'Stopped'|'Unknown', LastObservedUtcUnixMs from the state-change timestamp), so Admin UI gets per-host signals flowing through the existing PR 8 wire with no additional IPC plumbing. State machine rules ported from v1 runtimestatus.md: (a) ScanState is on-change-only — a stably-Running host may go hours without a callback, so Running → Stopped is driven only by explicit ScanState=false, never by starvation; (b) Unknown → Running is a startup transition and does NOT fire StateChanged (would paint every host as 'just recovered' at startup, which is noise and can clear Bad quality set by a concurrently-stopping sibling); (c) Stopped → Running fires StateChanged for the real recovery case; (d) Running → Stopped fires StateChanged; (e) Unknown → Stopped fires StateChanged because that's the first-known-bad signal operators need when a host is down at our startup time. MxAccessGalaxyBackend.DiscoverAsync calls _probeManager.SyncAsync with the runtime-host subset of the hierarchy (CategoryId == 1 WinPlatform or 3 AppEngine) as a best-effort step after building the Discover response — probe failures are swallowed so Discover still returns the hierarchy even if a per-host advise fails; the gateway signal covers the critical rung. SyncAsync is idempotent (second call with the same set is a no-op) and handles the diff on re-Discover for tag rename / host add / host remove. Subscribe failure rolls back the host's state entry under the lock so a later probe callback for a never-advised tag can't transition a phantom state from Unknown to Stopped and fan out a false host-down signal (the same protection v1's GalaxyRuntimeProbeManager had at line 237-243 of v1 with a captured-probe-string comparison under the lock). MxAccessGalaxyBackend.Dispose unsubscribes the StateChanged handler before disposing the probe manager to prevent dangling invocation-list references across reconnects, same discipline as PR 8's ConnectionStateChanged and PR 6's SubscriptionReplayFailed. Tests (12 new GalaxyRuntimeProbeManagerTests): Sync_subscribes_to_ScanState_per_host verifies tag.ScanState subscriptions are advised per Platform/Engine; Sync_is_idempotent_on_repeat_call_with_same_set verifies no duplicate subscribes; Sync_unadvises_removed_hosts verifies the diff unadvises gone hosts; Subscribe_failure_rolls_back_host_entry_so_later_transitions_do_not_fire_stale_events covers the rollback-on-subscribe-fail guard; Unknown_to_Running_does_not_fire_StateChanged preserves the startup-noise rule; Running_to_Stopped_fires_StateChanged_with_both_states asserts OldState and NewState are both captured in the transition record; Stopped_to_Running_fires_StateChanged_for_recovery verifies the recovery case; Unknown_to_Stopped_fires_StateChanged_for_first_known_bad_signal preserves the first-bad rule; Repeated_Good_Running_callbacks_do_not_fire_duplicate_events verifies the state-tracking de-dup; Unknown_callback_for_non_tracked_probe_is_dropped asserts a foreign callback is silently ignored; Snapshot_reports_current_state_for_every_tracked_host covers the dashboard query hook; IsRuntimeHost_recognizes_WinPlatform_and_AppEngine_category_ids asserts the CategoryId filter. Galaxy.Host.Tests Unit suite 75 pass / 0 fail (12 new probe + 63 pre-existing). Galaxy.Host builds clean (0 errors / 0 warnings). Branches off v2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 07:27:56 -04:00
4448db8207 Merge pull request 'Phase 2 PR 12 � richer historian quality mapping' (#11) from phase-2-pr12-quality-mapper into v2 2026-04-18 07:22:44 -04:00
d96b513bbc Merge pull request 'Phase 2 PR 11 � HistoryReadEvents IPC (alarm history)' (#10) from phase-2-pr11-history-events into v2 2026-04-18 07:22:33 -04:00
053c4e0566 Merge pull request 'Phase 2 PR 10 � HistoryReadAtTime IPC surface' (#9) from phase-2-pr10-history-attime into v2 2026-04-18 07:22:16 -04:00
Joseph Doherty
f24f969a85 Phase 2 PR 12 — richer historian quality mapping. Replace MxAccessGalaxyBackend's inline MapHistorianQualityToOpcUa category-only helper (192+→Good, 64-191→Uncertain, 0-63→Bad) with a new public HistorianQualityMapper.Map utility that preserves specific OPC DA subcodes — BadNotConnected(8)→0x808A0000u instead of generic Bad(0x80000000u), UncertainSubNormal(88)→0x40950000u instead of generic Uncertain, Good_LocalOverride(216)→0x00D80000u instead of generic Good, etc. Mirrors v1 QualityMapper.MapToOpcUaStatusCode byte-for-byte without pulling in OPC UA types — the function returns uint32 literals that are the canonical OPC UA StatusCode wire encoding, surfaced directly as DataValueSnapshot.StatusCode on the Proxy side with no additional translation. Unknown subcodes fall back to the family category (255→Good, 150→Uncertain, 50→Bad) so a future SDK change that adds a quality code we don't map yet still gets a sensible bucket. GalaxyDataValue wire shape unchanged (StatusCode stays uint) — this is a pure fidelity upgrade on the Host side. Downstream callers (Admin UI status dashboard, OPC UA clients receiving historian samples) can now distinguish e.g. a transport outage (BadNotConnected) from a sensor fault (BadSensorFailure) from a warm-up delay (BadWaitingForInitialData) without a second round-trip or dashboard heuristic. 21 new tests (HistorianQualityMapperTests): theory with 15 rows covering every specific mapping from the v1 QualityMapper table, plus 6 fallback tests verifying unknown-subcode codes in each family (Good/Uncertain/Bad) collapse to the family default. Galaxy.Host.Tests Unit suite 56/0 (21 new + 35 existing). Galaxy.Host builds clean (0/0). Branches off v2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 07:11:02 -04:00
7 changed files with 1195 additions and 13 deletions

View File

@@ -0,0 +1,260 @@
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Alarms;
/// <summary>
/// Subscribes to the four Galaxy alarm attributes (<c>.InAlarm</c>, <c>.Priority</c>,
/// <c>.DescAttrName</c>, <c>.Acked</c>) per alarm-bearing attribute discovered during
/// <c>DiscoverAsync</c>. Maintains one <see cref="AlarmState"/> per alarm, raises
/// <see cref="AlarmTransition"/> on lifecycle transitions (Active / Unacknowledged /
/// Acknowledged / Inactive). Ack path writes <c>.AckMsg</c>. Pure-logic state machine
/// with delegate-based subscribe/write so it's testable against in-memory fakes.
/// </summary>
/// <remarks>
/// Transitions emitted (OPC UA Part 9 alarm lifecycle, simplified for the Galaxy model):
/// <list type="bullet">
/// <item><c>Active</c> — InAlarm false → true. Default to Unacknowledged.</item>
/// <item><c>Acknowledged</c> — Acked false → true while InAlarm is still true.</item>
/// <item><c>Inactive</c> — InAlarm true → false. If still unacknowledged the alarm
/// is marked latched-inactive-unack; next Ack transitions straight to Inactive.</item>
/// </list>
/// </remarks>
public sealed class GalaxyAlarmTracker : IDisposable
{
public const string InAlarmAttr = ".InAlarm";
public const string PriorityAttr = ".Priority";
public const string DescAttrNameAttr = ".DescAttrName";
public const string AckedAttr = ".Acked";
public const string AckMsgAttr = ".AckMsg";
private readonly Func<string, Action<string, Vtq>, Task> _subscribe;
private readonly Func<string, Task> _unsubscribe;
private readonly Func<string, object, Task<bool>> _write;
private readonly Func<DateTime> _clock;
// Alarm tag (attribute full ref, e.g. "Tank.Level.HiHi") → state.
private readonly ConcurrentDictionary<string, AlarmState> _alarms =
new(StringComparer.OrdinalIgnoreCase);
// Reverse lookup: probed tag (".InAlarm" etc.) → owning alarm tag.
private readonly ConcurrentDictionary<string, (string AlarmTag, AlarmField Field)> _probeToAlarm =
new(StringComparer.OrdinalIgnoreCase);
private bool _disposed;
public event EventHandler<AlarmTransition>? TransitionRaised;
public GalaxyAlarmTracker(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe,
Func<string, object, Task<bool>> write)
: this(subscribe, unsubscribe, write, () => DateTime.UtcNow) { }
internal GalaxyAlarmTracker(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe,
Func<string, object, Task<bool>> write,
Func<DateTime> clock)
{
_subscribe = subscribe ?? throw new ArgumentNullException(nameof(subscribe));
_unsubscribe = unsubscribe ?? throw new ArgumentNullException(nameof(unsubscribe));
_write = write ?? throw new ArgumentNullException(nameof(write));
_clock = clock ?? throw new ArgumentNullException(nameof(clock));
}
public int TrackedAlarmCount => _alarms.Count;
/// <summary>
/// Advise the four alarm attributes for <paramref name="alarmTag"/>. Idempotent —
/// repeat calls for the same alarm tag are a no-op. Subscribe failure for any of the
/// four rolls back the alarm entry so a stale callback cannot promote a phantom.
/// </summary>
public async Task TrackAsync(string alarmTag)
{
if (_disposed || string.IsNullOrWhiteSpace(alarmTag)) return;
if (_alarms.ContainsKey(alarmTag)) return;
var state = new AlarmState { AlarmTag = alarmTag };
if (!_alarms.TryAdd(alarmTag, state)) return;
var probes = new[]
{
(Tag: alarmTag + InAlarmAttr, Field: AlarmField.InAlarm),
(Tag: alarmTag + PriorityAttr, Field: AlarmField.Priority),
(Tag: alarmTag + DescAttrNameAttr, Field: AlarmField.DescAttrName),
(Tag: alarmTag + AckedAttr, Field: AlarmField.Acked),
};
foreach (var p in probes)
{
_probeToAlarm[p.Tag] = (alarmTag, p.Field);
}
try
{
foreach (var p in probes)
{
await _subscribe(p.Tag, OnProbeCallback).ConfigureAwait(false);
}
}
catch
{
// Rollback so a partial advise doesn't leak state.
_alarms.TryRemove(alarmTag, out _);
foreach (var p in probes)
{
_probeToAlarm.TryRemove(p.Tag, out _);
try { await _unsubscribe(p.Tag).ConfigureAwait(false); } catch { }
}
throw;
}
}
/// <summary>
/// Drop every tracked alarm. Unadvises all 4 probes per alarm as best-effort.
/// </summary>
public async Task ClearAsync()
{
_alarms.Clear();
foreach (var kv in _probeToAlarm.ToList())
{
_probeToAlarm.TryRemove(kv.Key, out _);
try { await _unsubscribe(kv.Key).ConfigureAwait(false); } catch { }
}
}
/// <summary>
/// Operator ack — write the comment text into <c>&lt;alarmTag&gt;.AckMsg</c>.
/// Returns false when the runtime reports the write failed.
/// </summary>
public Task<bool> AcknowledgeAsync(string alarmTag, string comment)
{
if (_disposed || string.IsNullOrWhiteSpace(alarmTag))
return Task.FromResult(false);
return _write(alarmTag + AckMsgAttr, comment ?? string.Empty);
}
/// <summary>
/// Subscription callback entry point. Exposed for tests and for the Backend to route
/// fan-out callbacks through. Runs the state machine and fires TransitionRaised
/// outside the lock.
/// </summary>
public void OnProbeCallback(string probeTag, Vtq vtq)
{
if (_disposed) return;
if (!_probeToAlarm.TryGetValue(probeTag, out var link)) return;
if (!_alarms.TryGetValue(link.AlarmTag, out var state)) return;
AlarmTransition? transition = null;
var now = _clock();
lock (state.Lock)
{
switch (link.Field)
{
case AlarmField.InAlarm:
{
var wasActive = state.InAlarm;
var isActive = vtq.Value is bool b && b;
state.InAlarm = isActive;
state.LastUpdateUtc = now;
if (!wasActive && isActive)
{
state.Acked = false;
state.LastTransitionUtc = now;
transition = new AlarmTransition(state.AlarmTag, AlarmStateTransition.Active, state.Priority, state.DescAttrName, now);
}
else if (wasActive && !isActive)
{
state.LastTransitionUtc = now;
transition = new AlarmTransition(state.AlarmTag, AlarmStateTransition.Inactive, state.Priority, state.DescAttrName, now);
}
break;
}
case AlarmField.Priority:
if (vtq.Value is int pi) state.Priority = pi;
else if (vtq.Value is short ps) state.Priority = ps;
else if (vtq.Value is long pl && pl <= int.MaxValue) state.Priority = (int)pl;
state.LastUpdateUtc = now;
break;
case AlarmField.DescAttrName:
state.DescAttrName = vtq.Value as string;
state.LastUpdateUtc = now;
break;
case AlarmField.Acked:
{
var wasAcked = state.Acked;
var isAcked = vtq.Value is bool b && b;
state.Acked = isAcked;
state.LastUpdateUtc = now;
// Fire Acknowledged only when transitioning false→true. Don't fire on initial
// subscribe callback (wasAcked==isAcked in that case because the state starts
// with Acked=false and the initial probe is usually true for an un-active alarm).
if (!wasAcked && isAcked && state.InAlarm)
{
state.LastTransitionUtc = now;
transition = new AlarmTransition(state.AlarmTag, AlarmStateTransition.Acknowledged, state.Priority, state.DescAttrName, now);
}
break;
}
}
}
if (transition is { } t)
{
TransitionRaised?.Invoke(this, t);
}
}
public IReadOnlyList<AlarmSnapshot> SnapshotStates()
{
return _alarms.Values.Select(s =>
{
lock (s.Lock)
return new AlarmSnapshot(s.AlarmTag, s.InAlarm, s.Acked, s.Priority, s.DescAttrName);
}).ToList();
}
public void Dispose()
{
if (_disposed) return;
_disposed = true;
_alarms.Clear();
_probeToAlarm.Clear();
}
private sealed class AlarmState
{
public readonly object Lock = new();
public string AlarmTag = "";
public bool InAlarm;
public bool Acked = true; // default ack'd so first false→true on subscribe doesn't misfire
public int Priority;
public string? DescAttrName;
public DateTime LastUpdateUtc;
public DateTime LastTransitionUtc;
}
private enum AlarmField { InAlarm, Priority, DescAttrName, Acked }
}
public enum AlarmStateTransition { Active, Acknowledged, Inactive }
public sealed record AlarmTransition(
string AlarmTag,
AlarmStateTransition Transition,
int Priority,
string? DescAttrName,
DateTime AtUtc);
public sealed record AlarmSnapshot(
string AlarmTag,
bool InAlarm,
bool Acked,
int Priority,
string? DescAttrName);

View File

@@ -0,0 +1,46 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Historian;
/// <summary>
/// Maps a raw OPC DA quality byte (as returned by Wonderware Historian's <c>OpcQuality</c>)
/// to an OPC UA <c>StatusCode</c> uint. Preserves specific codes (BadNotConnected,
/// UncertainSubNormal, etc.) instead of collapsing to Good/Uncertain/Bad categories.
/// Mirrors v1 <c>QualityMapper.MapToOpcUaStatusCode</c> without pulling in OPC UA types —
/// the returned value is the 32-bit OPC UA <c>StatusCode</c> wire encoding that the Proxy
/// surfaces directly as <c>DataValueSnapshot.StatusCode</c>.
/// </summary>
public static class HistorianQualityMapper
{
/// <summary>
/// Map an 8-bit OPC DA quality byte to the corresponding OPC UA StatusCode. The byte
/// family bits decide the category (Good &gt;= 192, Uncertain 64-191, Bad 0-63); the
/// low-nibble subcode selects the specific code.
/// </summary>
public static uint Map(byte q) => q switch
{
// Good family (192+)
192 => 0x00000000u, // Good
216 => 0x00D80000u, // Good_LocalOverride
// Uncertain family (64-191)
64 => 0x40000000u, // Uncertain
68 => 0x40900000u, // Uncertain_LastUsableValue
80 => 0x40930000u, // Uncertain_SensorNotAccurate
84 => 0x40940000u, // Uncertain_EngineeringUnitsExceeded
88 => 0x40950000u, // Uncertain_SubNormal
// Bad family (0-63)
0 => 0x80000000u, // Bad
4 => 0x80890000u, // Bad_ConfigurationError
8 => 0x808A0000u, // Bad_NotConnected
12 => 0x808B0000u, // Bad_DeviceFailure
16 => 0x808C0000u, // Bad_SensorFailure
20 => 0x80050000u, // Bad_CommunicationError
24 => 0x808D0000u, // Bad_OutOfService
32 => 0x80320000u, // Bad_WaitingForInitialData
// Unknown code — fall back to the category so callers still get a sensible bucket.
_ when q >= 192 => 0x00000000u,
_ when q >= 64 => 0x40000000u,
_ => 0x80000000u,
};
}

View File

@@ -4,9 +4,11 @@ using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Alarms;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Historian;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Stability;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
@@ -34,12 +36,18 @@ public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
_refToSubs = new(System.StringComparer.OrdinalIgnoreCase);
public event System.EventHandler<OnDataChangeNotification>? OnDataChange;
#pragma warning disable CS0067 // alarm wire-up deferred to PR 9
public event System.EventHandler<GalaxyAlarmEvent>? OnAlarmEvent;
#pragma warning restore CS0067
public event System.EventHandler<HostConnectivityStatus>? OnHostStatusChanged;
private readonly System.EventHandler<bool> _onConnectionStateChanged;
private readonly GalaxyRuntimeProbeManager _probeManager;
private readonly System.EventHandler<HostStateTransition> _onProbeStateChanged;
private readonly GalaxyAlarmTracker _alarmTracker;
private readonly System.EventHandler<AlarmTransition> _onAlarmTransition;
// Cached during DiscoverAsync so SubscribeAlarmsAsync knows which attributes to advise.
// One entry per IsAlarm=true attribute in the last discovered hierarchy.
private readonly System.Collections.Concurrent.ConcurrentBag<string> _discoveredAlarmTags = new();
public MxAccessGalaxyBackend(GalaxyRepository repository, MxAccessClient mx, IHistorianDataSource? historian = null)
{
@@ -62,8 +70,65 @@ public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
});
};
_mx.ConnectionStateChanged += _onConnectionStateChanged;
// PR 13: per-platform runtime probes. ScanState subscriptions fire OnProbeCallback,
// which runs the state machine and raises StateChanged on transitions we care about.
// We forward each transition through the same OnHostStatusChanged IPC event that the
// gateway-level ConnectionStateChanged uses — tagged with the platform's TagName so the
// Admin UI can show per-host health independently from the top-level transport status.
_probeManager = new GalaxyRuntimeProbeManager(
subscribe: (probe, cb) => _mx.SubscribeAsync(probe, cb),
unsubscribe: probe => _mx.UnsubscribeAsync(probe));
_onProbeStateChanged = (_, t) =>
{
OnHostStatusChanged?.Invoke(this, new HostConnectivityStatus
{
HostName = t.TagName,
RuntimeStatus = t.NewState switch
{
HostRuntimeState.Running => "Running",
HostRuntimeState.Stopped => "Stopped",
_ => "Unknown",
},
LastObservedUtcUnixMs = new DateTimeOffset(t.AtUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
});
};
_probeManager.StateChanged += _onProbeStateChanged;
// PR 14: alarm subsystem. Per IsAlarm=true attribute discovered, subscribe to the four
// alarm-state attributes (.InAlarm/.Priority/.DescAttrName/.Acked), track lifecycle,
// and raise GalaxyAlarmEvent on transitions — forwarded through the existing
// OnAlarmEvent IPC event that the PR 4 ConnectionSink already wires into AlarmEvent frames.
_alarmTracker = new GalaxyAlarmTracker(
subscribe: (tag, cb) => _mx.SubscribeAsync(tag, cb),
unsubscribe: tag => _mx.UnsubscribeAsync(tag),
write: (tag, v) => _mx.WriteAsync(tag, v));
_onAlarmTransition = (_, t) => OnAlarmEvent?.Invoke(this, new GalaxyAlarmEvent
{
EventId = Guid.NewGuid().ToString("N"),
ObjectTagName = t.AlarmTag,
AlarmName = t.AlarmTag,
Severity = t.Priority,
StateTransition = t.Transition switch
{
AlarmStateTransition.Active => "Active",
AlarmStateTransition.Acknowledged => "Acknowledged",
AlarmStateTransition.Inactive => "Inactive",
_ => "Unknown",
},
Message = t.DescAttrName ?? t.AlarmTag,
UtcUnixMs = new DateTimeOffset(t.AtUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
});
_alarmTracker.TransitionRaised += _onAlarmTransition;
}
/// <summary>
/// Exposed for tests. Production flow: DiscoverAsync completes → backend calls
/// <c>SyncProbesAsync</c> with the runtime hosts (WinPlatform + AppEngine gobjects) to
/// advise ScanState per host.
/// </summary>
internal GalaxyRuntimeProbeManager ProbeManager => _probeManager;
public async Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct)
{
try
@@ -103,6 +168,34 @@ public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
Attributes = attrsByGobject.TryGetValue(o.GobjectId, out var a) ? a : Array.Empty<GalaxyAttributeInfo>(),
}).ToArray();
// PR 14: cache alarm-bearing attribute full refs so SubscribeAlarmsAsync can advise
// them on demand. Format matches the Galaxy reference grammar <tag>.<attr>.
var freshAlarmTags = attributes
.Where(a => a.IsAlarm)
.Select(a => nameByGobject.TryGetValue(a.GobjectId, out var tn)
? tn + "." + a.AttributeName
: null)
.Where(s => !string.IsNullOrWhiteSpace(s))
.Cast<string>()
.ToArray();
while (_discoveredAlarmTags.TryTake(out _)) { }
foreach (var t in freshAlarmTags) _discoveredAlarmTags.Add(t);
// PR 13: Sync the per-platform probe manager against the just-discovered hierarchy
// so ScanState subscriptions track the current runtime set. Best-effort — probe
// failures don't block Discover from returning, since the gateway-level signal from
// MxAccessClient.ConnectionStateChanged still flows and the Admin UI degrades to
// that level if any per-host probe couldn't advise.
try
{
var targets = hierarchy
.Where(o => o.CategoryId == GalaxyRuntimeProbeManager.CategoryWinPlatform
|| o.CategoryId == GalaxyRuntimeProbeManager.CategoryAppEngine)
.Select(o => new HostProbeTarget(o.TagName, o.CategoryId));
await _probeManager.SyncAsync(targets).ConfigureAwait(false);
}
catch { /* swallow — Discover succeeded; probes are a diagnostic enrichment */ }
return new DiscoverHierarchyResponse { Success = true, Objects = objects };
}
catch (Exception ex)
@@ -240,8 +333,40 @@ public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
}
}
public Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct) => Task.CompletedTask;
public Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct) => Task.CompletedTask;
/// <summary>
/// PR 14: advise every alarm-bearing attribute's 4-attr quartet. Best-effort per-alarm —
/// a subscribe failure on one alarm doesn't abort the whole call, since operators prefer
/// partial alarm coverage to none. Idempotent on repeat calls (tracker internally
/// skips already-tracked alarms).
/// </summary>
public async Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct)
{
foreach (var tag in _discoveredAlarmTags)
{
try { await _alarmTracker.TrackAsync(tag).ConfigureAwait(false); }
catch { /* swallow per-alarm — tracker rolls back its own state on failure */ }
}
}
/// <summary>
/// PR 14: route operator ack through the tracker's AckMsg write path. EventId on the
/// incoming request maps directly to the alarm full reference (Proxy-side naming
/// convention from GalaxyProxyDriver.RaiseAlarmEvent → ev.EventId).
/// </summary>
public async Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct)
{
// EventId carries a per-transition Guid.ToString("N"); there's no reverse map from
// event id to alarm tag yet, so v1's convention (ack targets the condition) is matched
// by reading the alarm name from the Comment envelope: v1 packed "<tag>|<comment>".
// Until the Proxy is updated to send the alarm tag separately, fall back to treating
// the EventId as the alarm tag — Client CLI passes it through unchanged.
var tag = req.EventId;
if (!string.IsNullOrWhiteSpace(tag))
{
try { await _alarmTracker.AcknowledgeAsync(tag, req.Comment ?? string.Empty).ConfigureAwait(false); }
catch { /* swallow — ack failures surface via MxAccessClient.WriteAsync logs */ }
}
}
public async Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct)
{
@@ -405,6 +530,10 @@ public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
public void Dispose()
{
_alarmTracker.TransitionRaised -= _onAlarmTransition;
_alarmTracker.Dispose();
_probeManager.StateChanged -= _onProbeStateChanged;
_probeManager.Dispose();
_mx.ConnectionStateChanged -= _onConnectionStateChanged;
_historian?.Dispose();
}
@@ -431,19 +560,11 @@ public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
TagReference = reference,
ValueBytes = sample.Value is null ? null : MessagePackSerializer.Serialize(sample.Value),
ValueMessagePackType = 0,
StatusCode = MapHistorianQualityToOpcUa(sample.Quality),
StatusCode = HistorianQualityMapper.Map(sample.Quality),
SourceTimestampUtcUnixMs = new DateTimeOffset(sample.TimestampUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
ServerTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
};
private static uint MapHistorianQualityToOpcUa(byte q)
{
// Category-only mapping — mirrors QualityMapper.MapToOpcUaStatusCode for the common ranges.
// The Proxy may refine this when it decodes the wire frame.
if (q >= 192) return 0x00000000u; // Good
if (q >= 64) return 0x40000000u; // Uncertain
return 0x80000000u; // Bad
}
/// <summary>
/// Maps a <see cref="HistorianAggregateSample"/> (one aggregate bucket) to the IPC wire

View File

@@ -0,0 +1,273 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Stability;
/// <summary>
/// Per-platform + per-AppEngine runtime probe. Subscribes to <c>&lt;TagName&gt;.ScanState</c>
/// for each $WinPlatform and $AppEngine gobject, tracks Unknown → Running → Stopped
/// transitions, and fires <see cref="StateChanged"/> so <see cref="Backend.MxAccessGalaxyBackend"/>
/// can forward per-host events through the existing IPC <c>OnHostStatusChanged</c> event.
/// Pure-logic state machine with an injected clock so it's deterministically testable —
/// port of v1 <c>GalaxyRuntimeProbeManager</c> without the OPC UA node-manager coupling.
/// </summary>
/// <remarks>
/// State machine rules (documented in v1's <c>runtimestatus.md</c> and preserved here):
/// <list type="bullet">
/// <item><c>ScanState</c> is on-change-only — a stably-Running host may go hours without a
/// callback. Running → Stopped is driven by an explicit <c>ScanState=false</c> callback,
/// never by starvation.</item>
/// <item>Unknown → Running is a startup transition and does NOT fire StateChanged (would
/// paint every host as "just recovered" at startup, which is noise).</item>
/// <item>Stopped → Running and Running → Stopped fire StateChanged. Unknown → Stopped
/// fires StateChanged because that's a first-known-bad signal operators need.</item>
/// <item>All public methods are thread-safe. Callbacks fire outside the internal lock to
/// avoid lock inversion with caller-owned state.</item>
/// </list>
/// </remarks>
public sealed class GalaxyRuntimeProbeManager : IDisposable
{
public const int CategoryWinPlatform = 1;
public const int CategoryAppEngine = 3;
public const string ProbeAttribute = ".ScanState";
private readonly Func<DateTime> _clock;
private readonly Func<string, Action<string, Vtq>, Task> _subscribe;
private readonly Func<string, Task> _unsubscribe;
private readonly object _lock = new();
// probe tag → per-host state
private readonly Dictionary<string, HostProbeState> _byProbe = new(StringComparer.OrdinalIgnoreCase);
// tag name → probe tag (for reverse lookup on the desired-set diff)
private readonly Dictionary<string, string> _probeByTagName = new(StringComparer.OrdinalIgnoreCase);
private bool _disposed;
/// <summary>
/// Fires on every state transition that operators should react to. See class remarks
/// for the rules on which transitions fire.
/// </summary>
public event EventHandler<HostStateTransition>? StateChanged;
public GalaxyRuntimeProbeManager(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe)
: this(subscribe, unsubscribe, () => DateTime.UtcNow) { }
internal GalaxyRuntimeProbeManager(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe,
Func<DateTime> clock)
{
_subscribe = subscribe ?? throw new ArgumentNullException(nameof(subscribe));
_unsubscribe = unsubscribe ?? throw new ArgumentNullException(nameof(unsubscribe));
_clock = clock ?? throw new ArgumentNullException(nameof(clock));
}
/// <summary>Number of probes currently advised. Test/dashboard hook.</summary>
public int ActiveProbeCount
{
get { lock (_lock) return _byProbe.Count; }
}
/// <summary>
/// Snapshot every currently-tracked host's state. One entry per probe.
/// </summary>
public IReadOnlyList<HostProbeSnapshot> SnapshotStates()
{
lock (_lock)
{
return _byProbe.Select(kv => new HostProbeSnapshot(
TagName: kv.Value.TagName,
State: kv.Value.State,
LastChangedUtc: kv.Value.LastStateChangeUtc)).ToList();
}
}
/// <summary>
/// Query the current runtime state for <paramref name="tagName"/>. Returns
/// <see cref="HostRuntimeState.Unknown"/> when the host is not tracked.
/// </summary>
public HostRuntimeState GetState(string tagName)
{
lock (_lock)
{
if (_probeByTagName.TryGetValue(tagName, out var probe)
&& _byProbe.TryGetValue(probe, out var state))
return state.State;
return HostRuntimeState.Unknown;
}
}
/// <summary>
/// Diff the desired host set (filtered $WinPlatform / $AppEngine from the latest Discover)
/// against the currently-tracked set and advise / unadvise as needed. Idempotent:
/// calling twice with the same set does nothing.
/// </summary>
public async Task SyncAsync(IEnumerable<HostProbeTarget> desiredHosts)
{
if (_disposed) return;
var desired = desiredHosts
.Where(h => !string.IsNullOrWhiteSpace(h.TagName))
.ToDictionary(h => h.TagName, StringComparer.OrdinalIgnoreCase);
List<string> toAdvise;
List<string> toUnadvise;
lock (_lock)
{
toAdvise = desired.Keys
.Where(tag => !_probeByTagName.ContainsKey(tag))
.ToList();
toUnadvise = _probeByTagName.Keys
.Where(tag => !desired.ContainsKey(tag))
.Select(tag => _probeByTagName[tag])
.ToList();
foreach (var tag in toAdvise)
{
var probe = tag + ProbeAttribute;
_probeByTagName[tag] = probe;
_byProbe[probe] = new HostProbeState
{
TagName = tag,
State = HostRuntimeState.Unknown,
LastStateChangeUtc = _clock(),
};
}
foreach (var probe in toUnadvise)
{
_byProbe.Remove(probe);
}
foreach (var removedTag in _probeByTagName.Keys.Where(t => !desired.ContainsKey(t)).ToList())
{
_probeByTagName.Remove(removedTag);
}
}
foreach (var tag in toAdvise)
{
var probe = tag + ProbeAttribute;
try
{
await _subscribe(probe, OnProbeCallback);
}
catch
{
// Rollback on subscribe failure so a later Tick can't transition a never-advised
// probe into a false Stopped state. Callers can re-Sync later to retry.
lock (_lock)
{
_byProbe.Remove(probe);
_probeByTagName.Remove(tag);
}
}
}
foreach (var probe in toUnadvise)
{
try { await _unsubscribe(probe); } catch { /* best-effort cleanup */ }
}
}
/// <summary>
/// Public entry point for tests and internal callbacks. Production flow: MxAccessClient's
/// SubscribeAsync delivers VTQ updates through the callback wired in <see cref="SyncAsync"/>,
/// which calls this method under the lock to update state and fires
/// <see cref="StateChanged"/> outside the lock for any transition that matters.
/// </summary>
public void OnProbeCallback(string probeTag, Vtq vtq)
{
if (_disposed) return;
HostStateTransition? transition = null;
lock (_lock)
{
if (!_byProbe.TryGetValue(probeTag, out var state)) return;
var isRunning = vtq.Quality >= 192 && vtq.Value is bool b && b;
var now = _clock();
var previous = state.State;
state.LastCallbackUtc = now;
if (isRunning)
{
state.GoodUpdateCount++;
if (previous != HostRuntimeState.Running)
{
state.State = HostRuntimeState.Running;
state.LastStateChangeUtc = now;
if (previous == HostRuntimeState.Stopped)
{
transition = new HostStateTransition(state.TagName, previous, HostRuntimeState.Running, now);
}
}
}
else
{
state.FailureCount++;
if (previous != HostRuntimeState.Stopped)
{
state.State = HostRuntimeState.Stopped;
state.LastStateChangeUtc = now;
transition = new HostStateTransition(state.TagName, previous, HostRuntimeState.Stopped, now);
}
}
}
if (transition is { } t)
{
StateChanged?.Invoke(this, t);
}
}
public void Dispose()
{
if (_disposed) return;
_disposed = true;
lock (_lock)
{
_byProbe.Clear();
_probeByTagName.Clear();
}
}
private sealed class HostProbeState
{
public string TagName { get; set; } = "";
public HostRuntimeState State { get; set; }
public DateTime LastStateChangeUtc { get; set; }
public DateTime? LastCallbackUtc { get; set; }
public long GoodUpdateCount { get; set; }
public long FailureCount { get; set; }
}
}
public enum HostRuntimeState
{
Unknown,
Running,
Stopped,
}
public sealed record HostStateTransition(
string TagName,
HostRuntimeState OldState,
HostRuntimeState NewState,
DateTime AtUtc);
public sealed record HostProbeSnapshot(
string TagName,
HostRuntimeState State,
DateTime LastChangedUtc);
public readonly record struct HostProbeTarget(string TagName, int CategoryId)
{
public bool IsRuntimeHost =>
CategoryId == GalaxyRuntimeProbeManager.CategoryWinPlatform
|| CategoryId == GalaxyRuntimeProbeManager.CategoryAppEngine;
}

View File

@@ -0,0 +1,190 @@
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Alarms;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;
[Trait("Category", "Unit")]
public sealed class GalaxyAlarmTrackerTests
{
private sealed class FakeSubscriber
{
public readonly ConcurrentDictionary<string, Action<string, Vtq>> Subs = new();
public readonly ConcurrentQueue<string> Unsubs = new();
public readonly ConcurrentQueue<(string Tag, object Value)> Writes = new();
public bool WriteReturns { get; set; } = true;
public Task Subscribe(string tag, Action<string, Vtq> cb)
{
Subs[tag] = cb;
return Task.CompletedTask;
}
public Task Unsubscribe(string tag)
{
Unsubs.Enqueue(tag);
Subs.TryRemove(tag, out _);
return Task.CompletedTask;
}
public Task<bool> Write(string tag, object value)
{
Writes.Enqueue((tag, value));
return Task.FromResult(WriteReturns);
}
}
private static Vtq Bool(bool v) => new(v, DateTime.UtcNow, 192);
private static Vtq Int(int v) => new(v, DateTime.UtcNow, 192);
private static Vtq Str(string v) => new(v, DateTime.UtcNow, 192);
[Fact]
public async Task Track_subscribes_to_four_alarm_attributes()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
await t.TrackAsync("Tank.Level.HiHi");
fake.Subs.ShouldContainKey("Tank.Level.HiHi.InAlarm");
fake.Subs.ShouldContainKey("Tank.Level.HiHi.Priority");
fake.Subs.ShouldContainKey("Tank.Level.HiHi.DescAttrName");
fake.Subs.ShouldContainKey("Tank.Level.HiHi.Acked");
t.TrackedAlarmCount.ShouldBe(1);
}
[Fact]
public async Task Track_is_idempotent_on_repeat_call()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
await t.TrackAsync("Alarm.A");
await t.TrackAsync("Alarm.A");
t.TrackedAlarmCount.ShouldBe(1);
fake.Subs.Count.ShouldBe(4); // 4 sub calls, not 8
}
[Fact]
public async Task InAlarm_false_to_true_fires_Active_transition()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
var transitions = new ConcurrentQueue<AlarmTransition>();
t.TransitionRaised += (_, tr) => transitions.Enqueue(tr);
await t.TrackAsync("Alarm.A");
fake.Subs["Alarm.A.Priority"]("Alarm.A.Priority", Int(500));
fake.Subs["Alarm.A.DescAttrName"]("Alarm.A.DescAttrName", Str("TankLevelHiHi"));
fake.Subs["Alarm.A.InAlarm"]("Alarm.A.InAlarm", Bool(true));
transitions.Count.ShouldBe(1);
transitions.TryDequeue(out var tr).ShouldBeTrue();
tr!.Transition.ShouldBe(AlarmStateTransition.Active);
tr.Priority.ShouldBe(500);
tr.DescAttrName.ShouldBe("TankLevelHiHi");
}
[Fact]
public async Task InAlarm_true_to_false_fires_Inactive_transition()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
var transitions = new ConcurrentQueue<AlarmTransition>();
t.TransitionRaised += (_, tr) => transitions.Enqueue(tr);
await t.TrackAsync("Alarm.A");
fake.Subs["Alarm.A.InAlarm"]("Alarm.A.InAlarm", Bool(true));
fake.Subs["Alarm.A.InAlarm"]("Alarm.A.InAlarm", Bool(false));
transitions.Count.ShouldBe(2);
transitions.TryDequeue(out _);
transitions.TryDequeue(out var tr).ShouldBeTrue();
tr!.Transition.ShouldBe(AlarmStateTransition.Inactive);
}
[Fact]
public async Task Acked_false_to_true_fires_Acknowledged_while_InAlarm_is_true()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
var transitions = new ConcurrentQueue<AlarmTransition>();
t.TransitionRaised += (_, tr) => transitions.Enqueue(tr);
await t.TrackAsync("Alarm.A");
fake.Subs["Alarm.A.InAlarm"]("Alarm.A.InAlarm", Bool(true)); // Active, clears Acked flag
fake.Subs["Alarm.A.Acked"]("Alarm.A.Acked", Bool(true)); // Acknowledged
transitions.Count.ShouldBe(2);
transitions.TryDequeue(out _);
transitions.TryDequeue(out var tr).ShouldBeTrue();
tr!.Transition.ShouldBe(AlarmStateTransition.Acknowledged);
}
[Fact]
public async Task Acked_transition_while_InAlarm_is_false_does_not_fire()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
var transitions = new ConcurrentQueue<AlarmTransition>();
t.TransitionRaised += (_, tr) => transitions.Enqueue(tr);
await t.TrackAsync("Alarm.A");
// Initial Acked=true on subscribe (alarm is at rest, pre-ack'd) — should not fire.
fake.Subs["Alarm.A.Acked"]("Alarm.A.Acked", Bool(true));
transitions.Count.ShouldBe(0);
}
[Fact]
public async Task Acknowledge_writes_AckMsg_with_comment()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
await t.TrackAsync("Alarm.A");
var ok = await t.AcknowledgeAsync("Alarm.A", "acknowledged by operator");
ok.ShouldBeTrue();
fake.Writes.Count.ShouldBe(1);
fake.Writes.TryDequeue(out var w).ShouldBeTrue();
w.Tag.ShouldBe("Alarm.A.AckMsg");
w.Value.ShouldBe("acknowledged by operator");
}
[Fact]
public async Task Snapshot_reports_latest_fields()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
await t.TrackAsync("Alarm.A");
fake.Subs["Alarm.A.InAlarm"]("Alarm.A.InAlarm", Bool(true));
fake.Subs["Alarm.A.Priority"]("Alarm.A.Priority", Int(900));
fake.Subs["Alarm.A.DescAttrName"]("Alarm.A.DescAttrName", Str("MyAlarm"));
fake.Subs["Alarm.A.Acked"]("Alarm.A.Acked", Bool(true));
var snap = t.SnapshotStates();
snap.Count.ShouldBe(1);
snap[0].InAlarm.ShouldBeTrue();
snap[0].Acked.ShouldBeTrue();
snap[0].Priority.ShouldBe(900);
snap[0].DescAttrName.ShouldBe("MyAlarm");
}
[Fact]
public async Task Foreign_probe_callback_is_dropped()
{
var fake = new FakeSubscriber();
using var t = new GalaxyAlarmTracker(fake.Subscribe, fake.Unsubscribe, fake.Write);
var transitions = new ConcurrentQueue<AlarmTransition>();
t.TransitionRaised += (_, tr) => transitions.Enqueue(tr);
// No TrackAsync was called — this callback is foreign and should be silently ignored.
t.OnProbeCallback("Unknown.InAlarm", Bool(true));
transitions.Count.ShouldBe(0);
}
}

View File

@@ -0,0 +1,231 @@
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Stability;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;
[Trait("Category", "Unit")]
public sealed class GalaxyRuntimeProbeManagerTests
{
private sealed class FakeSubscriber
{
public readonly ConcurrentDictionary<string, Action<string, Vtq>> Subs = new();
public readonly ConcurrentQueue<string> UnsubCalls = new();
public bool FailSubscribeFor { get; set; }
public string? FailSubscribeTag { get; set; }
public Task Subscribe(string probe, Action<string, Vtq> cb)
{
if (FailSubscribeFor && string.Equals(probe, FailSubscribeTag, StringComparison.OrdinalIgnoreCase))
throw new InvalidOperationException("subscribe refused");
Subs[probe] = cb;
return Task.CompletedTask;
}
public Task Unsubscribe(string probe)
{
UnsubCalls.Enqueue(probe);
Subs.TryRemove(probe, out _);
return Task.CompletedTask;
}
}
private static Vtq Good(bool scanState) => new(scanState, DateTime.UtcNow, 192);
private static Vtq Bad() => new(null, DateTime.UtcNow, 0);
[Fact]
public async Task Sync_subscribes_to_ScanState_per_host()
{
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe);
await mgr.SyncAsync(new[]
{
new HostProbeTarget("PlatformA", GalaxyRuntimeProbeManager.CategoryWinPlatform),
new HostProbeTarget("EngineB", GalaxyRuntimeProbeManager.CategoryAppEngine),
});
mgr.ActiveProbeCount.ShouldBe(2);
subs.Subs.ShouldContainKey("PlatformA.ScanState");
subs.Subs.ShouldContainKey("EngineB.ScanState");
}
[Fact]
public async Task Sync_is_idempotent_on_repeat_call_with_same_set()
{
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe);
var targets = new[] { new HostProbeTarget("PlatformA", 1) };
await mgr.SyncAsync(targets);
await mgr.SyncAsync(targets);
mgr.ActiveProbeCount.ShouldBe(1);
subs.Subs.Count.ShouldBe(1);
subs.UnsubCalls.Count.ShouldBe(0);
}
[Fact]
public async Task Sync_unadvises_removed_hosts()
{
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe);
await mgr.SyncAsync(new[]
{
new HostProbeTarget("PlatformA", 1),
new HostProbeTarget("PlatformB", 1),
});
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
mgr.ActiveProbeCount.ShouldBe(1);
subs.UnsubCalls.ShouldContain("PlatformB.ScanState");
}
[Fact]
public async Task Subscribe_failure_rolls_back_host_entry_so_later_transitions_do_not_fire_stale_events()
{
var subs = new FakeSubscriber { FailSubscribeFor = true, FailSubscribeTag = "PlatformA.ScanState" };
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe);
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
mgr.ActiveProbeCount.ShouldBe(0); // rolled back
mgr.GetState("PlatformA").ShouldBe(HostRuntimeState.Unknown);
}
[Fact]
public async Task Unknown_to_Running_does_not_fire_StateChanged()
{
var now = new DateTime(2026, 4, 18, 10, 0, 0, DateTimeKind.Utc);
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe, () => now);
var transitions = new ConcurrentQueue<HostStateTransition>();
mgr.StateChanged += (_, t) => transitions.Enqueue(t);
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(true));
mgr.GetState("PlatformA").ShouldBe(HostRuntimeState.Running);
transitions.Count.ShouldBe(0); // startup transition, operators don't care
}
[Fact]
public async Task Running_to_Stopped_fires_StateChanged_with_both_states()
{
var now = new DateTime(2026, 4, 18, 10, 0, 0, DateTimeKind.Utc);
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe, () => now);
var transitions = new ConcurrentQueue<HostStateTransition>();
mgr.StateChanged += (_, t) => transitions.Enqueue(t);
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(true)); // Unknown→Running (silent)
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(false)); // Running→Stopped (fires)
transitions.Count.ShouldBe(1);
transitions.TryDequeue(out var t).ShouldBeTrue();
t!.TagName.ShouldBe("PlatformA");
t.OldState.ShouldBe(HostRuntimeState.Running);
t.NewState.ShouldBe(HostRuntimeState.Stopped);
}
[Fact]
public async Task Stopped_to_Running_fires_StateChanged_for_recovery()
{
var now = new DateTime(2026, 4, 18, 10, 0, 0, DateTimeKind.Utc);
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe, () => now);
var transitions = new ConcurrentQueue<HostStateTransition>();
mgr.StateChanged += (_, t) => transitions.Enqueue(t);
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(true)); // Unknown→Running (silent)
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(false)); // Running→Stopped (fires)
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(true)); // Stopped→Running (fires)
transitions.Count.ShouldBe(2);
}
[Fact]
public async Task Unknown_to_Stopped_fires_StateChanged_for_first_known_bad_signal()
{
var now = new DateTime(2026, 4, 18, 10, 0, 0, DateTimeKind.Utc);
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe, () => now);
var transitions = new ConcurrentQueue<HostStateTransition>();
mgr.StateChanged += (_, t) => transitions.Enqueue(t);
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
// First callback is bad-quality — we must flag the host Stopped so operators see it.
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Bad());
transitions.Count.ShouldBe(1);
transitions.TryDequeue(out var t).ShouldBeTrue();
t!.OldState.ShouldBe(HostRuntimeState.Unknown);
t.NewState.ShouldBe(HostRuntimeState.Stopped);
}
[Fact]
public async Task Repeated_Good_Running_callbacks_do_not_fire_duplicate_events()
{
var now = new DateTime(2026, 4, 18, 10, 0, 0, DateTimeKind.Utc);
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe, () => now);
var count = 0;
mgr.StateChanged += (_, _) => Interlocked.Increment(ref count);
await mgr.SyncAsync(new[] { new HostProbeTarget("PlatformA", 1) });
for (var i = 0; i < 5; i++)
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(true));
count.ShouldBe(0); // only the silent Unknown→Running on the first, no events after
}
[Fact]
public async Task Unknown_callback_for_non_tracked_probe_is_dropped()
{
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe);
mgr.OnProbeCallback("ProbeForSomeoneElse.ScanState", Good(true));
mgr.ActiveProbeCount.ShouldBe(0);
}
[Fact]
public async Task Snapshot_reports_current_state_for_every_tracked_host()
{
var now = new DateTime(2026, 4, 18, 10, 0, 0, DateTimeKind.Utc);
var subs = new FakeSubscriber();
using var mgr = new GalaxyRuntimeProbeManager(subs.Subscribe, subs.Unsubscribe, () => now);
await mgr.SyncAsync(new[]
{
new HostProbeTarget("PlatformA", 1),
new HostProbeTarget("EngineB", 3),
});
subs.Subs["PlatformA.ScanState"]("PlatformA.ScanState", Good(true)); // Running
subs.Subs["EngineB.ScanState"]("EngineB.ScanState", Bad()); // Stopped
var snap = mgr.SnapshotStates();
snap.Count.ShouldBe(2);
snap.ShouldContain(s => s.TagName == "PlatformA" && s.State == HostRuntimeState.Running);
snap.ShouldContain(s => s.TagName == "EngineB" && s.State == HostRuntimeState.Stopped);
}
[Fact]
public void IsRuntimeHost_recognizes_WinPlatform_and_AppEngine_category_ids()
{
new HostProbeTarget("X", GalaxyRuntimeProbeManager.CategoryWinPlatform).IsRuntimeHost.ShouldBeTrue();
new HostProbeTarget("X", GalaxyRuntimeProbeManager.CategoryAppEngine).IsRuntimeHost.ShouldBeTrue();
new HostProbeTarget("X", 4 /* $Area */).IsRuntimeHost.ShouldBeFalse();
new HostProbeTarget("X", 11 /* $ApplicationObject */).IsRuntimeHost.ShouldBeFalse();
}
}

View File

@@ -0,0 +1,61 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Historian;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests;
[Trait("Category", "Unit")]
public sealed class HistorianQualityMapperTests
{
/// <summary>
/// Rich mapping preserves specific OPC DA subcodes through the historian ToWire path.
/// Before PR 12 the category-only fallback collapsed e.g. BadNotConnected(8) to
/// Bad(0x80000000) so downstream OPC UA clients could not distinguish transport issues
/// from sensor issues. After PR 12 every known subcode round-trips to its canonical
/// uint32 StatusCode and Proxy translation stays byte-for-byte with v1 QualityMapper.
/// </summary>
[Theory]
[InlineData((byte)192, 0x00000000u)] // Good
[InlineData((byte)216, 0x00D80000u)] // Good_LocalOverride
[InlineData((byte)64, 0x40000000u)] // Uncertain
[InlineData((byte)68, 0x40900000u)] // Uncertain_LastUsableValue
[InlineData((byte)80, 0x40930000u)] // Uncertain_SensorNotAccurate
[InlineData((byte)84, 0x40940000u)] // Uncertain_EngineeringUnitsExceeded
[InlineData((byte)88, 0x40950000u)] // Uncertain_SubNormal
[InlineData((byte)0, 0x80000000u)] // Bad
[InlineData((byte)4, 0x80890000u)] // Bad_ConfigurationError
[InlineData((byte)8, 0x808A0000u)] // Bad_NotConnected
[InlineData((byte)12, 0x808B0000u)] // Bad_DeviceFailure
[InlineData((byte)16, 0x808C0000u)] // Bad_SensorFailure
[InlineData((byte)20, 0x80050000u)] // Bad_CommunicationError
[InlineData((byte)24, 0x808D0000u)] // Bad_OutOfService
[InlineData((byte)32, 0x80320000u)] // Bad_WaitingForInitialData
public void Maps_specific_OPC_DA_codes_to_canonical_StatusCode(byte quality, uint expected)
{
HistorianQualityMapper.Map(quality).ShouldBe(expected);
}
[Theory]
[InlineData((byte)200)] // Good — unknown subcode in Good family
[InlineData((byte)255)] // Good — unknown
public void Unknown_good_family_codes_fall_back_to_plain_Good(byte q)
{
HistorianQualityMapper.Map(q).ShouldBe(0x00000000u);
}
[Theory]
[InlineData((byte)100)] // Uncertain — unknown subcode
[InlineData((byte)150)] // Uncertain — unknown
public void Unknown_uncertain_family_codes_fall_back_to_plain_Uncertain(byte q)
{
HistorianQualityMapper.Map(q).ShouldBe(0x40000000u);
}
[Theory]
[InlineData((byte)1)] // Bad — unknown subcode
[InlineData((byte)50)] // Bad — unknown
public void Unknown_bad_family_codes_fall_back_to_plain_Bad(byte q)
{
HistorianQualityMapper.Map(q).ShouldBe(0x80000000u);
}
}