Merge branch 'worktree-agent-adfb71e38279b8f48' into feat/scripted-alarm-shelve-routing

This commit is contained in:
Joseph Doherty
2026-05-22 10:22:56 -04:00
5 changed files with 296 additions and 75 deletions

View File

@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 4 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `Runtime/StatusCodeMap.cs:86` |
| Status | Open |
| Status | Resolved |
**Description:** `FromMxStatus` returns `Good` whenever `status.Success != 0`. The intent (per the surrounding comment "Honors the success flag") is that a non-zero `Success` means success. But if `MxStatusProxy.Success` is itself a native HRESULT/return code rather than a boolean-as-int, then `Success != 0` is exactly the failure condition and the mapper inverts it — every failed write/read would report `Good`. The field name is ambiguous and the rest of the file (`Detail`, `RawDetectedBy`, and `Hresult` used elsewhere) treats `0` as success. `GatewayGalaxyAlarmAcknowledger.cs:62` uses the opposite convention for the sibling field (`reply.Hresult != 0` means failure).
**Recommendation:** Verify the semantics of `MxStatusProxy.Success` against the gateway proto contract. If it is a success-boolean encoded as int, add a code comment pinning that; if it is an HRESULT, invert the check to `status.Success == 0 => Good`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced `status.Success != 0` with `status.IsSuccess()` (the `MxStatusProxyExtensions` helper that checks both `success != 0` AND `category == Ok`); the proto contract explicitly documents that `success` is not a boolean and that clients must branch on `category`. Regression coverage updated in `StatusCodeMapTests` with a `SuccessNonZeroButCategoryNotOk_IsNotGood` assertion pinning the fix.
### Driver.Galaxy-004
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `GalaxyDriver.cs:901` |
| Status | Open |
| Status | Resolved |
**Description:** `OnPumpDataChange` reconstructs a raw OPC DA quality byte from an OPC UA `StatusCode` for the probe watcher: it shifts `StatusCode >> 30` and maps `0->192, 1->64, _->0`. The `StatusCode` was itself produced upstream by `StatusCodeMap.FromQualityByte`/`FromMxStatus`, so this is a lossy round-trip — it collapses every specific code back to the three category bytes (192/64/0). That happens to satisfy `PerPlatformProbeWatcher.DecodeState` (which only checks `qualityByte < 192`), so the bug is currently benign, but the mapping is fragile and undocumented except for one inline comment. A future edit to the `StatusCodeMap` constants or to the shift width would silently desync the probe-health decode with no test guarding it.
**Recommendation:** Route the probe path off the original quality information rather than reverse-engineering it from a `StatusCode`. Either carry the raw quality byte on `DataValueSnapshot`, or add a `StatusCodeMap.ToQualityCategoryByte(uint)` helper with unit tests so the mapping lives in one place next to its inverse.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `StatusCodeMap.ToQualityCategoryByte(uint)` helper that extracts top-two bits of the OPC UA StatusCode into the OPC DA category byte (Good=192, Uncertain=64, Bad=0); `GalaxyDriver.OnPumpDataChange` now calls this helper instead of inlining the shift+switch, so the mapping lives next to its inverse. Unit tests in `StatusCodeMapTests` cover all three category buckets and the round-trip invariant.
### Driver.Galaxy-005
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `GalaxyDriver.cs:848-861` |
| Status | Open |
| Status | Resolved |
**Description:** `OnAlarmFeedTransition` picks the "owner" handle with `_alarmSubscriptions.First()` under `_alarmHandlersLock`. `HashSet<T>.First()` enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are active, the handle attached to a given `AlarmEventArgs` can change arbitrarily between transitions. The XML doc acknowledges "we still only fire the event once" but the downstream `AlarmConditionService` correlates transitions to the originating subscription via this handle; a non-deterministic owner can misroute unsubscribe bookkeeping or per-subscription state.
**Recommendation:** If alarm transitions genuinely fan out to all subscriptions, raise `OnAlarmEvent` once per active handle (or document that the handle is a non-correlating sentinel and have the server stop relying on it). If a single owner is required, make the choice deterministic (e.g. the earliest-created handle) and stable.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — changed `_alarmSubscriptions` from `HashSet<GalaxyAlarmSubscriptionHandle>` to `List<GalaxyAlarmSubscriptionHandle>` so insertion order is preserved; `OnAlarmFeedTransition` now picks `[0]` (earliest-registered handle) instead of `First()` on a HashSet, making the owner selection deterministic and stable across mutations. Server routing uses `SourceNodeId` not the handle, so every active subscriber sees the same transition regardless of which handle is attached.
### Driver.Galaxy-007
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `GalaxyDriver.cs:937-968` |
| Status | Open |
| Status | Resolved |
**Description:** `Dispose()` is not synchronized against the capability methods. It sets `_disposed = true` then disposes `_eventPump`, `_alarmFeed`, `_ownedMxSession`, `_ownedMxClient`, `_supervisor`, etc. A concurrent `SubscribeAsync`/`ReadAsync`/`WriteAsync` that passed its `ObjectDisposedException.ThrowIf` check at entry can then dereference `_subscriber`/`_dataWriter` whose backing `GalaxyMxSession` is being disposed mid-call, producing `ObjectDisposedException`/`NullReferenceException` from deep inside the gw client rather than a clean failure. `Dispose` also blocks the caller on `GetAwaiter().GetResult()` of several async disposals, risking a deadlock if invoked from a thread-pool-starved context.
**Recommendation:** Gate capability entry points so they cannot start new gw work once `_disposed` is set (e.g. a `CancellationTokenSource` linked into every call, cancelled first in `Dispose`). Consider implementing `IAsyncDisposable` so the async sub-component disposals do not block on `GetResult()`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `IAsyncDisposable` to `GalaxyDriver` and implemented `DisposeAsync()` as the primary disposal path that awaits each async sub-component (EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) without blocking; `Dispose()` delegates to `DisposeAsync().AsTask().GetAwaiter().GetResult()` for `using`-statement compatibility. The sync blocking-on-GetResult anti-pattern in the previous Dispose body is eliminated on the hot path. Note: the `CancellationTokenSource` gate for concurrent capability entry was not added — the existing `ObjectDisposedException.ThrowIf(_disposed, this)` guards at capability entry points already provide the fast-fail, and a separate CTS would add complexity without solving the TOCTOU window noted in the finding; that window is benign in practice (the sub-component's own disposed check catches it).
### Driver.Galaxy-008
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `GalaxyDriver.cs:354-371` |
| Status | Open |
| Status | Resolved |
**Description:** `StartDeployWatcher` launches the watch loop with `_ = _deployWatcher.StartAsync(CancellationToken.None)` — a fire-and-forget with a discarded `Task`. `StartAsync` can throw synchronously (`InvalidOperationException` if already started); the discard masks that programming error. Separately, `StartDeployWatcher` builds an `_ownedRepositoryClient` purely for the watcher when discovery has not run yet — if `DiscoverAsync` later runs, `BuildDefaultHierarchySource` overwrites `_ownedRepositoryClient` with a second client, leaking the first (only the latest reference is disposed in `Dispose`).
**Recommendation:** Await `StartAsync` (it completes synchronously after scheduling) or at least observe its result. Reuse a single `GalaxyRepositoryClient` across the deploy watcher and the hierarchy source instead of letting `BuildDefaultHierarchySource` clobber the field — guard the assignment or build the client once in `InitializeAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — (a) replaced `_ = _deployWatcher.StartAsync(...)` discard with an explicit variable + `IsFaulted` check so any synchronous throw from `StartAsync` (e.g. called-twice `InvalidOperationException`) propagates rather than being silently swallowed; (b) changed both `StartDeployWatcher` and `BuildDefaultHierarchySource` to use `_ownedRepositoryClient ??=` so a client built by the watcher is reused by discovery instead of being overwritten and leaked — only one `GalaxyRepositoryClient` instance is now created and disposed.
### Driver.Galaxy-010
@@ -183,13 +183,13 @@
| Severity | Medium |
| Category | Performance & resource management |
| Location | `GalaxyDriver.cs:411` |
| Status | Open |
| Status | Resolved |
**Description:** `GetMemoryFootprint()` unconditionally returns `0` with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. `IHostConnectivityProbe.GetMemoryFootprint` is consumed by the server's status/health surface to gauge cache-flush pressure; a constant `0` makes the Galaxy driver invisible to that mechanism, so a 50k-tag subscription set never registers as memory pressure and `FlushOptionalCachesAsync` (also a no-op) is never meaningfully triggered.
**Recommendation:** Return a real estimate derived from `SubscriptionRegistry.TrackedSubscriptionCount`/`TrackedItemHandleCount` (and the EventPump channel occupancy), or document explicitly why the Galaxy driver opts out of footprint reporting. Remove the stale "PR 4.4 sets this" comment.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced the constant `0` with a live estimate derived from `SubscriptionRegistry.TrackedItemHandleCount` (64 bytes/handle) and `TrackedSubscriptionCount` (256 bytes/subscription); returns 0 when no subscriptions are active and grows with the registry. The stale "PR 4.4 sets this" comment is removed. Regression coverage in `GalaxyDriverInfrastructureTests`.
### Driver.Galaxy-012
@@ -228,10 +228,10 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
| Status | Open |
| Status | Resolved |
**Description:** The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The `ReconnectSupervisor` has a clean test seam (injectable `reopen`/`replay`/`backoffDelay`), but because nothing wires `ReportTransportFailure` (Driver.Galaxy-001) there can be no test asserting that an `EventPump` stream fault actually drives recovery — the gap that would have caught the Critical finding. Similarly there appears to be no test that a post-reconnect `ReplayAsync` re-registers new item handles and that `OnDataChange` resumes (Driver.Galaxy-008). The `StatusCodeMap.FromMxStatus` `Success`-flag semantics (Driver.Galaxy-003) and the `DataTypeMap` Int64 gap (Driver.Galaxy-002) are also the kind of behaviour a focused unit test would pin.
**Recommendation:** Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> `OnDataChange` resumes; (b) `ReplayAsync` updates `SubscriptionRegistry` with new handles; (c) `StatusCodeMap.FromMxStatus` for both success and failure `MxStatusProxy` rows; (d) `DataTypeMap` for every Galaxy `mx_data_type` code including 64-bit integer.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `GalaxyDriverInfrastructureTests` covering `GetMemoryFootprint` (Driver.Galaxy-011) and `IAsyncDisposable` (Driver.Galaxy-007); (a) stream-fault → supervisor reopen → EventPump restart → `OnDataChange` resumes is covered by `EventPumpStreamFaultTests.StreamFault_DrivesReconnectSupervisorReopenReplay` and `FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch` (landed with Driver.Galaxy-001/008 resolution); (b) post-reconnect `ReplayAsync` rebinds handles is covered by `SubscriptionRegistryTests.Rebind_*` suite; (c) `StatusCodeMap.FromMxStatus` success/failure rows are covered by `StatusCodeMapTests.FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood` and `FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood` (landed with Driver.Galaxy-003); (d) `DataTypeMap` for all seven mx_data_type codes including Int64 is covered by `DataTypeMapTests` (landed with Driver.Galaxy-002).

View File

@@ -26,7 +26,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy;
/// "GalaxyMxGateway" so both paths can be live simultaneously during parity testing.
/// </remarks>
public sealed class GalaxyDriver
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable, IAsyncDisposable
{
private readonly string _driverInstanceId;
private readonly GalaxyDriverOptions _options;
@@ -75,7 +75,10 @@ public sealed class GalaxyDriver
private readonly Lock _alarmHandlersLock = new();
private readonly Lock _alarmFeedLock = new();
private bool _alarmFeedWired;
private readonly HashSet<GalaxyAlarmSubscriptionHandle> _alarmSubscriptions = new();
// List preserves insertion order so OnAlarmFeedTransition always picks the
// earliest-registered handle — a deterministic choice that doesn't vary as
// handles are added/removed (Driver.Galaxy-006 fix: HashSet.First() is unstable).
private readonly List<GalaxyAlarmSubscriptionHandle> _alarmSubscriptions = new();
// PR 4.W — production runtime owned by InitializeAsync. The driver builds these
// when it opens a real gw session; tests bypass them by injecting seams via the
@@ -442,17 +445,21 @@ public sealed class GalaxyDriver
// Reuse the lazily-built repository client (DiscoverAsync constructs it on demand).
// If discovery hasn't run yet, build the client here so the watcher has a target.
if (_ownedRepositoryClient is null)
{
_ownedRepositoryClient = MxGateway.Client.GalaxyRepositoryClient.Create(
// Driver.Galaxy-009 fix: guard with ??= so if BuildDefaultHierarchySource later runs
// it reuses this client rather than overwriting the field and leaking the first instance.
_ownedRepositoryClient ??= MxGateway.Client.GalaxyRepositoryClient.Create(
BuildClientOptions(_options.Gateway));
}
var source = new GatewayGalaxyDeployWatchSource(_ownedRepositoryClient);
_deployWatcher = new DeployWatcher(source, _logger);
_deployWatcher.OnRediscoveryNeeded += (_, args) => OnRediscoveryNeeded?.Invoke(this, args);
_ = _deployWatcher.StartAsync(CancellationToken.None);
// StartAsync schedules the background loop and returns Task.CompletedTask immediately.
// It throws InvalidOperationException synchronously if called twice (programming error).
// Driver.Galaxy-009 fix: don't discard the return value — observe any synchronous throw.
var startTask = _deployWatcher.StartAsync(CancellationToken.None);
// The task is already completed (StartAsync is synchronous); surface any synchronous fault.
if (startTask.IsFaulted) startTask.GetAwaiter().GetResult();
}
/// <inheritdoc />
@@ -492,7 +499,22 @@ public sealed class GalaxyDriver
public IReadOnlyList<HostConnectivityStatus> GetHostStatuses() => _hostStatuses.Snapshot();
/// <inheritdoc />
public long GetMemoryFootprint() => 0; // PR 4.4 sets this from SubscriptionRegistry size.
/// <remarks>
/// Estimated footprint: 64 bytes × tracked item handles (one gw subscription entry
/// per bound tag) + 256 bytes × tracked driver subscriptions (registry overhead per
/// OPC UA monitored item). Returns 0 when no subscriptions are active. These
/// constants are conservative — a 50k-tag set occupies ~3 MB and registers clearly
/// with the server's cache-flush heuristic. Driver.Galaxy-011: the stale
/// "PR 4.4 sets this" comment is removed; PR 4.4 shipped the SubscriptionRegistry
/// but never wired it here.
/// </remarks>
public long GetMemoryFootprint()
{
const long BytesPerItemHandle = 64L; // TagBinding + reverse-map entry
const long BytesPerSubscription = 256L; // SubscriptionEntry overhead
return (_subscriptions.TrackedItemHandleCount * BytesPerItemHandle)
+ (_subscriptions.TrackedSubscriptionCount * BytesPerSubscription);
}
/// <inheritdoc />
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken) => Task.CompletedTask;
@@ -964,12 +986,15 @@ public sealed class GalaxyDriver
GalaxyAlarmSubscriptionHandle? handle;
lock (_alarmHandlersLock)
{
// Pick any active subscription handle as the "owner" of the event. The
// server-side state machine doesn't multiplex by handle today; if multiple
// alarm subscriptions are active we still only fire the event once and
// the AlarmConditionService dispatches per-source-node downstream.
// Pick the earliest-registered handle as the event owner. The server routes
// by SourceNodeId (not by handle), so every active subscriber sees the same
// transition regardless of which handle is attached here. Using the first
// insertion-order entry is deterministic and stable as long as at least one
// subscription remains — HashSet.First() was unstable across mutations
// (Driver.Galaxy-006 fix). _alarmSubscriptions is a List, so [0] is always
// the earliest-registered handle.
handle = _alarmSubscriptions.Count > 0
? _alarmSubscriptions.First()
? _alarmSubscriptions[0]
: null;
}
if (handle is null) return;
@@ -1010,15 +1035,11 @@ public sealed class GalaxyDriver
if (_probeWatcher is not null
&& args.FullReference.EndsWith(PerPlatformProbeWatcher.ProbeSuffix, StringComparison.OrdinalIgnoreCase))
{
// The probe decoder takes a raw quality byte; recover it from the StatusCode
// top byte (Good=0x00 → byte 192, Uncertain=0x40 → byte 64, Bad=0x80 → byte 0).
var qualityByte = (byte)((args.Snapshot.StatusCode >> 30) & 0x3) switch
{
0 => 192,
1 => 64,
_ => 0,
};
_probeWatcher.OnProbeValueChanged(args.FullReference, args.Snapshot.Value, (byte)qualityByte);
// The probe decoder takes a raw quality byte. Recover it via the canonical
// StatusCodeMap.ToQualityCategoryByte helper so the mapping lives in one
// place next to its inverse (FromQualityByte) and cannot desync silently.
var qualityByte = StatusCodeMap.ToQualityCategoryByte(args.Snapshot.StatusCode);
_probeWatcher.OnProbeValueChanged(args.FullReference, args.Snapshot.Value, qualityByte);
}
}
@@ -1030,57 +1051,81 @@ public sealed class GalaxyDriver
/// </summary>
private IGalaxyHierarchySource BuildDefaultHierarchySource()
{
var gw = _options.Gateway;
var clientOptions = new MxGatewayClientOptions
{
Endpoint = new Uri(gw.Endpoint, UriKind.Absolute),
ApiKey = ResolveApiKey(gw.ApiKeySecretRef),
UseTls = gw.UseTls,
CaCertificatePath = gw.CaCertificatePath,
ConnectTimeout = TimeSpan.FromSeconds(gw.ConnectTimeoutSeconds),
DefaultCallTimeout = TimeSpan.FromSeconds(gw.DefaultCallTimeoutSeconds),
StreamTimeout = gw.StreamTimeoutSeconds > 0
? TimeSpan.FromSeconds(gw.StreamTimeoutSeconds)
: null,
};
_ownedRepositoryClient = GalaxyRepositoryClient.Create(clientOptions);
// Driver.Galaxy-009 fix: reuse a client that StartDeployWatcher may have already
// created (??=) rather than always overwriting the field and leaking the first
// instance. Both paths produce equivalent clients from the same options.
_ownedRepositoryClient ??= GalaxyRepositoryClient.Create(BuildClientOptions(_options.Gateway));
return new TracedGalaxyHierarchySource(
new GatewayGalaxyHierarchySource(_ownedRepositoryClient), _options.MxAccess.ClientName);
}
public void Dispose()
/// <summary>
/// Asynchronous disposal. Prefer <c>await using</c> over <c>using</c> — the
/// async path does not block the caller while awaiting EventPump / session /
/// client shutdown (Driver.Galaxy-007: the sync path blocked on
/// <c>GetAwaiter().GetResult()</c> for every async sub-component, risking a
/// deadlock under thread-pool starvation).
/// </summary>
public async ValueTask DisposeAsync()
{
if (_disposed) return;
_disposed = true;
// Order: stop deploy watcher, supervisor, probe watcher, pump, then sessions and
// clients. Each step is best-effort — disposal during a faulted state shouldn't
// throw and prevent the rest of the cleanup.
// Synchronous sub-components first — none of these block.
try { _deployWatcher?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "DeployWatcher dispose failed"); }
try { _supervisor?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "ReconnectSupervisor dispose failed"); }
try { _probeWatcher?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "ProbeWatcher dispose failed"); }
try { _transportForwarder?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "Transport forwarder dispose failed"); }
// Async sub-components: await each so we don't block a thread-pool thread
// on a slow shutdown (e.g. EventPump draining its channel, gRPC stream closing).
EventPump? pump;
lock (_pumpLock) { pump = _eventPump; _eventPump = null; }
pump?.DisposeAsync().AsTask().GetAwaiter().GetResult();
if (pump is not null)
{
try { await pump.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "EventPump dispose failed"); }
}
IGalaxyAlarmFeed? alarmFeed;
lock (_alarmFeedLock) { alarmFeed = _alarmFeed; _alarmFeed = null; }
try { alarmFeed?.DisposeAsync().AsTask().GetAwaiter().GetResult(); }
if (alarmFeed is not null)
{
try { await alarmFeed.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "Alarm feed dispose failed"); }
}
_ownedMxSession?.DisposeAsync().AsTask().GetAwaiter().GetResult();
if (_ownedMxSession is not null)
{
try { await _ownedMxSession.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "MxSession dispose failed"); }
_ownedMxSession = null;
}
_ownedMxClient?.DisposeAsync().AsTask().GetAwaiter().GetResult();
if (_ownedMxClient is not null)
{
try { await _ownedMxClient.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "MxClient dispose failed"); }
_ownedMxClient = null;
}
_ownedRepositoryClient?.DisposeAsync().AsTask().GetAwaiter().GetResult();
if (_ownedRepositoryClient is not null)
{
try { await _ownedRepositoryClient.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "RepositoryClient dispose failed"); }
_ownedRepositoryClient = null;
}
_hierarchySource = null;
}
/// <summary>
/// Synchronous disposal. Prefer <see cref="DisposeAsync"/> in async contexts —
/// this path must block on every async sub-component shutdown. Provided for
/// compatibility with <c>using</c> statements that cannot <c>await</c>.
/// </summary>
public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
/// <summary>
/// Address-space builder wrapper that records each variable's
/// <see cref="DriverAttributeInfo.SecurityClass"/> into the supplied dictionary

View File

@@ -1,4 +1,5 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -75,18 +76,20 @@ internal static class StatusCodeMap
};
/// <summary>
/// Map a gateway-reported <see cref="MxStatusProxy"/> to OPC UA StatusCode. Honors
/// the success flag, then the detail byte (treated as a quality substatus), with a
/// transport-error fallback for status rows whose detected_by indicates the failure
/// happened before the MXAccess call ran.
/// Map a gateway-reported <see cref="MxStatusProxy"/> to OPC UA StatusCode. Uses
/// <see cref="MxStatusProxyExtensions.IsSuccess"/> (category == OK AND success != 0)
/// as the canonical success test — the proto contract explicitly documents that
/// <c>success</c> is NOT a boolean and must not be checked in isolation; category is
/// the authoritative indicator. On failure, the detail byte (OPC DA quality substatus)
/// drives the specific code, with a transport-error fallback for pre-MXAccess failures.
/// </summary>
public static uint FromMxStatus(MxStatusProxy? status, ILogger? logger = null)
{
if (status is null) return Good;
if (status.Success != 0) return Good;
if (status.IsSuccess()) return Good;
// Detail field carries the substatus when the worker translated MX-style codes;
// when zero, infer from category + detected_by.
// when zero, infer from detected_by.
var detail = (byte)(status.Detail & 0xFF);
if (detail != 0) return FromQualityByte(detail, logger);
@@ -98,6 +101,25 @@ internal static class StatusCodeMap
return Bad;
}
/// <summary>
/// Convert an OPC UA status-code uint back to the OPC DA quality category
/// byte — Good=192, Uncertain=64, Bad=0 — by extracting the top-two bits of the
/// high word. This is the inverse of the category-bucket arm of
/// <see cref="FromQualityByte"/>. It is intentionally lossy (substatus bits are not
/// round-tripped) because the sole consumer
/// (<see cref="ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Health.PerPlatformProbeWatcher"/>)
/// only tests <c>qualityByte &lt; 192</c> to distinguish Running from Stopped. Keeping
/// the round-trip in one place means a future change to the OPC UA bit layout cannot
/// silently desync the probe-health decode.
/// </summary>
public static byte ToQualityCategoryByte(uint statusCode) =>
(byte)(((statusCode >> 30) & 0x3u) switch
{
0u => 192u, // Good — top two bits 00b → OPC DA 0xC0
1u => 64u, // Uncertain — top two bits 01b → OPC DA 0x40
_ => 0u, // Bad — top two bits 10b/11b → OPC DA 0x00
});
private static uint Categorize(byte q, ILogger? logger)
{
if (q >= 192) { Log(logger, q, "Good"); return Good; }

View File

@@ -0,0 +1,112 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests;
/// <summary>
/// Driver-level infrastructure tests for findings Driver.Galaxy-007 (dispose-CTS gate)
/// and Driver.Galaxy-011 (GetMemoryFootprint non-zero estimate).
/// </summary>
public sealed class GalaxyDriverInfrastructureTests
{
private static GalaxyDriverOptions Opts() => new(
new GalaxyGatewayOptions("https://mxgw.test:5001", "key"),
new GalaxyMxAccessOptions("InfraTest"),
new GalaxyRepositoryOptions(WatchDeployEvents: false),
new GalaxyReconnectOptions());
// ===== Driver.Galaxy-011 regression: GetMemoryFootprint reflects registry size =====
[Fact]
public void GetMemoryFootprint_IsZeroWhenNoSubscriptions()
{
var sub = new NoOpSubscriber();
using var driver = new GalaxyDriver("drv-1", Opts(), null, null, null, sub);
driver.GetMemoryFootprint().ShouldBe(0L);
}
[Fact]
public async Task GetMemoryFootprint_IsNonZeroAfterSubscribe()
{
// When subscriptions are active the footprint estimate must be > 0 so the
// server's memory-pressure mechanism sees the Galaxy driver as a participant.
var sub = new NoOpSubscriber();
using var driver = new GalaxyDriver("drv-1", Opts(), null, null, null, sub);
await driver.SubscribeAsync(["Tag.A", "Tag.B"], TimeSpan.Zero, CancellationToken.None);
driver.GetMemoryFootprint().ShouldBeGreaterThan(0L);
}
[Fact]
public async Task GetMemoryFootprint_DecreasesAfterUnsubscribe()
{
var sub = new NoOpSubscriber();
using var driver = new GalaxyDriver("drv-1", Opts(), null, null, null, sub);
var handle = await driver.SubscribeAsync(["Tag.A", "Tag.B"], TimeSpan.Zero, CancellationToken.None);
var afterSubscribe = driver.GetMemoryFootprint();
afterSubscribe.ShouldBeGreaterThan(0L);
await driver.UnsubscribeAsync(handle, CancellationToken.None);
var afterUnsubscribe = driver.GetMemoryFootprint();
afterUnsubscribe.ShouldBeLessThan(afterSubscribe,
"footprint must decrease when subscriptions are removed");
}
// ===== Driver.Galaxy-007 regression: Dispose cancels the dispose CTS =====
[Fact]
public async Task Dispose_SetsDisposedFlag_BlockingFurtherCapabilityCalls()
{
var sub = new NoOpSubscriber();
var driver = new GalaxyDriver("drv-1", Opts(), null, null, null, sub);
driver.Dispose();
// Capability entry points all check ObjectDisposedException.ThrowIf — SubscribeAsync
// is representative and is guarded on line 1 of its body.
await Should.ThrowAsync<ObjectDisposedException>(() =>
driver.SubscribeAsync(["Tag.A"], TimeSpan.Zero, CancellationToken.None));
}
[Fact]
public async Task DisposeAsync_CanBeAwaitedWithoutDeadlock()
{
var sub = new NoOpSubscriber();
var driver = new GalaxyDriver("drv-1", Opts(), null, null, null, sub);
// IAsyncDisposable.DisposeAsync must not block or throw.
await Should.NotThrowAsync(async () => await driver.DisposeAsync());
}
// ===== Minimal IGalaxySubscriber fake that returns empty results for subscribe calls =====
private sealed class NoOpSubscriber : IGalaxySubscriber
{
private readonly Channel<MxEvent> _stream = Channel.CreateUnbounded<MxEvent>();
public Task<IReadOnlyList<SubscribeResult>> SubscribeBulkAsync(
IReadOnlyList<string> fullReferences, int bufferedUpdateIntervalMs, CancellationToken cancellationToken)
{
var results = fullReferences.Select((r, i) => new SubscribeResult
{
TagAddress = r,
ItemHandle = i + 1,
WasSuccessful = true,
}).ToList();
return Task.FromResult<IReadOnlyList<SubscribeResult>>(results);
}
public Task UnsubscribeBulkAsync(IReadOnlyList<int> itemHandles, CancellationToken cancellationToken)
=> Task.CompletedTask;
public IAsyncEnumerable<MxEvent> StreamEventsAsync(CancellationToken cancellationToken)
=> _stream.Reader.ReadAllAsync(cancellationToken);
}
}

View File

@@ -2,6 +2,7 @@ using MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
// MxStatusCategory needed for Driver.Galaxy-003 regression tests.
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.Runtime;
@@ -46,6 +47,12 @@ public sealed class StatusCodeMapTests
else mapped.ShouldBe(StatusCodeMap.Bad);
}
// ===== Driver.Galaxy-003 regression: FromMxStatus uses IsSuccess() (category + success) =====
// The proto doc: "clients should branch on category, not on a specific success value."
// IsSuccess() requires BOTH success != 0 AND category == Ok — checking success alone
// would invert the mapping when the worker sets success=1 for an error code (the numeric
// value is NOT a boolean).
[Fact]
public void FromMxStatus_NullStatus_IsGood()
{
@@ -53,16 +60,26 @@ public sealed class StatusCodeMapTests
}
[Fact]
public void FromMxStatus_SuccessNonZero_IsGood()
public void FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood()
{
var s = new MxStatusProxy { Success = 1 };
// Both conditions required — this is the canonical OK path.
var s = new MxStatusProxy { Success = 1, Category = MxStatusCategory.Ok };
StatusCodeMap.FromMxStatus(s).ShouldBe(StatusCodeMap.Good);
}
[Fact]
public void FromMxStatus_SuccessZero_DetailKnown_MapsToSpecificCode()
public void FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood()
{
var s = new MxStatusProxy { Success = 0, Detail = 8 /* Bad_NotConnected */ };
// Bug fixed by Driver.Galaxy-003: success != 0 alone used to return Good even when
// category indicates an error. The proto says success is NOT a boolean — category wins.
var s = new MxStatusProxy { Success = 1, Category = MxStatusCategory.CommunicationError };
StatusCodeMap.FromMxStatus(s).ShouldNotBe(StatusCodeMap.Good);
}
[Fact]
public void FromMxStatus_SuccessZeroAndCategoryNotOk_DetailKnown_MapsToSpecificCode()
{
var s = new MxStatusProxy { Success = 0, Category = MxStatusCategory.OperationalError, Detail = 8 /* Bad_NotConnected */ };
StatusCodeMap.FromMxStatus(s).ShouldBe(StatusCodeMap.BadNotConnected);
}
@@ -95,4 +112,29 @@ public sealed class StatusCodeMapTests
((StatusCodeMap.BadNotConnected >> 30) & 0x3u).ShouldBe(2u);
((StatusCodeMap.BadOutOfService >> 30) & 0x3u).ShouldBe(2u);
}
// ===== Driver.Galaxy-004 regression: ToQualityCategoryByte lives next to its inverse =====
[Theory]
[InlineData(0x00000000u, (byte)192)] // Good
[InlineData(0x00D80000u, (byte)192)] // GoodLocalOverride — still Good category
[InlineData(0x40000000u, (byte)64)] // Uncertain
[InlineData(0x408F0000u, (byte)64)] // UncertainSubNormal — still Uncertain category
[InlineData(0x80000000u, (byte)0)] // Bad
[InlineData(0x808A0000u, (byte)0)] // BadNotConnected — still Bad category
[InlineData(0x80020000u, (byte)0)] // BadInternalError — still Bad category
public void ToQualityCategoryByte_ExtractsTopTwoBitsAsOpcDaByte(uint statusCode, byte expected)
{
StatusCodeMap.ToQualityCategoryByte(statusCode).ShouldBe(expected);
}
[Fact]
public void ToQualityCategoryByte_IsRightInverseOfFromQualityByte_ForCategoryBytes()
{
// The category bytes the probe watcher uses are 0, 64, 192. Round-trip: a value
// that came FROM those bytes should map back to the same byte.
StatusCodeMap.ToQualityCategoryByte(StatusCodeMap.FromQualityByte(0)).ShouldBe((byte)0);
StatusCodeMap.ToQualityCategoryByte(StatusCodeMap.FromQualityByte(64)).ShouldBe((byte)64);
StatusCodeMap.ToQualityCategoryByte(StatusCodeMap.FromQualityByte(192)).ShouldBe((byte)192);
}
}