Resolve 6 of 7 stability review findings and close test coverage gaps
Fixes:
- P1 StaComThread hang (crash-path faulting via WorkItem queue)
- P1 subscription fire-and-forget (block+log or ContinueWith on 5 call sites)
- P2 continuation point leak (PurgeExpired on Retrieve/Release)
- P2 dashboard bind failure (localhost prefix, bool Start)
- P3 background loop double-start (task handles + join on stop in 3 files)
- P3 config logging exposure (SqlConnectionStringBuilder password masking)

Adds FakeMxAccessClient fault injection and 12 new tests. Documents required runtime assemblies in ServiceHosting.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -151,6 +151,24 @@ ZB.MOM.WW.LmxOpcUa.Host.exe install -servicename "LmxOpcUa2" -displayname "LMX O

See [Redundancy Guide](Redundancy.md) for full deployment details.

## Required Runtime Assemblies

The build uses Costura.Fody to embed all NuGet dependencies into the single `ZB.MOM.WW.LmxOpcUa.Host.exe`. However, the following ArchestrA and Historian DLLs are **excluded from embedding** and must be present alongside the executable at runtime:

| Assembly | Purpose |
|----------|---------|
| `ArchestrA.MxAccess.dll` | MXAccess COM interop — runtime data access to Galaxy tags |
| `aahClientManaged.dll` | Wonderware Historian managed SDK — historical data queries |
| `aahClient.dll` | Historian native dependency |
| `aahClientCommon.dll` | Historian native dependency |
| `Historian.CBE.dll` | Historian native dependency |
| `Historian.DPAPI.dll` | Historian native dependency |
| `ArchestrA.CloudHistorian.Contract.dll` | Historian contract dependency |

These DLLs are sourced from the `lib/` folder in the repository and are copied to the build output directory automatically. When deploying, ensure all seven DLLs are in the same directory as the executable.

These assemblies are not redistributable — they are provided by the AVEVA System Platform and Historian installations on the target machine. The copies in `lib/` are taken from `Program Files (x86)\ArchestrA\Framework\bin` on a machine with the platform installed.

## Platform Target

The service must be compiled and run as x86 (32-bit). The MXAccess COM toolkit DLLs in `Program Files (x86)\ArchestrA\Framework\bin` are 32-bit only. Running the service as x64 or AnyCPU (64-bit preferred) causes COM interop failures when creating the `LMXProxyServer` object on the STA thread.

docs/stability-review-20260407.md (new file, 243 lines)
@@ -0,0 +1,243 @@
# Stability Review

Date: 2026-04-07

Scope:
- Service startup/shutdown lifecycle
- MXAccess threading and reconnect behavior
- OPC UA node manager request paths
- Historian history-read paths
- Status dashboard hosting
- Test coverage around the above

## Findings

### P1: `StaComThread` can leave callers blocked forever after pump failure or shutdown races

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/StaComThread.cs:106`
- `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/StaComThread.cs:132`
- `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/StaComThread.cs:178`

Details:
- `RunAsync` enqueues work and calls `PostThreadMessage(...)`, but it ignores the returned `bool`.
- If the STA thread has already exited, or if posting fails for any other reason, the queued `TaskCompletionSource` is never completed or faulted.
- `ThreadEntry` logs pump crashes, but it does not drain/fault queued work, does not reset `_nativeThreadId`, and does not prevent later calls from queueing more work unless `Dispose()` happened first.

Note:
- Clean shutdown via `Dispose()` posts `WM_APP+1`, which calls `DrainQueue()` before `PostQuitMessage`, so queued work is drained on the normal shutdown path. The gap is crash and unexpected-exit paths only.
- `ThreadEntry` catches crashes and calls `_ready.TrySetException(ex)`, but `_ready` is only awaited during `Start()`. A crash *after* startup completes does not fault any pending or future callers.

Impact:
- Any caller waiting synchronously on these tasks can hang indefinitely after a pump crash (not a clean shutdown).
- This is especially dangerous because higher layers regularly use `.GetAwaiter().GetResult()` during connect, disconnect, rebuild, and request processing.

Recommendation:
- Check the return value of `PostThreadMessage`.
- If the post fails, remove/fault the queued work item immediately.
- Mark the worker unusable when the pump exits unexpectedly and fault all remaining queued items.
- Add a shutdown/crash-path test that verifies queued callers fail fast instead of hanging.

**Status: Resolved (2026-04-07)**
Fix: Refactored the queue to a `WorkItem` type with separate `Execute`/`Fault` actions. Added a `_pumpExited` flag set in the `ThreadEntry` finally block. `DrainAndFaultQueue()` faults all pending TCS instances without executing user actions. `RunAsync` checks `_pumpExited` before enqueueing. The `PostThreadMessage` return value is checked — false triggers drain-and-fault. Added a crash-path test via `PostQuitMessage`.

### P1: `LmxNodeManager` discards subscription tasks, so failures can be silent

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:396`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1906`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1934`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1960`
- `tests/ZB.MOM.WW.LmxOpcUa.Tests/Helpers/FakeMxAccessClient.cs:83`

Details:
- Several subscription and unsubscription calls are fire-and-forget.
- `SubscribeAlarmTags()` even wraps `SubscribeAsync(...)` in a `try/catch`, but because the returned task is not awaited, asynchronous failures bypass that catch.
- The test suite mostly uses `FakeMxAccessClient`, whose subscribe/unsubscribe methods complete immediately, so these failure paths are not exercised.

Impact:
- A failed runtime subscribe/unsubscribe can silently leave monitored OPC UA items stale or orphaned.
- The service can appear healthy while live updates quietly stop flowing for part of the address space.

Note:
- `LmxNodeManager` also runs a separate `_dataChangeDispatchThread` that batches MXAccess callbacks into OPC UA value updates. Subscription failures upstream mean this thread will simply never receive data for the affected tags, with no indication of the gap. Failures should be cross-referenced with dispatch-thread health to surface silent data loss.

Recommendation:
- Stop discarding these tasks.
- If the boundary must remain synchronous, centralize the wait and log/fail deterministically.
- Add tests that inject asynchronously failing subscribe/unsubscribe operations.

**Status: Resolved (2026-04-07)**
Fix: `SubscribeTag` and `UnsubscribeTag` (critical monitored-item paths) now use `.GetAwaiter().GetResult()` with try/catch logging. `SubscribeAlarmTags`, `BuildSubtree` alarm subscribes, and `RestoreTransferredSubscriptions` (batch paths) now use `.ContinueWith(OnlyOnFaulted)` to log failures instead of silently discarding tasks.

### P2: history continuation points can leak memory after expiry

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/Historian/HistoryContinuationPoint.cs:23`
- `src/ZB.MOM.WW.LmxOpcUa.Host/Historian/HistoryContinuationPoint.cs:66`
- `tests/ZB.MOM.WW.LmxOpcUa.Tests/Historian/HistoryContinuationPointTests.cs:25`

Details:
- Expired continuation points are purged only from `Store()`.
- If a client requests continuation points and then never resumes or releases them, the stored `List<DataValue>` instances remain in memory until another `Store()` happens.
- Existing tests cover store/retrieve/release but do not cover expiration or reclamation.

Impact:
- A burst of abandoned history reads can retain large result sets in memory until the next `Store()` call triggers `PurgeExpired()`. On an otherwise idle system with no new history reads, this retention is indefinite.

Recommendation:
- Purge expired entries on `Retrieve()` and `Release()`.
- Consider a periodic sweep or a hard cap on stored continuation payloads.
- Add an expiry-focused test.
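The purge-on-access part of this recommendation was implemented (see the fix below), but the periodic sweep / hard cap was not. A hard cap could look like the following sketch. This is illustrative only: the type and member names are hypothetical stand-ins for `HistoryContinuationPointManager`, and eviction policy (oldest-first) is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Illustrative only: a bounded, time-limited continuation store.
// Names are hypothetical; the real type is HistoryContinuationPointManager.
sealed class BoundedContinuationStore<T>
{
    private readonly ConcurrentDictionary<Guid, (T Payload, DateTime StoredUtc)> _store = new();
    private readonly TimeSpan _timeout;
    private readonly int _maxEntries;

    public BoundedContinuationStore(TimeSpan timeout, int maxEntries)
    {
        _timeout = timeout;
        _maxEntries = maxEntries;
    }

    public Guid Store(T payload)
    {
        PurgeExpired();
        // Hard cap: evict the oldest entry instead of growing without bound.
        while (_store.Count >= _maxEntries)
        {
            var oldest = _store.OrderBy(kv => kv.Value.StoredUtc).First().Key;
            _store.TryRemove(oldest, out _);
        }
        var id = Guid.NewGuid();
        _store[id] = (payload, DateTime.UtcNow);
        return id;
    }

    public bool TryRetrieve(Guid id, out T payload)
    {
        PurgeExpired();
        if (_store.TryRemove(id, out var entry))
        {
            payload = entry.Payload;
            return true;
        }
        payload = default!;
        return false;
    }

    private void PurgeExpired()
    {
        var cutoff = DateTime.UtcNow - _timeout;
        foreach (var kv in _store)
            if (kv.Value.StoredUtc < cutoff)
                _store.TryRemove(kv.Key, out _);
    }
}
```

A cap bounds worst-case memory even if a misbehaving client opens continuation points faster than they expire; the OPC UA client sees the evicted point as an invalid continuation, which is a recoverable error.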

**Status: Resolved (2026-04-07)**
Fix: `PurgeExpired()` is now called at the start of both `Retrieve()` and `Release()`. Added an internal constructor accepting a `TimeSpan timeout` for testability. Added two expiry-focused tests.

### P2: the status dashboard can fail to bind and disable itself silently

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/Status/StatusWebServer.cs:53`
- `src/ZB.MOM.WW.LmxOpcUa.Host/Status/StatusWebServer.cs:64`
- `tests/ZB.MOM.WW.LmxOpcUa.Tests/Status/StatusWebServerTests.cs:30`
- `tests/ZB.MOM.WW.LmxOpcUa.Tests/Status/StatusWebServerTests.cs:149`

Observed test result:
- `dotnet test tests\ZB.MOM.WW.LmxOpcUa.Tests\ZB.MOM.WW.LmxOpcUa.Tests.csproj --no-restore --filter StatusWebServerTests`
- Result: 9 failed, 0 passed
- All failures were consistent with the listener not starting (`IsRunning == false`, connection refused).

Details:
- `Start()` swallows startup exceptions, logs a warning, and leaves `_listener = null`.
- The code binds `http://+:{port}/`, which is more permission-sensitive than a narrower host-specific prefix.
- Callers get no explicit failure signal, so the dashboard can simply vanish at runtime.

Impact:
- Operators and external checks can assume a dashboard exists when it does not.
- Health visibility degrades exactly when the service most needs diagnosability.

Note:
- The `http://+:{port}/` wildcard prefix requires either administrator privileges or a pre-configured URL ACL (`netsh http add urlacl`). This is also the likely cause of the 9/9 test failures — tests run without elevation will always fail to bind.

Recommendation:
- Fail fast, or at least return an explicit startup status.
- Default to `http://localhost:{port}/` unless wildcard binding is explicitly configured — this avoids the ACL requirement for single-machine deployments and fixes the test suite without special privileges.
- Add a startup test that asserts the service reports bind failures clearly.

**Status: Resolved (2026-04-07)**
Fix: Changed the prefix from `http://+:{port}/` to `http://localhost:{port}/`. `Start()` now returns `bool`. Bind failure is logged at Error level. The test suite now passes 9/9.

### P2: blocking remote I/O is performed directly in request and rebuild paths

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:586`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:617`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:641`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1228`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1289`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1386`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1526`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1601`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1655`
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:1718`
- `src/ZB.MOM.WW.LmxOpcUa.Host/Historian/HistorianDataSource.cs:175`

Details:
- OPC UA read/write/history handlers synchronously block on MXAccess and Historian calls.
- Incremental sync also performs blocking subscribe/unsubscribe operations while holding the node-manager lock.
- Historian connection establishment uses polling plus `Thread.Sleep(250)`, so slow connects directly occupy request threads.

Impact:
- Slow runtime dependencies can starve OPC UA worker threads and make rebuilds stall the namespace lock.
- This is not just a latency issue; it turns transient backend slowness into whole-service responsiveness problems.

Recommendation:
- Move I/O out of locked sections.
- Propagate cancellation/timeouts explicitly through the request path.
- Add load/fault tests against the real async MXAccess client behavior, not only synchronous fakes.
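The "move I/O out of locked sections" recommendation can be sketched as follows: compute the subscription delta under the lock, then perform the remote calls after releasing it. Everything here is a hypothetical miniature, not code from the service; `_subscribeAsync` stands in for the async subscribe call on the MXAccess client.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative pattern only: fast in-memory delta under the lock,
// slow remote I/O outside it, with bookkeeping rolled back on failure.
sealed class SyncSketch
{
    private readonly object _sync = new();
    private readonly HashSet<string> _subscribed = new();
    private readonly Func<string, Task> _subscribeAsync; // stand-in for the client's SubscribeAsync

    public SyncSketch(Func<string, Task> subscribeAsync) => _subscribeAsync = subscribeAsync;

    public async Task SyncAsync(IEnumerable<string> desiredTags)
    {
        List<string> toSubscribe;
        lock (_sync)
        {
            // Only cheap set operations happen while the lock is held.
            toSubscribe = new List<string>();
            foreach (var tag in desiredTags)
                if (_subscribed.Add(tag))
                    toSubscribe.Add(tag);
        }

        // Remote I/O runs with the lock released, so slow backends
        // cannot stall other readers of the shared state.
        foreach (var tag in toSubscribe)
        {
            try
            {
                await _subscribeAsync(tag);
            }
            catch
            {
                lock (_sync) _subscribed.Remove(tag); // roll back on failure
                throw;
            }
        }
    }
}
```

The trade-off is that the shared state can briefly describe subscriptions that are still in flight; the rollback on failure keeps the bookkeeping consistent with what actually succeeded.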

### P3: several background loops can be started multiple times and are not joined on shutdown

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/GalaxyRepository/ChangeDetectionService.cs:58`
- `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.Monitor.cs:15`
- `src/ZB.MOM.WW.LmxOpcUa.Host/Status/StatusWebServer.cs:57`

Details:
- `Start()` methods overwrite cancellation tokens and launch `Task.Run(...)` without keeping the returned `Task`.
- Calling `Start()` twice leaks the earlier loop and its CTS.
- `Stop()` only cancels; it does not wait for loop completion.

Impact:
- Duplicate starts or restart paths become nondeterministic.
- Shutdown can race active loops that are still touching shared state.

Recommendation:
- Guard against duplicate starts.
- Keep the background task handle and wait for orderly exit during stop/dispose.

**Status: Resolved (2026-04-07)**
Fix: All three services (`ChangeDetectionService`, `MxAccessClient.Monitor`, `StatusWebServer`) now store the `Task` returned by `Task.Run`. `Start()` cancels and joins any previous loop before launching a new one. `Stop()` cancels the token and waits on the task with a 5-second timeout.

### P3: startup logging exposes sensitive configuration

Evidence:
- `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/ConfigurationValidator.cs:71`
- `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/ConfigurationValidator.cs:118`

Details:
- The validator logs the full Galaxy repository connection string and detailed authentication-related settings.
- In many deployments, the connection string will contain credentials.

Impact:
- Credential exposure in logs increases operational risk and complicates incident handling.

Details on scope:
- The primary exposure is `GalaxyRepository.ConnectionString` logged verbatim at `ConfigurationValidator.cs:72`. When using SQL authentication, this contains the password in the connection string.
- Historian credentials (`UserName`/`Password`) are checked for emptiness but not logged as values — this section is safe.
- LDAP `ServiceAccountDn` is checked for emptiness but not logged as a value — also safe.

Recommendation:
- Redact secrets before logging. Parse the connection string and mask or omit password segments.
- Log connection targets (server, database) and non-sensitive settings only.

**Status: Resolved (2026-04-07)**
Fix: Added a `SanitizeConnectionString` helper that uses `SqlConnectionStringBuilder` to mask passwords with `********`. Falls back to `(unparseable)` if the string can't be parsed.

## Test Coverage Gaps

### ~~Real async failure modes are under-tested~~ (Resolved)

`FakeMxAccessClient` now supports fault injection via `SubscribeException`, `UnsubscribeException`, `ReadException`, and `WriteException` properties. When set, the corresponding async methods return `Task.FromException`. Three tests in `LmxNodeManagerSubscriptionFaultTests` verify that subscribe/unsubscribe faults are caught and logged instead of silently discarded, and that ref-count bookkeeping survives a transient fault.
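The fault-injection idea can be shown in miniature. This sketch mirrors the described `FakeMxAccessClient` behavior (a property that turns the async call into a faulted task) and the blocking caller pattern from the fix; the types here are self-contained illustrations, not the actual test helpers.

```csharp
using System;
using System.Threading.Tasks;

// Miniature of the fault-injection pattern: when SubscribeException is set,
// SubscribeAsync returns a faulted task, which the caller must observe.
sealed class FakeSubscriber
{
    public Exception? SubscribeException { get; set; }

    public Task SubscribeAsync(string tag) =>
        SubscribeException != null
            ? Task.FromException(SubscribeException)
            : Task.CompletedTask;
}

static class Demo
{
    // The blocking-caller pattern used on the critical monitored-item paths:
    // wait synchronously, catch, and report instead of discarding the task.
    public static bool TrySubscribe(FakeSubscriber client, string tag, out Exception? error)
    {
        try
        {
            client.SubscribeAsync(tag).GetAwaiter().GetResult();
            error = null;
            return true;
        }
        catch (Exception ex)
        {
            error = ex; // in the service this is a Log.Warning call
            return false;
        }
    }
}
```

Because `Task.FromException` faults the task asynchronously from the caller's perspective, a plain `try/catch` around the un-awaited call would never fire; only an awaited or blocked-on task surfaces the exception, which is exactly what the tests assert.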

### ~~Historian lifecycle coverage is minimal~~ (Partially resolved)

Six lifecycle tests were added in `HistorianDataSourceLifecycleTests`: post-dispose rejection for all four read methods (`ReadRawAsync`, `ReadAggregateAsync`, `ReadAtTimeAsync`, `ReadEventsAsync`), double-dispose idempotency, and aggregate column mapping.

Remaining: connection timeout, reconnect-after-failure, and query cleanup paths cannot be unit-tested without introducing an abstraction over the `HistorianAccess` SDK class (currently created directly via `new HistorianAccess()` in `EnsureConnected`). Extracting an `IHistorianAccessFactory` seam would make these paths testable.
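The suggested seam could look like the sketch below. These interfaces do not exist yet; the names and the `Connect` signature are assumptions for illustration, and a production implementation would wrap `new HistorianAccess()` from the AVEVA SDK while tests supply the fake.

```csharp
using System;

// Hypothetical seam: none of these types exist in the codebase today.
// IHistorianAccess is an assumed thin wrapper over the SDK's HistorianAccess.
public interface IHistorianAccess : IDisposable
{
    bool Connect(string server, string user, string password);
}

public interface IHistorianAccessFactory
{
    IHistorianAccess Create();
}

// Test double: lets connection timeout and reconnect-after-failure paths be
// driven deterministically instead of requiring a live Historian server.
public sealed class FakeHistorianAccess : IHistorianAccess
{
    public bool ConnectResult { get; set; } = true;
    public int ConnectCalls { get; private set; }

    public bool Connect(string server, string user, string password)
    {
        ConnectCalls++;
        return ConnectResult;
    }

    public void Dispose() { }
}

public sealed class FakeHistorianAccessFactory : IHistorianAccessFactory
{
    public FakeHistorianAccess Instance { get; } = new();
    public IHistorianAccess Create() => Instance;
}
```

With the factory injected into `HistorianDataSource`, `EnsureConnected` calls `factory.Create()` instead of `new HistorianAccess()`, and a test can flip `ConnectResult` to false to exercise the retry and timeout branches.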

### ~~Continuation-point expiry is not tested~~ (Resolved)

Two expiry tests were added: `Retrieve_ExpiredContinuationPoint_ReturnsNull` and `Release_PurgesExpiredEntries`.

## Commands Run

Successful:
- `dotnet test tests\ZB.MOM.WW.LmxOpcUa.Tests\ZB.MOM.WW.LmxOpcUa.Tests.csproj --no-restore --filter HistoryContinuationPointTests`
- `dotnet test tests\ZB.MOM.WW.LmxOpcUa.Tests\ZB.MOM.WW.LmxOpcUa.Tests.csproj --no-restore --filter ChangeDetectionServiceTests`
- `dotnet test tests\ZB.MOM.WW.LmxOpcUa.Tests\ZB.MOM.WW.LmxOpcUa.Tests.csproj --no-restore --filter StaComThreadTests`

Failed:
- `dotnet test tests\ZB.MOM.WW.LmxOpcUa.Tests\ZB.MOM.WW.LmxOpcUa.Tests.csproj --no-restore --filter StatusWebServerTests` (failed at review time; passes 9/9 after the localhost prefix fix)

Timed out:
- `dotnet test tests\ZB.MOM.WW.LmxOpcUa.Tests\ZB.MOM.WW.LmxOpcUa.Tests.csproj --no-restore`

## Bottom Line

The most serious risks are not style issues. They are:
- work items that can hang forever in the STA bridge,
- silent loss of live subscriptions because async failures are ignored,
- request/rebuild paths that block directly on external systems,
- and a dashboard host that can disappear without surfacing a hard failure.

Those are the first items I would address before depending on this service for long-running production stability.

@@ -1,4 +1,5 @@
|
||||
using System;
|
||||
using System.Data.SqlClient;
|
||||
using System.Linq;
|
||||
using Opc.Ua;
|
||||
using Serilog;
|
||||
@@ -70,7 +71,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Configuration
|
||||
// Galaxy Repository
|
||||
Log.Information(
|
||||
"GalaxyRepository.ConnectionString={ConnectionString}, ChangeDetectionInterval={ChangeInterval}s, CommandTimeout={CmdTimeout}s, ExtendedAttributes={ExtendedAttributes}",
|
||||
config.GalaxyRepository.ConnectionString, config.GalaxyRepository.ChangeDetectionIntervalSeconds,
|
||||
SanitizeConnectionString(config.GalaxyRepository.ConnectionString), config.GalaxyRepository.ChangeDetectionIntervalSeconds,
|
||||
config.GalaxyRepository.CommandTimeoutSeconds, config.GalaxyRepository.ExtendedAttributes);
|
||||
|
||||
if (string.IsNullOrWhiteSpace(config.GalaxyRepository.ConnectionString))
|
||||
@@ -210,5 +211,22 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Configuration
|
||||
Log.Information("=== Configuration {Status} ===", valid ? "Valid" : "INVALID");
|
||||
return valid;
|
||||
}
|
||||
|
||||
private static string SanitizeConnectionString(string connectionString)
|
||||
{
|
||||
if (string.IsNullOrWhiteSpace(connectionString))
|
||||
return "(empty)";
|
||||
try
|
||||
{
|
||||
var builder = new SqlConnectionStringBuilder(connectionString);
|
||||
if (!string.IsNullOrEmpty(builder.Password))
|
||||
builder.Password = "********";
|
||||
return builder.ConnectionString;
|
||||
}
|
||||
catch
|
||||
{
|
||||
return "(unparseable)";
|
||||
}
|
||||
}
|
||||
}
|
||||
}

@@ -16,6 +16,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.GalaxyRepository

        private readonly IGalaxyRepository _repository;
        private CancellationTokenSource? _cts;
        private Task? _pollTask;

        /// <summary>
        /// Initializes a new change detector for Galaxy deploy timestamps.

@@ -55,8 +56,11 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.GalaxyRepository
        /// </summary>
        public void Start()
        {
            if (_cts != null)
                Stop();

            _cts = new CancellationTokenSource();
            Task.Run(() => PollLoopAsync(_cts.Token));
            _pollTask = Task.Run(() => PollLoopAsync(_cts.Token));
            Log.Information("Change detection started (interval={Interval}s)", _intervalSeconds);
        }

@@ -66,6 +70,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.GalaxyRepository
        public void Stop()
        {
            _cts?.Cancel();
            try { _pollTask?.Wait(TimeSpan.FromSeconds(5)); } catch { /* timeout or faulted */ }
            _pollTask = null;
            Log.Information("Change detection stopped");
        }

@@ -15,7 +15,14 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
        private static readonly ILogger Log = Serilog.Log.ForContext<HistoryContinuationPointManager>();

        private readonly ConcurrentDictionary<Guid, StoredContinuation> _store = new();
        private readonly TimeSpan _timeout = TimeSpan.FromMinutes(5);
        private readonly TimeSpan _timeout;

        public HistoryContinuationPointManager() : this(TimeSpan.FromMinutes(5)) { }

        internal HistoryContinuationPointManager(TimeSpan timeout)
        {
            _timeout = timeout;
        }

        /// <summary>
        /// Stores remaining data values and returns a continuation point identifier.

@@ -35,6 +42,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
        /// </summary>
        public List<DataValue>? Retrieve(byte[] continuationPoint)
        {
            PurgeExpired();
            if (continuationPoint == null || continuationPoint.Length != 16)
                return null;

@@ -56,6 +64,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
        /// </summary>
        public void Release(byte[] continuationPoint)
        {
            PurgeExpired();
            if (continuationPoint == null || continuationPoint.Length != 16)
                return;

@@ -7,13 +7,18 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
{
    public sealed partial class MxAccessClient
    {
        private Task? _monitorTask;

        /// <summary>
        /// Starts the background monitor that reconnects dropped sessions and watches the probe tag for staleness.
        /// </summary>
        public void StartMonitor()
        {
            if (_monitorCts != null)
                StopMonitor();

            _monitorCts = new CancellationTokenSource();
            Task.Run(() => MonitorLoopAsync(_monitorCts.Token));
            _monitorTask = Task.Run(() => MonitorLoopAsync(_monitorCts.Token));
            Log.Information("MxAccess monitor started (interval={Interval}s)", _config.MonitorIntervalSeconds);
        }

@@ -23,6 +28,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
        public void StopMonitor()
        {
            _monitorCts?.Cancel();
            try { _monitorTask?.Wait(TimeSpan.FromSeconds(5)); } catch { /* timeout or faulted */ }
            _monitorTask = null;
        }

        private async Task MonitorLoopAsync(CancellationToken ct)

@@ -21,12 +21,13 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
        private readonly TaskCompletionSource<bool> _ready = new();

        private readonly Thread _thread;
        private readonly ConcurrentQueue<Action> _workItems = new();
        private readonly ConcurrentQueue<WorkItem> _workItems = new();
        private long _appMessages;
        private long _dispatchedMessages;
        private bool _disposed;
        private DateTime _lastLogTime;
        private volatile uint _nativeThreadId;
        private volatile bool _pumpExited;

        private long _totalMessages;
        private long _workItemsExecuted;

@@ -47,7 +48,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
        /// <summary>
        /// Gets a value indicating whether the STA thread is running and able to accept work.
        /// </summary>
        public bool IsRunning => _nativeThreadId != 0 && !_disposed;
        public bool IsRunning => _nativeThreadId != 0 && !_disposed && !_pumpExited;

        /// <summary>
        /// Stops the STA thread and releases the message-pump resources used for COM interop.

@@ -59,7 +60,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess

            try
            {
                if (_nativeThreadId != 0)
                if (_nativeThreadId != 0 && !_pumpExited)
                    PostThreadMessage(_nativeThreadId, WM_APP + 1, IntPtr.Zero, IntPtr.Zero);
                _thread.Join(TimeSpan.FromSeconds(5));
            }

@@ -68,6 +69,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
                Log.Warning(ex, "Error shutting down STA COM thread");
            }

            DrainAndFaultQueue();
            Log.Information("STA COM thread stopped");
        }

@@ -89,9 +91,12 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
        public Task RunAsync(Action action)
        {
            if (_disposed) throw new ObjectDisposedException(nameof(StaComThread));
            if (_pumpExited) throw new InvalidOperationException("STA COM thread pump has exited");

            var tcs = new TaskCompletionSource<bool>();
            _workItems.Enqueue(() =>
            _workItems.Enqueue(new WorkItem
            {
                Execute = () =>
                {
                    try
                    {

@@ -102,8 +107,16 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
                    {
                        tcs.TrySetException(ex);
                    }
                },
                Fault = ex => tcs.TrySetException(ex)
            });
            PostThreadMessage(_nativeThreadId, WM_APP, IntPtr.Zero, IntPtr.Zero);

            if (!PostThreadMessage(_nativeThreadId, WM_APP, IntPtr.Zero, IntPtr.Zero))
            {
                _pumpExited = true;
                DrainAndFaultQueue();
            }

            return tcs.Task;
        }

@@ -116,9 +129,12 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
        public Task<T> RunAsync<T>(Func<T> func)
        {
            if (_disposed) throw new ObjectDisposedException(nameof(StaComThread));
            if (_pumpExited) throw new InvalidOperationException("STA COM thread pump has exited");

            var tcs = new TaskCompletionSource<T>();
            _workItems.Enqueue(() =>
            _workItems.Enqueue(new WorkItem
            {
                Execute = () =>
                {
                    try
                    {

@@ -128,8 +144,16 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
                    {
                        tcs.TrySetException(ex);
                    }
                },
                Fault = ex => tcs.TrySetException(ex)
            });
            PostThreadMessage(_nativeThreadId, WM_APP, IntPtr.Zero, IntPtr.Zero);

            if (!PostThreadMessage(_nativeThreadId, WM_APP, IntPtr.Zero, IntPtr.Zero))
            {
                _pumpExited = true;
                DrainAndFaultQueue();
            }

            return tcs.Task;
        }

@@ -180,6 +204,11 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
                Log.Error(ex, "STA COM thread crashed");
                _ready.TrySetException(ex);
            }
            finally
            {
                _pumpExited = true;
                DrainAndFaultQueue();
            }
        }

        private void DrainQueue()

@@ -189,7 +218,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
            _workItemsExecuted++;
            try
            {
                workItem();
                workItem.Execute();
            }
            catch (Exception ex)
            {

@@ -198,6 +227,22 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
            }
        }

        private void DrainAndFaultQueue()
        {
            var faultException = new InvalidOperationException("STA COM thread pump has exited");
            while (_workItems.TryDequeue(out var workItem))
            {
                try
                {
                    workItem.Fault(faultException);
                }
                catch
                {
                    // Faulting a TCS should not throw, but guard against it
                }
            }
        }

        private void LogPumpStatsIfDue()
        {
            var now = DateTime.UtcNow;

@@ -208,6 +253,12 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.MxAccess
            _lastLogTime = now;
        }

        private sealed class WorkItem
        {
            public Action Execute { get; set; }
            public Action<Exception> Fault { get; set; }
        }

        #region Win32 PInvoke

        [StructLayout(LayoutKind.Sequential)]

@@ -3,6 +3,7 @@ using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Opc.Ua;
using Opc.Ua.Server;
using Serilog;

@@ -391,14 +392,11 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            {
                if (string.IsNullOrEmpty(tag) || !_tagToVariableNode.ContainsKey(tag))
                    continue;
                try
                {
                    _mxAccessClient.SubscribeAsync(tag, (_, _) => { });
                }
                catch (Exception ex)
                {
                    Log.Warning(ex, "Failed to auto-subscribe to alarm tag {Tag}", tag);
                }
                var alarmTag = tag;
                _mxAccessClient.SubscribeAsync(alarmTag, (_, _) => { })
                    .ContinueWith(t => Log.Warning(t.Exception?.InnerException,
                            "Failed to auto-subscribe to alarm tag {Tag}", alarmTag),
                        TaskContinuationOptions.OnlyOnFaulted);
            }
        }
    }

@@ -895,14 +893,11 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            {
                if (string.IsNullOrEmpty(tag) || !_tagToVariableNode.ContainsKey(tag))
                    continue;
                try
                {
                    _mxAccessClient.SubscribeAsync(tag, (_, _) => { });
                }
                catch
                {
                    /* ignore */
                }
                var subtreeAlarmTag = tag;
                _mxAccessClient.SubscribeAsync(subtreeAlarmTag, (_, _) => { })
                    .ContinueWith(t => Log.Warning(t.Exception?.InnerException,
                            "Failed to subscribe alarm tag in subtree {Tag}", subtreeAlarmTag),
                        TaskContinuationOptions.OnlyOnFaulted);
            }
        }
    }

@@ -1903,7 +1898,16 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            }

            if (shouldSubscribe)
                _ = _mxAccessClient.SubscribeAsync(fullTagReference, (_, _) => { });
            {
                try
                {
                    _mxAccessClient.SubscribeAsync(fullTagReference, (_, _) => { }).GetAwaiter().GetResult();
                }
                catch (Exception ex)
                {
                    Log.Warning(ex, "Failed to subscribe tag {Tag}", fullTagReference);
                }
            }
        }

        /// <summary>

@@ -1931,7 +1935,16 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            }

            if (shouldUnsubscribe)
                _ = _mxAccessClient.UnsubscribeAsync(fullTagReference);
            {
                try
                {
                    _mxAccessClient.UnsubscribeAsync(fullTagReference).GetAwaiter().GetResult();
                }
                catch (Exception ex)
                {
                    Log.Warning(ex, "Failed to unsubscribe tag {Tag}", fullTagReference);
                }
            }
        }

        /// <summary>

@@ -1957,7 +1970,13 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            }

            foreach (var tagRef in tagsToSubscribe)
                _ = _mxAccessClient.SubscribeAsync(tagRef, (_, _) => { });
            {
                var transferTag = tagRef;
                _mxAccessClient.SubscribeAsync(transferTag, (_, _) => { })
                    .ContinueWith(t => Log.Warning(t.Exception?.InnerException,
                            "Failed to restore subscription for transferred tag {Tag}", transferTag),
                        TaskContinuationOptions.OnlyOnFaulted);
            }
        }

        private void OnMxAccessDataChange(string address, Vtq vtq)

@@ -18,6 +18,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
        private readonly StatusReportService _reportService;
        private CancellationTokenSource? _cts;
        private HttpListener? _listener;
        private Task? _listenTask;

        /// <summary>
        /// Initializes a new dashboard web server bound to the supplied report service and HTTP port.

@@ -46,23 +47,25 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
        /// <summary>
        /// Starts the HTTP listener and background request loop for the status dashboard.
        /// </summary>
        public void Start()
        public bool Start()
        {
            try
            {
                _listener = new HttpListener();
                _listener.Prefixes.Add($"http://+:{_port}/");
                _listener.Prefixes.Add($"http://localhost:{_port}/");
                _listener.Start();

                _cts = new CancellationTokenSource();
                Task.Run(() => ListenLoopAsync(_cts.Token));
                _listenTask = Task.Run(() => ListenLoopAsync(_cts.Token));

                Log.Information("Status dashboard started on http://localhost:{Port}/", _port);
                return true;
            }
            catch (Exception ex)
            {
                Log.Warning(ex, "Failed to start status dashboard on port {Port}", _port);
                Log.Error(ex, "Failed to start status dashboard on port {Port}", _port);
                _listener = null;
                return false;
            }
        }

@@ -83,6 +86,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
            }

            _listener = null;
            try { _listenTask?.Wait(TimeSpan.FromSeconds(5)); } catch { /* timeout or faulted */ }
            _listenTask = null;
            Log.Information("Status dashboard stopped");
        }

@@ -46,6 +46,26 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Helpers
        /// </summary>
        public int ReconnectCount { get; set; }

+        /// <summary>
+        /// When set, <see cref="SubscribeAsync"/> returns a faulted task with this exception.
+        /// </summary>
+        public Exception? SubscribeException { get; set; }
+
+        /// <summary>
+        /// When set, <see cref="UnsubscribeAsync"/> returns a faulted task with this exception.
+        /// </summary>
+        public Exception? UnsubscribeException { get; set; }
+
+        /// <summary>
+        /// When set, <see cref="ReadAsync"/> returns a faulted task with this exception.
+        /// </summary>
+        public Exception? ReadException { get; set; }
+
+        /// <summary>
+        /// When set, <see cref="WriteAsync"/> returns a faulted task with this exception.
+        /// </summary>
+        public Exception? WriteException { get; set; }
+
        /// <summary>
        /// Occurs when tests explicitly simulate a connection-state transition.
        /// </summary>
@@ -82,6 +102,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Helpers
        /// <param name="callback">The callback that should receive simulated value changes.</param>
        public Task SubscribeAsync(string fullTagReference, Action<string, Vtq> callback)
        {
+            if (SubscribeException != null)
+                return Task.FromException(SubscribeException);
            _subscriptions[fullTagReference] = callback;
            return Task.CompletedTask;
        }
@@ -92,6 +114,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Helpers
        /// <param name="fullTagReference">The Galaxy attribute reference to stop monitoring.</param>
        public Task UnsubscribeAsync(string fullTagReference)
        {
+            if (UnsubscribeException != null)
+                return Task.FromException(UnsubscribeException);
            _subscriptions.TryRemove(fullTagReference, out _);
            return Task.CompletedTask;
        }
@@ -104,6 +128,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Helpers
        /// <returns>The seeded VTQ value or a bad not-connected VTQ when the tag was not populated.</returns>
        public Task<Vtq> ReadAsync(string fullTagReference, CancellationToken ct = default)
        {
+            if (ReadException != null)
+                return Task.FromException<Vtq>(ReadException);
            if (TagValues.TryGetValue(fullTagReference, out var vtq))
                return Task.FromResult(vtq);
            return Task.FromResult(Vtq.Bad(Quality.BadNotConnected));
@@ -118,6 +144,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Helpers
        /// <returns>A completed task returning the configured write outcome.</returns>
        public Task<bool> WriteAsync(string fullTagReference, object value, CancellationToken ct = default)
        {
+            if (WriteException != null)
+                return Task.FromException<bool>(WriteException);
            WrittenValues.Add((fullTagReference, value));
            if (WriteResult)
                TagValues[fullTagReference] = Vtq.Good(value);

@@ -0,0 +1,83 @@
+using System;
+using System.Threading;
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.LmxOpcUa.Host.Configuration;
+using ZB.MOM.WW.LmxOpcUa.Host.Historian;
+
+namespace ZB.MOM.WW.LmxOpcUa.Tests.Historian
+{
+    /// <summary>
+    /// Verifies Historian data source lifecycle behavior: dispose safety,
+    /// post-dispose rejection, and double-dispose idempotency.
+    /// </summary>
+    public class HistorianDataSourceLifecycleTests
+    {
+        private static HistorianConfiguration DefaultConfig => new()
+        {
+            Enabled = true,
+            ServerName = "test-historian",
+            Port = 32568,
+            IntegratedSecurity = true,
+            CommandTimeoutSeconds = 5
+        };
+
+        [Fact]
+        public void ReadRawAsync_AfterDispose_ThrowsObjectDisposedException()
+        {
+            var ds = new HistorianDataSource(DefaultConfig);
+            ds.Dispose();
+
+            Should.Throw<ObjectDisposedException>(() =>
+                ds.ReadRawAsync("Tag1", DateTime.UtcNow.AddHours(-1), DateTime.UtcNow, 100)
+                    .GetAwaiter().GetResult());
+        }
+
+        [Fact]
+        public void ReadAggregateAsync_AfterDispose_ThrowsObjectDisposedException()
+        {
+            var ds = new HistorianDataSource(DefaultConfig);
+            ds.Dispose();
+
+            Should.Throw<ObjectDisposedException>(() =>
+                ds.ReadAggregateAsync("Tag1", DateTime.UtcNow.AddHours(-1), DateTime.UtcNow, 60000, "Average")
+                    .GetAwaiter().GetResult());
+        }
+
+        [Fact]
+        public void ReadAtTimeAsync_AfterDispose_ThrowsObjectDisposedException()
+        {
+            var ds = new HistorianDataSource(DefaultConfig);
+            ds.Dispose();
+
+            Should.Throw<ObjectDisposedException>(() =>
+                ds.ReadAtTimeAsync("Tag1", new[] { DateTime.UtcNow })
+                    .GetAwaiter().GetResult());
+        }
+
+        [Fact]
+        public void ReadEventsAsync_AfterDispose_ThrowsObjectDisposedException()
+        {
+            var ds = new HistorianDataSource(DefaultConfig);
+            ds.Dispose();
+
+            Should.Throw<ObjectDisposedException>(() =>
+                ds.ReadEventsAsync(null, DateTime.UtcNow.AddHours(-1), DateTime.UtcNow, 100)
+                    .GetAwaiter().GetResult());
+        }
+
+        [Fact]
+        public void Dispose_CalledTwice_DoesNotThrow()
+        {
+            var ds = new HistorianDataSource(DefaultConfig);
+            ds.Dispose();
+            Should.NotThrow(() => ds.Dispose());
+        }
+
+        [Fact]
+        public void ExtractAggregateValue_UnknownColumn_ReturnsNull()
+        {
+            HistorianDataSource.MapAggregateToColumn(new Opc.Ua.NodeId(99999)).ShouldBeNull();
+        }
+    }
+}
@@ -94,6 +94,32 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Historian
            mgr.Retrieve(cp).ShouldBeNull();
        }

+        [Fact]
+        public void Retrieve_ExpiredContinuationPoint_ReturnsNull()
+        {
+            var mgr = new HistoryContinuationPointManager(TimeSpan.FromMilliseconds(1));
+            var values = CreateTestValues(5);
+            var cp = mgr.Store(values);
+
+            System.Threading.Thread.Sleep(50);
+
+            mgr.Retrieve(cp).ShouldBeNull();
+        }
+
+        [Fact]
+        public void Release_PurgesExpiredEntries()
+        {
+            var mgr = new HistoryContinuationPointManager(TimeSpan.FromMilliseconds(1));
+            var cp1 = mgr.Store(CreateTestValues(3));
+            var cp2 = mgr.Store(CreateTestValues(5));
+
+            System.Threading.Thread.Sleep(50);
+
+            // Release one — purge should clean both expired entries
+            mgr.Release(cp1);
+            mgr.Retrieve(cp2).ShouldBeNull();
+        }
+
        [Fact]
        public void MultipleContinuationPoints_IndependentRetrieval()
        {

@@ -1,5 +1,6 @@
 using System;
 using System.Collections.Concurrent;
+using System.Runtime.InteropServices;
 using System.Threading;
 using System.Threading.Tasks;
 using Shouldly;
@@ -13,6 +14,9 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.MxAccess
    /// </summary>
    public class StaComThreadTests : IDisposable
    {
+        [DllImport("user32.dll")]
+        private static extern void PostQuitMessage(int nExitCode);
+
        private readonly StaComThread _thread;

        /// <summary>
@@ -101,5 +105,20 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.MxAccess

            results.Count.ShouldBe(3);
        }
+
+        /// <summary>
+        /// Confirms that after the message pump exits, subsequent RunAsync calls throw instead of hanging.
+        /// </summary>
+        [Fact]
+        public async Task RunAsync_AfterPumpExit_ThrowsInsteadOfHanging()
+        {
+            // Kill the pump from inside by posting WM_QUIT
+            await _thread.RunAsync(() => PostQuitMessage(0));
+            await Task.Delay(100); // let pump exit
+
+            _thread.IsRunning.ShouldBe(false);
+            Should.Throw<InvalidOperationException>(() =>
+                _thread.RunAsync(() => { }).GetAwaiter().GetResult());
+        }
    }
}
@@ -0,0 +1,104 @@
+using System;
+using System.Threading.Tasks;
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.LmxOpcUa.Tests.Helpers;
+
+namespace ZB.MOM.WW.LmxOpcUa.Tests.OpcUa
+{
+    /// <summary>
+    /// Verifies that subscription and unsubscription failures in the MXAccess client
+    /// are handled gracefully by the node manager instead of silently lost.
+    /// </summary>
+    public class LmxNodeManagerSubscriptionFaultTests
+    {
+        /// <summary>
+        /// Confirms that a faulted SubscribeAsync is caught and logged rather than silently discarded.
+        /// </summary>
+        [Fact]
+        public async Task SubscribeTag_WhenClientFaults_DoesNotThrowAndDoesNotHang()
+        {
+            var mxClient = new FakeMxAccessClient
+            {
+                SubscribeException = new InvalidOperationException("COM connection lost")
+            };
+            var fixture = OpcUaServerFixture.WithFakeMxAccessClient(mxClient);
+            await fixture.InitializeAsync();
+            try
+            {
+                var nodeManager = fixture.Service.NodeManagerInstance!;
+
+                // SubscribeTag should catch the fault — not throw and not hang
+                Should.NotThrow(() => nodeManager.SubscribeTag("TestMachine_001.MachineID"));
+            }
+            finally
+            {
+                await fixture.DisposeAsync();
+            }
+        }
+
+        /// <summary>
+        /// Confirms that a faulted UnsubscribeAsync is caught and logged rather than silently discarded.
+        /// </summary>
+        [Fact]
+        public async Task UnsubscribeTag_WhenClientFaults_DoesNotThrowAndDoesNotHang()
+        {
+            var mxClient = new FakeMxAccessClient();
+            var fixture = OpcUaServerFixture.WithFakeMxAccessClient(mxClient);
+            await fixture.InitializeAsync();
+            try
+            {
+                var nodeManager = fixture.Service.NodeManagerInstance!;
+
+                // Subscribe first (succeeds)
+                nodeManager.SubscribeTag("TestMachine_001.MachineID");
+                mxClient.ActiveSubscriptionCount.ShouldBe(1);
+
+                // Now inject fault for unsubscribe
+                mxClient.UnsubscribeException = new InvalidOperationException("COM connection lost");
+
+                // UnsubscribeTag should catch the fault — not throw and not hang
+                Should.NotThrow(() => nodeManager.UnsubscribeTag("TestMachine_001.MachineID"));
+            }
+            finally
+            {
+                await fixture.DisposeAsync();
+            }
+        }
+
+        /// <summary>
+        /// Confirms that subscription failure does not corrupt the ref-count bookkeeping,
+        /// allowing a retry to succeed after the fault clears.
+        /// </summary>
+        [Fact]
+        public async Task SubscribeTag_AfterFaultClears_CanSubscribeAgain()
+        {
+            var mxClient = new FakeMxAccessClient
+            {
+                SubscribeException = new InvalidOperationException("transient fault")
+            };
+            var fixture = OpcUaServerFixture.WithFakeMxAccessClient(mxClient);
+            await fixture.InitializeAsync();
+            try
+            {
+                var nodeManager = fixture.Service.NodeManagerInstance!;
+
+                // First subscribe faults (caught)
+                nodeManager.SubscribeTag("TestMachine_001.MachineID");
+                mxClient.ActiveSubscriptionCount.ShouldBe(0); // subscribe failed
+
+                // Clear the fault
+                mxClient.SubscribeException = null;
+
+                // Unsubscribe to reset ref count, then subscribe again
+                nodeManager.UnsubscribeTag("TestMachine_001.MachineID");
+                nodeManager.SubscribeTag("TestMachine_001.MachineID");
+                mxClient.ActiveSubscriptionCount.ShouldBe(1);
+            }
+            finally
+            {
+                await fixture.DisposeAsync();
+            }
+        }
+    }
+}
@@ -145,7 +145,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Status
            var server2 = new StatusWebServer(
                new StatusReportService(new HealthCheckService(), 10),
                new Random().Next(19000, 20000));
-            server2.Start();
+            server2.Start().ShouldBe(true);
            server2.IsRunning.ShouldBe(true);
            server2.Stop();
        }

@@ -34,6 +34,10 @@
       <HintPath>..\..\lib\ArchestrA.MxAccess.dll</HintPath>
       <EmbedInteropTypes>false</EmbedInteropTypes>
     </Reference>
+    <Reference Include="aahClientManaged">
+      <HintPath>..\..\lib\aahClientManaged.dll</HintPath>
+      <EmbedInteropTypes>false</EmbedInteropTypes>
+    </Reference>
   </ItemGroup>

  <ItemGroup>
