Resolve Worker-001, Worker-002, Worker-003 code-review findings

Worker-001: WnWrapAlarmConsumer armed a System.Threading.Timer whose OnPoll
callback ran GetXmlCurrentAlarms2 on a thread-pool thread against the
Apartment-threaded wnwrap COM object, which can deadlock on cross-apartment
marshaling. Removed the pollTimer/pollIntervalMs fields, OnPoll, the
poll-interval constructor parameter, and the timer arm/disposal. Polls are
driven externally by the STA via StaRuntime.InvokeAsync(PollOnce).

Worker-002: RunHeartbeatLoopAsync delayed a full HeartbeatInterval before
the first heartbeat. Restructured so the first beat is sent immediately on
entering the loop and the delay applies only between subsequent beats.

Worker-003: ProcessCommandAsync silently returned without a reply when
_state was not a command-serving state after dispatch. Both drop sites now
log a WorkerCommandResultDropped diagnostic with correlation_id via
IWorkerLogger; _state is now volatile.

Three pre-existing tests that asserted strict frame ordering were updated to
tolerate an interleaved first heartbeat (Worker-002 consequence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-18 20:59:46 -04:00
parent e967e85973
commit 53e3973209
8 changed files with 323 additions and 97 deletions
+7 -7
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 15 |
| Open findings | 12 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207` |
| Status | Open |
| Status | Resolved |
**Description:** When constructed with `pollIntervalMilliseconds > 0`, `Subscribe` starts a `System.Threading.Timer` whose `OnPoll` callback runs `PollOnce()` — which calls `wwAlarmConsumerClass.GetXmlCurrentAlarms2` — on a thread-pool thread. The wnwrap CLSID is registered `ThreadingModel=Apartment`; calling its methods off the owning STA violates the hard rule that all COM calls happen on the dedicated STA thread, and can deadlock on cross-apartment marshaling when the STA is not pumping. The production path (default constructor, interval 0) is safe, but the public 3-arg constructor leaves this footgun callable, and tests/live-smoke use it.
**Recommendation:** Remove the internal `Timer` entirely (production already drives `PollOnce` from the STA), or document and gate it so it can only be used from an STA thread. At minimum, make the timer-driven mode unreachable from any production wiring.
**Resolution:** _(open)_
**Resolution:** 2026-05-18 — Removed the off-STA timer infrastructure from `WnWrapAlarmConsumer`: the `Timer? pollTimer` and `pollIntervalMs` fields, the `DefaultPollIntervalMilliseconds` constant, the `OnPoll` callback, the timer-arming arm in `Subscribe`, and the timer disposal block in `Dispose`. The `pollIntervalMilliseconds` parameter is gone from both public constructors (the test-seam ctor is now 2-arg: `wwAlarmConsumerClass` + `maxAlarmsPerFetch`), so the off-STA footgun is structurally unreachable. `PollOnce()` remains the public STA-driven entry point. The stale "poll … on a timer below" comment was corrected. Verified by the regression tests `WnWrapAlarmConsumer_has_no_internal_timer_field` and `WnWrapAlarmConsumer_exposes_no_poll_interval_constructor_parameter`; the `AlarmsLiveSmokeTests` call site was updated to the 2-arg constructor.
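The pattern this resolution relies on, marshaling every call onto one dedicated thread instead of letting a thread-pool timer touch the COM object, can be sketched as follows. This is a minimal illustration, not the worker's actual `StaRuntime`: `StaRunnerSketch` and its members are hypothetical names, and the sketch runs no Win32 message pump (the source notes that wnwrap's pull model with `hWnd=0` does not need one).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an STA runtime: one dedicated thread drains a
// work queue, so every queued action runs on the same thread.
public sealed class StaRunnerSketch : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public StaRunnerSketch()
    {
        _thread = new Thread(() =>
        {
            // Blocks until work arrives; ends once CompleteAdding is called.
            foreach (Action item in _work.GetConsumingEnumerable())
            {
                item();
            }
        });
        _thread.IsBackground = true;
        try
        {
            // STA apartments exist only on Windows; must be set before Start.
            _thread.SetApartmentState(ApartmentState.STA);
        }
        catch (PlatformNotSupportedException)
        {
            // Non-Windows: skipped so the sketch still runs as a plain thread.
        }
        _thread.Start();
    }

    // Queues the action onto the dedicated thread and completes the returned
    // task when the action has finished (or faulted) there.
    public Task InvokeAsync(Action action)
    {
        var completion = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _work.Add(() =>
        {
            try
            {
                action();
                completion.SetResult(true);
            }
            catch (Exception ex)
            {
                completion.SetException(ex);
            }
        });
        return completion.Task;
    }

    public void Dispose()
    {
        _work.CompleteAdding();
        _thread.Join();
        _work.Dispose();
    }
}
```

With a runner of this shape, a host would create the consumer inside an `InvokeAsync` call and later drive `runner.InvokeAsync(() => consumer.PollOnce())` on whatever cadence it chooses, so creation and every poll share one apartment.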
### Worker-002
@@ -48,13 +48,13 @@
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:545-549` |
| Status | Open |
| Status | Resolved |
**Description:** `RunHeartbeatLoopAsync` calls `await Task.Delay(_sessionOptions.HeartbeatInterval, ...)` before sending the first heartbeat. The gateway therefore receives no heartbeat for the first full interval (default 5s) after the worker reaches `Ready`. If the gateway's liveness watchdog expects a heartbeat sooner, a healthy worker can be misclassified as hung at startup.
**Recommendation:** Send an initial heartbeat immediately on entering the loop, or move the `Task.Delay` to the end of the loop body.
**Resolution:** _(open)_
**Resolution:** 2026-05-18 — Restructured `RunHeartbeatLoopAsync` so the `Task.Delay(HeartbeatInterval)` is applied between beats only, not before the first. A `firstBeat` guard skips the delay on the initial iteration, so the gateway sees a heartbeat as soon as the worker is `Ready`; cancellation behavior is preserved (the loop still observes the token and the delay still throws on cancellation). Verified by the regression test `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop`. Three pre-existing tests (`WorkerPipeClientTests.RunAsync_ConnectsToPipeAndCompletesHandshake`, `WorkerPipeClientTests.RunAsync_RetriesUntilPipeServerAppears`, `WorkerPipeSessionTests.RunAsync_WhenCommandThrowsAfterShutdown_DropsLateFaultAndWritesShutdownAck`) assumed strict frame ordering and were updated to skip the now-interleaved first heartbeat while still asserting the same shutdown-ack behavior.
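For comparison, the other option named in the recommendation, moving the delay to the end of the loop body, might look like the sketch below. It is illustrative only: `HeartbeatSketch` and the `sendBeat` delegate are hypothetical stand-ins for the session's heartbeat write, not code from this commit.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HeartbeatSketch
{
    // sendBeat is a hypothetical stand-in for writing one heartbeat frame.
    public static async Task RunAsync(
        Func<CancellationToken, Task> sendBeat,
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Beat first, so the peer sees a heartbeat immediately.
            await sendBeat(cancellationToken).ConfigureAwait(false);

            // Then wait; Task.Delay throws OperationCanceledException
            // (as TaskCanceledException) if the token is canceled.
            await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

One caveat with this shape: any early `continue` in the loop body skips a trailing delay and the loop spins without waiting. A guarded delay at the top of the loop, as in the `firstBeat` approach, does not have that problem.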
### Worker-003
@@ -63,13 +63,13 @@
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` |
| Status | Open |
| Status | Resolved |
**Description:** `ProcessCommandAsync` checks `_state` after `DispatchAsync` completes and silently `return`s without writing a `WorkerCommandReply` (or fault) when `_state` is not `Ready`/`ExecutingCommand`. `_state` is a plain field mutated from multiple tasks (heartbeat loop, event-drain loop, shutdown). A command that completes successfully while `_state` has transitioned will have its reply dropped with no diagnostic, and the gateway's correlation-id wait then hangs until its own timeout. The `_state` read is also not synchronized.
**Recommendation:** Always attempt to write the reply/fault for an in-flight command, or explicitly reject in-flight commands with a `Canceled`/`WorkerUnavailable` reply during state transitions. Make `_state` access thread-safe (volatile or locked).
**Resolution:** _(open)_
**Resolution:** 2026-05-18 — Both silent-drop `return` sites in `ProcessCommandAsync` (the post-`DispatchAsync` success path and the exception path) now call a new `LogCommandResultDropped` helper before returning. The helper logs an Information event named `WorkerCommandResultDropped` via the session's `IWorkerLogger`, carrying the command's `correlation_id` plus `command_method` and `worker_state`, so a stuck gateway correlation-id wait is now traceable. The `_state` field was made `volatile` (`WorkerState` is an int-backed protobuf enum, so volatile is valid) so cross-thread reads observe the latest value without tearing; this is a low-risk, non-behavioral change and did not destabilize any test. Verified by the regression test `RunAsync_WhenReplyIsDroppedAfterShutdown_LogsDiagnostic`.
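A small sketch of the check-then-log pattern follows, with one possible refinement: the volatile field is read once into a local, so the state that triggered the drop is the same state that gets logged. All names here (`SketchState`, `DropDiagnosticSketch`, `ShouldReply`) are hypothetical; the real helper logs through `IWorkerLogger` rather than a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical int-backed enum standing in for the protobuf WorkerState.
public enum SketchState
{
    Starting,
    Ready,
    ExecutingCommand,
    ShuttingDown,
}

public sealed class DropDiagnosticSketch
{
    // volatile is permitted here because the enum's underlying type is int.
    private volatile SketchState _state = SketchState.Starting;

    // Stand-in for a structured logger.
    public List<string> Log { get; } = new List<string>();

    public void SetState(SketchState state) => _state = state;

    // Returns true when a reply should be written; otherwise records a
    // diagnostic naming the correlation id and the observed state.
    public bool ShouldReply(string correlationId)
    {
        // Single volatile read: the check and the log line agree even if
        // another thread changes _state in between.
        SketchState observed = _state;
        if (observed is SketchState.Ready or SketchState.ExecutingCommand)
        {
            return true;
        }

        Log.Add($"WorkerCommandResultDropped correlation_id={correlationId} worker_state={observed}");
        return false;
    }
}
```

Note that `volatile` only affects visibility of individual reads and writes; it does not make a read-check-act sequence atomic, which is why the drop is logged rather than assumed impossible.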
### Worker-004
@@ -1,7 +1,6 @@
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using MxGateway.Contracts.Proto;
using MxGateway.Worker.MxAccess;
@@ -77,13 +76,11 @@ public sealed class AlarmsLiveSmokeTests
Log($"Pump duration: {PumpDuration.TotalSeconds:F0}s; transition wait timeout: {TransitionWaitTimeout.TotalSeconds:F0}s");
MxAccessEventQueue queue = new MxAccessEventQueue();
// pollIntervalMs=0 disables the internal Timer; we drive PollOnce
// manually from the STA below to avoid threadpool→STA marshaling
// (the wnwrap COM is ThreadingModel=Apartment, and this test
// doesn't run a Win32 message pump on its STA).
// The consumer owns no internal timer; we drive PollOnce manually
// from the STA below (the wnwrap COM is ThreadingModel=Apartment,
// and this test doesn't run a Win32 message pump on its STA).
WnWrapAlarmConsumer consumer = new WnWrapAlarmConsumer(
new WNWRAPCONSUMERLib.wwAlarmConsumerClass(),
pollIntervalMilliseconds: 0,
maxAlarmsPerFetch: 1024);
MxAccessAlarmEventSink sink = new MxAccessAlarmEventSink(queue, new MxAccessEventMapper());
using AlarmDispatcher dispatcher = new AlarmDispatcher(consumer, sink, SessionId);
@@ -92,13 +89,10 @@ public sealed class AlarmsLiveSmokeTests
dispatcher.Subscribe(SubscriptionExpression);
Log("Subscribe -> ok. Driving PollOnce manually from this STA...");
// The wnwrap COM object is ThreadingModel=Apartment. The consumer's
// internal Timer would fire on a threadpool thread and deadlock on
// cross-apartment marshaling without a Win32 message pump. For the
// smoke test we constructed the consumer with pollIntervalMs=0
// (Timer disabled) and drive PollOnce manually here on the STA.
// Production hosting will route polls through the worker's
// StaRuntime in a follow-up PR.
// The wnwrap COM object is ThreadingModel=Apartment. The consumer
// owns no internal timer, so we drive PollOnce manually here on the
// STA. Production hosting routes polls through the worker's
// StaRuntime.
// 1. Wait for the first transition (any kind), then keep waiting
// for one with kind=Raise so the alarm is currently Active when
@@ -77,7 +77,9 @@ public sealed class WorkerPipeClientTests
},
});
WorkerEnvelope shutdownAck = await reader.ReadAsync();
WorkerEnvelope shutdownAck = await ReadUntilAsync(
reader,
WorkerEnvelope.BodyOneofCase.WorkerShutdownAck);
Assert.Equal(WorkerEnvelope.BodyOneofCase.WorkerShutdownAck, shutdownAck.BodyCase);
await clientTask;
}
@@ -120,7 +122,9 @@ public sealed class WorkerPipeClientTests
Assert.Equal(WorkerEnvelope.BodyOneofCase.WorkerReady, (await reader.ReadAsync()).BodyCase);
await writer.WriteAsync(CreateShutdown());
Assert.Equal(WorkerEnvelope.BodyOneofCase.WorkerShutdownAck, (await reader.ReadAsync()).BodyCase);
Assert.Equal(
WorkerEnvelope.BodyOneofCase.WorkerShutdownAck,
(await ReadUntilAsync(reader, WorkerEnvelope.BodyOneofCase.WorkerShutdownAck)).BodyCase);
await clientTask;
}
@@ -143,6 +147,25 @@ public sealed class WorkerPipeClientTests
await Assert.ThrowsAsync<TimeoutException>(async () => await client.RunAsync(workerOptions));
}
/// <summary>
/// Reads frames until one matching the expected body case is found,
/// skipping interleaved heartbeats (the first heartbeat is emitted
/// immediately on entering the heartbeat loop — see Worker-002).
/// </summary>
private static async Task<WorkerEnvelope> ReadUntilAsync(
WorkerFrameReader reader,
WorkerEnvelope.BodyOneofCase expectedBody)
{
while (true)
{
WorkerEnvelope envelope = await reader.ReadAsync();
if (envelope.BodyCase == expectedBody)
{
return envelope;
}
}
}
private static WorkerPipeSession CreateSession(
Stream stream,
WorkerFrameProtocolOptions options)
@@ -383,16 +383,126 @@ public sealed class WorkerPipeSessionTests
await pipePair.GatewayWriter
.WriteAsync(CreateShutdownEnvelope(), cancellation.Token);
WorkerEnvelope firstEnvelopeAfterShutdown = await pipePair.GatewayReader
.ReadAsync(cancellation.Token);
// The first heartbeat is emitted immediately on entering the loop
// (Worker-002), so skip any interleaved heartbeats; the late fault
// must still be dropped — no WorkerFault may precede the ack.
WorkerEnvelope envelopeAfterShutdown;
do
{
envelopeAfterShutdown = await pipePair.GatewayReader.ReadAsync(cancellation.Token);
Assert.NotEqual(
WorkerEnvelope.BodyOneofCase.WorkerFault,
envelopeAfterShutdown.BodyCase);
}
while (envelopeAfterShutdown.BodyCase == WorkerEnvelope.BodyOneofCase.WorkerHeartbeat);
Assert.Equal(WorkerEnvelope.BodyOneofCase.WorkerShutdownAck, firstEnvelopeAfterShutdown.BodyCase);
Assert.Equal(ProtocolStatusCode.Ok, firstEnvelopeAfterShutdown.WorkerShutdownAck.Status.Code);
Assert.Equal(WorkerEnvelope.BodyOneofCase.WorkerShutdownAck, envelopeAfterShutdown.BodyCase);
Assert.Equal(ProtocolStatusCode.Ok, envelopeAfterShutdown.WorkerShutdownAck.Status.Code);
Task completedTask = await Task.WhenAny(runTask, Task.Delay(TimeSpan.FromSeconds(2), cancellation.Token));
Assert.Same(runTask, completedTask);
await runTask;
}
/// <summary>
/// Worker-002 regression: the first heartbeat must be emitted
/// immediately on entering the heartbeat loop, not after a full
/// HeartbeatInterval. A long interval is configured so a delay-first
/// loop would fail to deliver a heartbeat inside the assertion window.
/// </summary>
[Fact]
public async Task RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop()
{
using CancellationTokenSource cancellation = new(TimeSpan.FromSeconds(10));
using PipePair pipePair = await PipePair.CreateAsync(cancellation.Token);
FakeRuntimeSession runtime = new();
WorkerPipeSession session = CreatePipeSession(
pipePair.WorkerStream,
runtime,
new WorkerPipeSessionOptions
{
// A deliberately long interval: a delay-before-first-beat
// loop would not produce a heartbeat for 30s.
HeartbeatInterval = TimeSpan.FromSeconds(30),
HeartbeatGrace = TimeSpan.FromSeconds(60),
});
Task runTask = session.RunAsync(cancellation.Token);
await CompleteGatewayHandshakeAsync(pipePair, cancellation.Token);
DateTimeOffset start = DateTimeOffset.UtcNow;
using CancellationTokenSource heartbeatWait = CancellationTokenSource
.CreateLinkedTokenSource(cancellation.Token);
heartbeatWait.CancelAfter(TimeSpan.FromSeconds(5));
WorkerEnvelope heartbeat = await ReadUntilAsync(
pipePair.GatewayReader,
WorkerEnvelope.BodyOneofCase.WorkerHeartbeat,
heartbeatWait.Token);
TimeSpan elapsed = DateTimeOffset.UtcNow - start;
Assert.Equal(WorkerEnvelope.BodyOneofCase.WorkerHeartbeat, heartbeat.BodyCase);
Assert.True(
elapsed < TimeSpan.FromSeconds(5),
$"First heartbeat took {elapsed}, expected well under the 30s interval.");
await SendShutdownAndWaitAsync(pipePair, runTask, cancellation.Token);
}
/// <summary>
/// Worker-003 regression: when a command completes after the worker
/// has transitioned out of a command-serving state, the dropped
/// reply must be logged with a diagnostic rather than discarded
/// silently, so a stuck gateway correlation wait can be traced.
/// </summary>
[Fact]
public async Task RunAsync_WhenReplyIsDroppedAfterShutdown_LogsDiagnostic()
{
using CancellationTokenSource cancellation = new(TimeSpan.FromSeconds(10));
using PipePair pipePair = await PipePair.CreateAsync(cancellation.Token);
FakeRuntimeSession runtime = new()
{
BlockDispatch = true,
};
RecordingWorkerLogger logger = new();
WorkerFrameProtocolOptions options = CreateOptions();
WorkerPipeSession session = new(
new WorkerFrameReader(pipePair.WorkerStream, options),
new WorkerFrameWriter(pipePair.WorkerStream, options),
options,
() => 1234,
new WorkerPipeSessionOptions
{
HeartbeatInterval = TimeSpan.FromSeconds(1),
HeartbeatGrace = TimeSpan.FromSeconds(5),
},
() => runtime,
logger);
Task runTask = session.RunAsync(cancellation.Token);
await CompleteGatewayHandshakeAsync(pipePair, cancellation.Token);
await pipePair.GatewayWriter.WriteAsync(
CreateCommandEnvelope("command-dropped-after-shutdown"),
cancellation.Token);
Assert.True(runtime.DispatchStarted.Wait(TimeSpan.FromSeconds(2)));
await pipePair.GatewayWriter
.WriteAsync(CreateShutdownEnvelope(), cancellation.Token);
WorkerEnvelope shutdownAck = await ReadUntilAsync(
pipePair.GatewayReader,
WorkerEnvelope.BodyOneofCase.WorkerShutdownAck,
cancellation.Token);
Assert.Equal(ProtocolStatusCode.Ok, shutdownAck.WorkerShutdownAck.Status.Code);
Task completedTask = await Task.WhenAny(runTask, Task.Delay(TimeSpan.FromSeconds(3), cancellation.Token));
Assert.Same(runTask, completedTask);
await runTask;
Assert.Contains(
logger.Events,
entry => entry.EventName == "WorkerCommandResultDropped"
&& entry.Fields.TryGetValue("correlation_id", out object? correlationId)
&& (string?)correlationId == "command-dropped-after-shutdown");
}
private static WorkerPipeSession CreateSession(
Stream inbound,
Stream outbound,
@@ -619,6 +729,69 @@ public sealed class WorkerPipeSessionTests
buffer[3] = (byte)(value >> 24);
}
private sealed class RecordingWorkerLogger : MxGateway.Worker.Bootstrap.IWorkerLogger
{
private readonly object gate = new();
private readonly List<LogEntry> events = new();
/// <summary>Gets a snapshot of the recorded log entries.</summary>
public IReadOnlyList<LogEntry> Events
{
get
{
lock (gate)
{
return new List<LogEntry>(events);
}
}
}
/// <summary>Records an informational log event.</summary>
public void Information(string eventName, IReadOnlyDictionary<string, object?> fields)
{
Record(eventName, fields);
}
/// <summary>Records an error log event.</summary>
public void Error(string eventName, IReadOnlyDictionary<string, object?> fields)
{
Record(eventName, fields);
}
private void Record(string eventName, IReadOnlyDictionary<string, object?> fields)
{
Dictionary<string, object?> copy = new();
foreach (KeyValuePair<string, object?> field in fields)
{
copy[field.Key] = field.Value;
}
lock (gate)
{
events.Add(new LogEntry(eventName, copy));
}
}
/// <summary>A single recorded log entry.</summary>
public sealed class LogEntry
{
/// <summary>Initializes a recorded log entry.</summary>
/// <param name="eventName">The log event name.</param>
/// <param name="fields">The log event fields.</param>
public LogEntry(string eventName, IReadOnlyDictionary<string, object?> fields)
{
EventName = eventName;
Fields = fields;
}
/// <summary>Gets the log event name.</summary>
public string EventName { get; }
/// <summary>Gets the log event fields.</summary>
public IReadOnlyDictionary<string, object?> Fields { get; }
}
}
private sealed class FakeRuntimeSession : IWorkerRuntimeSession
{
private readonly ManualResetEventSlim releaseDispatch = new(false);
@@ -1,5 +1,7 @@
using System;
using System.Linq;
using System.Reflection;
using System.Threading;
using MxGateway.Worker.MxAccess;
namespace MxGateway.Worker.Tests.MxAccess;
@@ -109,4 +111,42 @@ public sealed class WnWrapAlarmConsumerXmlTests
Assert.False(WnWrapAlarmConsumer.TryParseHexGuid(hex, out Guid guid));
Assert.Equal(Guid.Empty, guid);
}
/// <summary>
/// Worker-001 regression: the consumer must own no internal
/// <see cref="Timer"/>. A thread-pool timer calling the
/// apartment-threaded wnwrap COM object off its owning STA can
/// deadlock on cross-apartment marshaling, so the timer field and
/// callback must not exist on the type.
/// </summary>
[Fact]
public void WnWrapAlarmConsumer_has_no_internal_timer_field()
{
FieldInfo[] fields = typeof(WnWrapAlarmConsumer)
.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
Assert.DoesNotContain(fields, field => field.FieldType == typeof(Timer));
Assert.Null(typeof(WnWrapAlarmConsumer).GetMethod(
"OnPoll",
BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic));
}
/// <summary>
/// Worker-001 regression: no public constructor may accept a
/// poll-interval parameter. A non-zero poll interval was the only
/// way to arm the off-STA timer; removing the parameter makes the
/// footgun structurally unreachable.
/// </summary>
[Fact]
public void WnWrapAlarmConsumer_exposes_no_poll_interval_constructor_parameter()
{
foreach (ConstructorInfo constructor in typeof(WnWrapAlarmConsumer)
.GetConstructors(BindingFlags.Instance | BindingFlags.Public))
{
Assert.DoesNotContain(
constructor.GetParameters(),
parameter => parameter.Name is not null
&& parameter.Name.IndexOf("poll", StringComparison.OrdinalIgnoreCase) >= 0);
}
}
}
+38 -2
@@ -29,7 +29,11 @@ public sealed class WorkerPipeSession
private readonly HashSet<Task> _activeCommandTasks = new();
private IWorkerRuntimeSession? _runtimeSession;
private long _nextSequence;
private WorkerState _state = WorkerState.Starting;
// Mutated from the message loop, command tasks, the heartbeat loop and the
// shutdown path; volatile so cross-thread reads observe the latest state
// without tearing (WorkerState is an int-backed protobuf enum).
private volatile WorkerState _state = WorkerState.Starting;
private bool _acceptingCommands = true;
private bool _watchdogFaultSent;
private bool _shutdownTimedOut;
@@ -398,6 +402,7 @@ public sealed class WorkerPipeSession
MxCommandReply reply = await runtimeSession.DispatchAsync(staCommand).ConfigureAwait(false);
if (_state is not WorkerState.Ready and not WorkerState.ExecutingCommand)
{
LogCommandResultDropped(envelope.CorrelationId, staCommand.MethodName);
return;
}
@@ -415,6 +420,7 @@ public sealed class WorkerPipeSession
{
if (_state is not WorkerState.Ready and not WorkerState.ExecutingCommand)
{
LogCommandResultDropped(envelope.CorrelationId, staCommand.MethodName);
return;
}
@@ -428,6 +434,25 @@ public sealed class WorkerPipeSession
}
}
/// <summary>
/// Logs that a completed command result was dropped because the
/// worker is no longer in a command-serving state (typically a
/// shutdown that raced the command's completion). Without this
/// diagnostic the gateway's correlation-id wait blocks until its own
/// timeout with no trace of why no reply arrived.
/// </summary>
private void LogCommandResultDropped(string correlationId, string commandMethod)
{
_logger?.Information(
"WorkerCommandResultDropped",
new Dictionary<string, object?>
{
["correlation_id"] = correlationId,
["command_method"] = commandMethod,
["worker_state"] = _state.ToString(),
});
}
private async Task ShutdownAsync(
WorkerShutdown shutdown,
CancellationToken cancellationToken)
@@ -544,9 +569,20 @@ public sealed class WorkerPipeSession
private async Task RunHeartbeatLoopAsync(CancellationToken cancellationToken)
{
// The first heartbeat is sent immediately on entering the loop so the
// gateway's liveness watchdog sees a beat as soon as the worker is
// Ready; the delay is applied between subsequent beats only. A
// delay-before-first-beat loop would leave the gateway without a
// heartbeat for a full HeartbeatInterval after startup.
bool firstBeat = true;
while (!cancellationToken.IsCancellationRequested)
{
await Task.Delay(_sessionOptions.HeartbeatInterval, cancellationToken).ConfigureAwait(false);
if (!firstBeat)
{
await Task.Delay(_sessionOptions.HeartbeatInterval, cancellationToken).ConfigureAwait(false);
}
firstBeat = false;
IWorkerRuntimeSession? runtimeSession = _runtimeSession;
if (runtimeSession is null)
{
@@ -42,8 +42,8 @@ public interface IMxAccessAlarmConsumer : IDisposable
/// Subscription string follows AVEVA's canonical format:
/// <c>\\&lt;node&gt;\Galaxy!&lt;area&gt;</c>. The literal "Galaxy" is
/// the provider name (regardless of the configured Galaxy database
/// name). Calling Subscribe also begins polling on the consumer's
/// internal timer.
/// name). Subscribe does not start any polling of its own; the caller
/// drives polls explicitly via <see cref="PollOnce"/>.
/// </summary>
void Subscribe(string subscription);
@@ -88,10 +88,8 @@ public interface IMxAccessAlarmConsumer : IDisposable
/// <summary>
/// Drives a single synchronous poll of the underlying alarm source.
/// Implementations that use an internal <see cref="System.Threading.Timer"/>
/// are constructed with <c>pollIntervalMilliseconds=0</c> in production so
/// the timer is disabled; the worker's STA drives polls via
/// <c>StaRuntime.InvokeAsync</c> instead, satisfying the
/// The production consumer owns no internal timer; the worker's STA
/// drives polls via <c>StaRuntime.InvokeAsync</c>, satisfying the
/// <c>ThreadingModel=Apartment</c> requirement of
/// <c>wwAlarmConsumerClass</c>. Fake implementations should no-op.
/// This method must be invoked on the thread that created the consumer
@@ -2,7 +2,6 @@ using System;
using System.Collections.Generic;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Threading;
using System.Xml;
using WNWRAPCONSUMERLib;
@@ -31,15 +30,16 @@ namespace MxGateway.Worker.MxAccess;
/// <strong>Threading.</strong> The wnwrap CLSID is registered with
/// <c>ThreadingModel=Apartment</c>. The consumer must be created
/// and operated from an STA thread; the worker's
/// <see cref="MxAccessStaSession"/> already runs an STA pump that
/// is the natural host. Polling cadence is governed by
/// <see cref="PollIntervalMilliseconds"/> on a dedicated timer the
/// consumer owns; in production the worker's STA dispatcher should
/// marshal each callback onto the STA thread before invoking
/// <c>GetXmlCurrentAlarms2</c>. For now (test-grade), this consumer
/// calls the COM API on whichever thread the timer fires it on —
/// the worker bootstrap will gain a thin "run-on-STA" wrapper as
/// part of A.3 dispatcher wiring.
/// <see cref="MxAccessStaSession"/> runs an STA pump that hosts it.
/// The consumer owns <em>no</em> internal timer: every COM call
/// (<c>Subscribe</c>, <c>PollOnce</c>, <c>AcknowledgeBy*</c>) must
/// be invoked on the STA that created the consumer. Polling cadence
/// is driven externally by the worker's STA via
/// <c>StaRuntime.InvokeAsync(() =&gt; consumer.PollOnce())</c>, which
/// keeps every <c>GetXmlCurrentAlarms2</c> call on the apartment that
/// owns the COM object. A thread-pool timer would call the COM API
/// off the owning STA and can deadlock on cross-apartment marshaling
/// when the STA is not pumping messages, so no such timer exists.
/// </para>
/// </remarks>
public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
@@ -47,52 +47,39 @@ public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
private const string DefaultProductName = "OtOpcUa.MxGateway";
private const string DefaultApplicationName = "OtOpcUa.MxGateway.Worker";
private const string DefaultVersion = "1.0";
private const int DefaultPollIntervalMilliseconds = 500;
private const int DefaultMaxAlarmsPerFetch = 1024;
private readonly object syncRoot = new object();
private readonly Dictionary<Guid, MxAlarmSnapshotRecord> latestSnapshot =
new Dictionary<Guid, MxAlarmSnapshotRecord>();
private readonly int pollIntervalMs;
private readonly int maxAlarmsPerFetch;
private wwAlarmConsumerClass? client;
private wwAlarmConsumerClass? ackClient;
private string subscriptionExpression = string.Empty;
private Timer? pollTimer;
private bool subscribed;
private bool disposed;
/// <summary>
/// Production constructor — creates the wnwrap COM object on the
/// current thread (must be the worker's STA) and disables the
/// internal <see cref="Timer"/> (<c>pollIntervalMilliseconds=0</c>).
/// Polling is driven externally by the STA via
/// <c>StaRuntime.InvokeAsync(() =&gt; consumer.PollOnce())</c> so
/// that every COM call stays on the STA that owns the apartment.
/// current thread (which must be the worker's STA). Polling is driven
/// externally by the STA via
/// <c>StaRuntime.InvokeAsync(() =&gt; consumer.PollOnce())</c> so that
/// every COM call stays on the STA that owns the apartment.
/// </summary>
public WnWrapAlarmConsumer()
: this(new wwAlarmConsumerClass(), pollIntervalMilliseconds: 0, DefaultMaxAlarmsPerFetch)
: this(new wwAlarmConsumerClass(), DefaultMaxAlarmsPerFetch)
{
}
/// <summary>
/// Test seam / explicit construction — inject a pre-created COM
/// client and tune the poll cadence. <c>pollIntervalMilliseconds == 0</c>
/// disables the internal <see cref="Timer"/> entirely; the caller
/// must drive <see cref="PollOnce"/> manually (used by hosts that
/// marshal polls onto a foreign STA, and by live smoke tests that
/// pump from the STA they own).
/// Test seam / explicit construction.
/// </summary>
public WnWrapAlarmConsumer(
wwAlarmConsumerClass client,
int pollIntervalMilliseconds,
int maxAlarmsPerFetch)
{
this.client = client ?? throw new ArgumentNullException(nameof(client));
this.pollIntervalMs = pollIntervalMilliseconds < 0
? DefaultPollIntervalMilliseconds
: pollIntervalMilliseconds;
this.maxAlarmsPerFetch = maxAlarmsPerFetch > 0
? maxAlarmsPerFetch
: DefaultMaxAlarmsPerFetch;
@@ -101,8 +88,6 @@ public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
/// <inheritdoc />
public event EventHandler<MxAlarmTransitionEvent>? AlarmTransitionEmitted;
public int PollIntervalMilliseconds => pollIntervalMs;
/// <inheritdoc />
public void Subscribe(string subscription)
{
@@ -136,7 +121,9 @@ public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
}
// hWnd=0: wnwrap supports a pull-based model — no message pump
// is required. We poll GetXmlCurrentAlarms2 on a timer below.
// is required. GetXmlCurrentAlarms2 is polled by the worker's STA
// via StaRuntime.InvokeAsync(() => consumer.PollOnce()); this type
// owns no internal timer.
int reg = com.IwwAlarmConsumer_RegisterConsumer(
hWnd: 0,
szProductName: DefaultProductName,
@@ -201,10 +188,6 @@ public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
subscriptionExpression = subscription;
subscribed = true;
if (pollIntervalMs > 0)
{
pollTimer = new Timer(OnPoll, state: null, dueTime: 0, period: pollIntervalMs);
}
}
}
@@ -294,31 +277,14 @@ public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
}
}
private void OnPoll(object? _)
{
if (disposed) return;
try
{
PollOnce();
}
catch (Exception ex)
{
// Swallow — the poll loop must not propagate exceptions out of
// the timer callback, or the worker process tears down. The
// EventQueue fault counter (wired in by the future A.3 dispatcher)
// is the right place to surface poll failures; for now the
// exception is intentionally silent so the timer keeps firing.
_ = ex;
}
}
/// <summary>
/// Synchronously poll the wnwrap consumer once and dispatch any
/// transitions. Public so STA-bound hosts can drive polling from
/// the thread that owns the COM object instead of relying on the
/// internal <see cref="Timer"/> (which fires on a thread-pool
/// thread and blocks indefinitely on cross-apartment marshaling
/// when the host STA isn't pumping messages).
/// transitions. STA-bound hosts drive polling by calling this from
/// the thread that owns the COM object. The consumer deliberately
/// owns no internal timer: a thread-pool timer would call the
/// apartment-threaded COM object off its owning STA and can block
/// indefinitely on cross-apartment marshaling when the STA is not
/// pumping messages.
/// </summary>
public void PollOnce()
{
@@ -524,21 +490,17 @@ public sealed class WnWrapAlarmConsumer : IMxAccessAlarmConsumer
/// <inheritdoc />
public void Dispose()
{
Timer? timerToDispose;
wwAlarmConsumerClass? clientToDispose;
wwAlarmConsumerClass? ackClientToDispose;
lock (syncRoot)
{
if (disposed) return;
disposed = true;
timerToDispose = pollTimer;
pollTimer = null;
clientToDispose = client;
client = null;
ackClientToDispose = ackClient;
ackClient = null;
}
timerToDispose?.Dispose();
ReleaseConsumerCom(clientToDispose);
ReleaseConsumerCom(ackClientToDispose);
}