docs(lmxproxy): add STA message pump gap analysis with implementation guide

Documents when the full STA+Application.Run() approach is needed (secured/verified writes), why our first attempt failed, the correct pattern using Form.BeginInvoke(), and tradeoffs vs fire-and-forget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 05:02:15 -04:00
parent 467fdc34d8
commit 73b2b2f6d7
1 changed files with 167 additions and 0 deletions
--- a/lmxproxy/docs/sta_gap.md
+++ b/lmxproxy/docs/sta_gap.md
@@ -0,0 +1,167 @@
+# STA Message Pump Gap — OnWriteComplete COM Callback
+
+**Status**: Documented gap. Fire-and-forget workaround in place (deviation #7). Full fix deferred until secured/verified writes are needed.
+
+## When This Matters
+
+The current fire-and-forget write approach works for **supervisory writes** where:
+- Security is handled at the LmxProxy API key level, not MxAccess attribute level
+- Writes succeed synchronously (no secured/verified write requirements)
+- Write confirmation is handled at the application level (read-back in `WriteBatchAndWait`)
+
+This gap becomes a **blocking issue** if any of these scenarios arise:
+- **Secured writes (MxAccess error 1012)**: Attribute requires ArchestrA user authentication. `OnWriteComplete` returns the error, and the caller must retry with `WriteSecured()`.
+- **Verified writes (MxAccess error 1013)**: Attribute requires two-user verification. Same retry pattern.
+- **Write failure detection**: MxAccess accepts the `Write()` call but can't complete it (e.g., downstream device failure). `OnWriteComplete` is the only notification of this — without it, the caller assumes success.
+
+## Root Cause
+
+The MxAccess documentation (Write() Method) states: *"Upon completion of the write, your program receives notification of the success/failure status through the OnWriteComplete() event"* and *"that item should not be taken off advise or removed from the internal tables until the OnWriteComplete() event is received."*
+
+`OnWriteComplete` **should** fire after every `Write()` call. It doesn't in our service because:
+- MxAccess is a COM component designed for Windows Forms apps with a UI message loop
+- COM delivers calls into an STA, including event callbacks, as window messages, so they only run while the owning thread pumps its message queue
+- Our Topshelf Windows service has no message pump — `Write()` is called from thread pool threads (`Task.Run`) with no message loop
+- `OnDataChange` works because MxAccess fires it proactively on its own internal threads; `OnWriteComplete` is a response callback that needs message-pump-based marshaling
+
+## Correct Solution: Dedicated STA Thread + `Application.Run()`
+
+Based on research (Stephen Toub, MSDN Magazine 2007; Microsoft Learn COM interop docs; community patterns), the correct approach is a dedicated STA thread running a Windows Forms message pump via `Application.Run()`.
+
+### Architecture
+
+```
+Service main thread (MTA)
+    │
+    ├── gRPC server threads (handle client RPCs)
+    │       │
+    │       └── Marshal COM calls via Form.BeginInvoke() ──┐
+    │                                                       │
+    └── Dedicated STA thread                                │
+            │                                               │
+            ├── Creates LMXProxyServerClass COM object      │
+            ├── Wires event handlers (OnDataChange,         │
+            │   OnWriteComplete, OperationComplete)         │
+            ├── Runs Application.Run() ← continuous         │
+            │   message pump                                │
+            │                                               │
+            └── Hidden Form receives BeginInvoke calls ◄────┘
+                    │
+                    ├── Executes COM operations (Read, Write,
+                    │   AddItem, AdviseSupervisory, etc.)
+                    │
+                    └── COM callbacks delivered via message pump
+                        (OnWriteComplete, OnDataChange, etc.)
+```
+
+### Implementation Pattern
+
+```csharp
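+// In MxAccessClient constructor or Start():
+// Requires: using System.Threading; using System.Windows.Forms;
+var initDone = new ManualResetEventSlim(false);
+Exception initError = null;
+
+_staThread = new Thread(() =>
+{
+    try
+    {
+        // 1. Create hidden form for marshaling
+        _marshalForm = new Form();
+        // Reading Handle forces HWND creation without showing the form.
+        // (Control.CreateHandle() is protected, so it cannot be called from here.)
+        IntPtr hwnd = _marshalForm.Handle;
+
+        // 2. Create COM objects ON THIS THREAD
+        _lmxProxy = new LMXProxyServerClass();
+        _lmxProxy.OnDataChange += OnDataChange;
+        _lmxProxy.OnWriteComplete += OnWriteComplete;
+    }
+    catch (Exception ex)
+    {
+        initError = ex; // rethrown on the calling thread below
+    }
+    finally
+    {
+        // 3. Signal that init has finished, successfully or not
+        initDone.Set();
+    }
+
+    // 4. Run message pump (blocks until Application.ExitThread(), pumps COM callbacks)
+    if (initError == null) Application.Run();
+});
+_staThread.Name = "MxAccess-STA";
+_staThread.IsBackground = true;
+_staThread.SetApartmentState(ApartmentState.STA);
+_staThread.Start();
+
+initDone.Wait(); // wait for COM objects to be ready
+if (initError != null)
+    throw new InvalidOperationException("MxAccess STA initialization failed", initError);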
+```
+
+### Dispatching Work to the STA Thread
+
+```csharp
+// All COM calls must go through the hidden form's invoke:
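+public Task<Vtq> ReadAsync(string address, CancellationToken ct)
+{
+    // Run continuations off the STA thread so awaiting callers never run on
+    // (or block) the message pump. Requires .NET Framework 4.6 or later.
+    var tcs = new TaskCompletionSource<Vtq>(TaskCreationOptions.RunContinuationsAsynchronously);
+    _marshalForm.BeginInvoke((Action)(() =>
+    {
+        if (ct.IsCancellationRequested)
+        {
+            tcs.TrySetCanceled(); // caller gave up before the work item ran
+            return;
+        }
+        try
+        {
+            // COM call executes on STA thread
+            int handle = _lmxProxy.AddItem(_connectionHandle, address);
+            _lmxProxy.AdviseSupervisory(_connectionHandle, handle);
+            // ... etc
+            tcs.SetResult(vtq);
+        }
+        catch (Exception ex)
+        {
+            tcs.SetException(ex);
+        }
+    }));
+    return tcs.Task;
+}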
+```
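+
+Writes differ from the read above in that the outcome arrives later, in `OnWriteComplete`, so the `TaskCompletionSource` has to outlive the dispatched delegate. A minimal sketch of one way to correlate the two, assuming at most one outstanding write per item handle; `PendingWrites`, `Register`, and `Complete` are hypothetical names, not part of MxAccess or the current codebase:
+
+```csharp
+using System;
+using System.Collections.Concurrent;
+using System.Threading.Tasks;
+
+// Hypothetical helper: tracks writes that are waiting for OnWriteComplete.
+public sealed class PendingWrites
+{
+    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pending =
+        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();
+
+    // Call on the STA thread immediately before issuing the Write() for itemHandle.
+    public Task<bool> Register(int itemHandle)
+    {
+        // Continuations run on the thread pool, not on the STA thread.
+        var tcs = new TaskCompletionSource<bool>(
+            TaskCreationOptions.RunContinuationsAsynchronously);
+        if (!_pending.TryAdd(itemHandle, tcs))
+            throw new InvalidOperationException("A write is already pending for this handle.");
+        return tcs.Task;
+    }
+
+    // Call from the OnWriteComplete handler after inspecting the reported status.
+    public void Complete(int itemHandle, bool success)
+    {
+        TaskCompletionSource<bool> tcs;
+        if (_pending.TryRemove(itemHandle, out tcs))
+            tcs.TrySetResult(success);
+    }
+}
+```
+
+The caller would await the task returned by `Register` with a timeout, so a callback that never arrives surfaces as an error instead of a hang.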
+
+### Shutdown
+
+```csharp
+// To stop the message pump:
+_marshalForm.BeginInvoke((Action)(() =>
+{
+    // Clean up COM objects on STA thread
+    // ... UnAdvise, RemoveItem, Unregister ...
+    Marshal.ReleaseComObject(_lmxProxy);
+    Application.ExitThread(); // stops Application.Run()
+}));
+_staThread.Join(TimeSpan.FromSeconds(10));
+```
+
+### Why Our First Attempt Failed
+
+Our original `StaDispatchThread` (Phase 2) used `BlockingCollection.Take()` to wait for work items, with `Application.DoEvents()` between items. This failed because:
+
+| Our failed approach | Correct approach |
+|---|---|
+| `BlockingCollection.Take()` blocks the STA thread, preventing the message pump from running | `Application.Run()` runs continuously, pumping messages at all times |
+| `Application.DoEvents()` only pumps messages already in the queue at that instant | Message pump runs an infinite loop, processing messages as they arrive |
+| Work dispatched by enqueueing to `BlockingCollection` | Work dispatched via `Form.BeginInvoke()` which posts a Windows message to the STA thread's queue |
+
+The key difference: `BeginInvoke` posts a window message to the hidden form's handle, so the message pump processes each work item in the same loop that delivers COM callbacks. `BlockingCollection` bypasses the message pump entirely.
+
+## Drawbacks of the STA Approach
+
+### Performance
+- **All COM calls serialize onto one thread.** Under load (batch reads of 100+ tags), operations queue up single-file. The current `Task.Run` approach allows MxAccess's internal marshaling to handle some concurrency.
+- **Double context switch per operation.** Caller → STA thread (invoke) → wait → back to caller. Adds ~0.1-1ms per call. Negligible for single reads, noticeable for large batch operations.
+
+### Safety
+- **Single point of failure.** If the STA thread dies, all MxAccess operations stop. Recovery requires tearing down and recreating the thread + all COM objects.
+- **Deadlock risk.** If STA thread code synchronously waits on something that needs the STA thread (circular dependency), the message pump freezes. All waits must be async/non-blocking.
+- **Reentrancy.** While pumping messages, inbound COM callbacks can reenter your code during another COM call. Event handlers must be reentrant-safe.
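+
+The deadlock case can be illustrated with a short sketch. It assumes both methods are called on the STA thread and that `marshalForm` is a control created on that thread; the class and method names are hypothetical. The first method would be expected to hang, because it blocks the only thread that can run the delegate it just queued:
+
+```csharp
+using System;
+using System.Threading.Tasks;
+using System.Windows.Forms;
+
+public static class StaWaitExamples
+{
+    // Called ON the STA thread, e.g. from inside a BeginInvoke delegate.
+    public static void BlocksThePump(Control marshalForm)
+    {
+        var tcs = new TaskCompletionSource<int>();
+        // Queued behind the code that is currently running on this thread...
+        marshalForm.BeginInvoke((Action)(() => tcs.SetResult(42)));
+        // ...and this wait stops the pump that would have run it.
+        tcs.Task.Wait();
+    }
+
+    // Same work, but control returns to the message pump immediately.
+    public static void ReturnsToThePump(Control marshalForm, Action<int> onDone)
+    {
+        var tcs = new TaskCompletionSource<int>(
+            TaskCreationOptions.RunContinuationsAsynchronously);
+        marshalForm.BeginInvoke((Action)(() => tcs.SetResult(42)));
+        // The continuation runs on the thread pool once the pump has run the delegate.
+        tcs.Task.ContinueWith(t => onDone(t.Result), TaskScheduler.Default);
+    }
+}
+```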
+
+### Complexity
+- Every COM call needs `_marshalForm.BeginInvoke()` wrapping.
+- COM object affinity to STA thread is hard to enforce at compile time.
+- Unit tests need STA thread support or must use fakes.
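+
+The wrapping cost in the first bullet could be reduced with a single generic helper. A sketch, assuming the hidden form from the implementation pattern is passed in as `marshalTarget`; `StaDispatch` and `InvokeAsync` are hypothetical names:
+
+```csharp
+using System;
+using System.Threading.Tasks;
+using System.Windows.Forms;
+
+public static class StaDispatch
+{
+    // Runs work on the thread that owns marshalTarget and returns its result as a Task.
+    public static Task<T> InvokeAsync<T>(Control marshalTarget, Func<T> work)
+    {
+        // Continuations run on the thread pool so callers never resume on the STA thread.
+        var tcs = new TaskCompletionSource<T>(
+            TaskCreationOptions.RunContinuationsAsynchronously);
+        marshalTarget.BeginInvoke((Action)(() =>
+        {
+            try { tcs.SetResult(work()); }
+            catch (Exception ex) { tcs.SetException(ex); }
+        }));
+        return tcs.Task;
+    }
+}
+```
+
+A call site would then look like `StaDispatch.InvokeAsync(_marshalForm, () => _lmxProxy.AddItem(_connectionHandle, address))`. This keeps the marshaling in one place but still does not enforce thread affinity at compile time.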
+
+## Decision
+
+Fire-and-forget is the correct choice for now. Revisit when secured/verified writes are needed.
+
+## References
+
+- [.NET Matters: Handling Messages in Console Apps — Stephen Toub, MSDN Magazine 2007](https://learn.microsoft.com/en-us/archive/msdn-magazine/2007/june/net-matters-handling-messages-in-console-apps)
+- [How to: Support COM Interop by Displaying Each Windows Form on Its Own Thread — Microsoft Learn](https://learn.microsoft.com/en-us/dotnet/desktop/winforms/advanced/how-to-support-com-interop-by-displaying-each-windows-form-on-its-own-thread)
+- [.NET Windows Service needs STAThread — hirenppatel](https://hirenppatel.wordpress.com/2012/11/24/net-windows-service-needs-to-use-stathread-instead-of-mtathread/)
+- [Application.Run() In a Windows Service — PC Review](https://www.pcreview.co.uk/threads/application-run-in-a-windows-service.3087159/)
+- [Build a message pump for a Windows service? — CodeProject](https://www.codeproject.com/Messages/1365966/Build-a-message-pump-for-a-Windows-service.aspx)
+- MxAccess Toolkit User's Guide — Write() Method, OnWriteComplete Callback sections