docs(lmxproxy): update deviations for STA resolution, OnWriteComplete, subscribe fix

- Deviation #2: document three STA iterations (failed → Task.Run → StaComThread)
- Deviation #7: mark resolved — OnWriteComplete now works via STA message pump
- Deviation #8: note awaited subscription creation fixes flaky subscribe test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-03-22 23:52:09 -04:00
parent b218773ab0
commit 59d143e4c8

View File

@@ -9,12 +9,13 @@ Decisions made during implementation that differ from or extend the original pla
**Why**: protoc.exe from 2.71.0 crashes with access violation (exit code 0xC0000005) on windev (Windows 10, x64). The 2.68.1 version works reliably. **Why**: protoc.exe from 2.71.0 crashes with access violation (exit code 0xC0000005) on windev (Windows 10, x64). The 2.68.1 version works reliably.
**How to apply**: If upgrading Grpc.Tools in the future, test protoc on windev first. **How to apply**: If upgrading Grpc.Tools in the future, test protoc on windev first.
## 2. STA Dispatch Thread replaced with Task.Run ## 2. STA threading — three iterations
**Plan specified**: Dedicated STA thread with `BlockingCollection<Action>` dispatch queue and `Application.DoEvents()` message pump for all COM operations. **Plan specified**: Dedicated STA thread with `BlockingCollection<Action>` dispatch queue and `Application.DoEvents()` message pump.
**Actual**: `Task.Run` on thread pool (MTA) for all COM operations, matching the v1 pattern. **Iteration 1 (failed)**: `StaDispatchThread` with `BlockingCollection.Take()` + `Application.DoEvents()`. Failed because `Take()` blocked the STA thread, preventing the message pump from running. COM callbacks never fired.
**Why**: The STA thread's message pump (`Application.DoEvents()`) between work items was insufficient — when a COM call like `AdviseSupervisory` was dispatched and the thread blocked waiting for the next work item, COM event callbacks (`OnDataChange`, `OnWriteComplete`) never fired because there was no active message pump during the wait. MxAccess works from MTA threads because COM marshaling handles cross-apartment calls, and events fire on their own threads. **Iteration 2 (partial)**: Replaced with `Task.Run` on thread pool (MTA). `OnDataChange` worked (MxAccess fires it on its own threads), but `OnWriteComplete` never fired (needs message-pump-based marshaling). Writes used fire-and-forget as a workaround.
**How to apply**: Do not reintroduce STA threading for MxAccess. The `System.Windows.Forms` reference was removed from the Host csproj. **Iteration 3 (current)**: `StaComThread` with Win32 `GetMessage`/`DispatchMessage` loop. Work dispatched via `PostThreadMessage(WM_APP)` which wakes the message pump. COM callbacks (`OnDataChange`, `OnWriteComplete`) are delivered between work items via `DispatchMessage`. All COM objects created and called on this single STA thread.
**How to apply**: All MxAccess COM calls must go through `_staThread.RunAsync()`. Never call COM objects directly from thread pool threads. See `docs/sta_gap.md` for the full design analysis.
## 3. TypedValue property-level `_setCase` tracking ## 3. TypedValue property-level `_setCase` tracking
@@ -44,20 +45,20 @@ Decisions made during implementation that differ from or extend the original pla
**Why**: The `x-api-key` header always carries the caller's valid key (for interceptor auth). The `CheckApiKey` RPC is designed for clients to test whether a *different* key is valid, so it must check the body key independently. **Why**: The `x-api-key` header always carries the caller's valid key (for interceptor auth). The `CheckApiKey` RPC is designed for clients to test whether a *different* key is valid, so it must check the body key independently.
**How to apply**: `ScadaGrpcService` receives `ApiKeyService` as an optional constructor parameter. **How to apply**: `ScadaGrpcService` receives `ApiKeyService` as an optional constructor parameter.
## 7. Write uses fire-and-forget (OnWriteComplete callback not delivered) ## 7. OnWriteComplete callback — resolved via STA message pump
**Plan specified**: Wait for `OnWriteComplete` COM callback to confirm write success. **Plan specified**: Wait for `OnWriteComplete` COM callback to confirm write success.
**Actual**: Write is confirmed by `_lmxProxy.Write()` returning without throwing. The `OnWriteComplete` callback is kept wired for diagnostic logging but never awaited. **History**: Initially implemented as fire-and-forget because `OnWriteComplete` never fired — the Host had no Windows message pump to deliver the COM callback. See `docs/sta_gap.md` for the full analysis.
**Why**: The MxAccess documentation (Write() Method, p.47) explicitly states: *"Upon completion of the write, your program receives notification of the success/failure status through the OnWriteComplete() event"* and *"that item should not be taken off advise or removed from the internal tables until the OnWriteComplete() event is received."* So `OnWriteComplete` **should** fire — the issue is COM event delivery, not MxAccess behavior. The MxAccess sample applications are all Windows Forms apps with a UI message loop (`Application.Run()`). COM event callbacks are delivered via the Windows message pump. Our v2 Host runs as a headless Topshelf Windows service with no message loop. `Write()` is called from a thread pool thread (`Task.Run`), and the `OnWriteComplete` callback needs to be marshaled back to the calling apartment — which can't happen without a message pump. `OnDataChange` works because MxAccess fires it proactively on its own internal thread whenever data changes. `OnWriteComplete` is a response to a specific `Write()` call and appears to require message-pump-based marshaling to deliver. **Resolution**: `StaComThread` (a dedicated STA thread with a Win32 `GetMessage`/`DispatchMessage` loop) was introduced, providing a proper message pump. All COM operations are now dispatched to this thread via `PostThreadMessage(WM_APP)`. The message pump delivers `OnWriteComplete` callbacks between work items.
**Risk**: For simple supervisory writes, fire-and-forget is safe — if `Write()` returns without a COM exception, MxAccess accepted the write. However, for secured writes (error 1012) or verified writes (error 1013), `OnWriteComplete` is the only way to learn that the write was rejected and must be retried with `WriteSecured()`. If secured/verified writes are ever needed, this must be revisited — either by running a message pump on a dedicated thread or by using a polling-based confirmation. **Current behavior**: Write dispatches `_lmxProxy.Write()` on the STA thread, registers a `TaskCompletionSource` in `_pendingWrites`, then awaits the callback with a timeout. `OnWriteComplete` resolves or rejects the TCS with `MxStatusMapper` error details. If the callback doesn't arrive within the write timeout, falls back to success (fire-and-forget safety net). Clean up (UnAdvise + RemoveItem) happens on the STA thread after the callback or timeout.
**How to apply**: Do not await `OnWriteComplete` for write confirmation. The `Write()` COM call succeeding (not throwing a COM exception) is the confirmation. Clean up (UnAdvise + RemoveItem) happens immediately after the write in a finally block. Keep `OnWriteComplete` wired — if COM threading is ever fixed (e.g., dedicated STA thread with proper message pump), the callback could be re-enabled. **How to apply**: Writes now get real confirmation from MxAccess. Secured write (1012) and verified write (1013) rejections are surfaced as exceptions via `OnWriteComplete`. The timeout fallback ensures writes don't hang if the callback is delayed.
## 8. SubscriptionManager must create MxAccess COM subscriptions ## 8. SubscriptionManager must create MxAccess COM subscriptions
**Plan specified**: SubscriptionManager manages per-client channels and routes updates from MxAccess. **Plan specified**: SubscriptionManager manages per-client channels and routes updates from MxAccess.
**Actual**: SubscriptionManager must also call `IScadaClient.SubscribeAsync()` to create the underlying COM subscriptions when a tag is first subscribed, and dispose them when the last client unsubscribes. **Actual**: SubscriptionManager must also call `IScadaClient.SubscribeAsync()` to create the underlying COM subscriptions when a tag is first subscribed, and dispose them when the last client unsubscribes.
**Why**: The Phase 2 implementation tracked client-to-tag routing in internal dictionaries but never called `MxAccessClient.SubscribeAsync()` to create the actual MxAccess COM subscriptions (`AddItem` + `AdviseSupervisory`). Without the COM subscription, `OnDataChange` never fired and no updates were delivered to clients. This caused the `Subscribe_ReceivesUpdates` integration test to receive 0 updates over 30 seconds. **Why**: The Phase 2 implementation tracked client-to-tag routing in internal dictionaries but never called `MxAccessClient.SubscribeAsync()` to create the actual MxAccess COM subscriptions (`AddItem` + `AdviseSupervisory`). Without the COM subscription, `OnDataChange` never fired and no updates were delivered to clients. This caused the `Subscribe_ReceivesUpdates` integration test to receive 0 updates over 30 seconds.
**How to apply**: `SubscriptionManager.Subscribe()` collects newly-seen tags (those without an existing `TagSubscription`) and calls `_scadaClient.SubscribeAsync()` for them, passing `OnTagValueChanged` as the callback. The returned `IAsyncDisposable` handles are tracked in `_mxAccessHandles` per address and disposed in `UnsubscribeClient()` when the last client for a tag leaves. **How to apply**: `SubscriptionManager.SubscribeAsync()` collects newly-seen tags (those without an existing `TagSubscription`) and **awaits** `_scadaClient.SubscribeAsync()` for them, passing `OnTagValueChanged` as the callback. The await ensures the COM subscription is fully established before the channel reader is returned — this prevents a race where the initial `OnDataChange` (first value delivery after `AdviseSupervisory`) fires before the gRPC stream handler starts reading. Previously this was fire-and-forget (`_ = CreateMxAccessSubscriptionsAsync()`), causing intermittent `Subscribe_ReceivesUpdates` test failures (0 updates in 30s).
--- ---