diff --git a/lmxproxy/docs/deviations.md b/lmxproxy/docs/deviations.md index 96bf2a9..0f32ee1 100644 --- a/lmxproxy/docs/deviations.md +++ b/lmxproxy/docs/deviations.md @@ -9,12 +9,13 @@ Decisions made during implementation that differ from or extend the original pla **Why**: protoc.exe from 2.71.0 crashes with access violation (exit code 0xC0000005) on windev (Windows 10, x64). The 2.68.1 version works reliably. **How to apply**: If upgrading Grpc.Tools in the future, test protoc on windev first. -## 2. STA Dispatch Thread replaced with Task.Run +## 2. STA threading — three iterations -**Plan specified**: Dedicated STA thread with `BlockingCollection` dispatch queue and `Application.DoEvents()` message pump for all COM operations. -**Actual**: `Task.Run` on thread pool (MTA) for all COM operations, matching the v1 pattern. -**Why**: The STA thread's message pump (`Application.DoEvents()`) between work items was insufficient — when a COM call like `AdviseSupervisory` was dispatched and the thread blocked waiting for the next work item, COM event callbacks (`OnDataChange`, `OnWriteComplete`) never fired because there was no active message pump during the wait. MxAccess works from MTA threads because COM marshaling handles cross-apartment calls, and events fire on their own threads. -**How to apply**: Do not reintroduce STA threading for MxAccess. The `System.Windows.Forms` reference was removed from the Host csproj. +**Plan specified**: Dedicated STA thread with `BlockingCollection` dispatch queue and `Application.DoEvents()` message pump. +**Iteration 1 (failed)**: `StaDispatchThread` with `BlockingCollection.Take()` + `Application.DoEvents()`. Failed because `Take()` blocked the STA thread, preventing the message pump from running. COM callbacks never fired. +**Iteration 2 (partial)**: Replaced with `Task.Run` on thread pool (MTA). `OnDataChange` worked (MxAccess fires it on its own threads), but `OnWriteComplete` never fired (needs message-pump-based marshaling). Writes used fire-and-forget as a workaround. +**Iteration 3 (current)**: `StaComThread` with Win32 `GetMessage`/`DispatchMessage` loop. Work dispatched via `PostThreadMessage(WM_APP)` which wakes the message pump. COM callbacks (`OnDataChange`, `OnWriteComplete`) are delivered between work items via `DispatchMessage`. All COM objects created and called on this single STA thread. +**How to apply**: All MxAccess COM calls must go through `_staThread.RunAsync()`. Never call COM objects directly from thread pool threads. See `docs/sta_gap.md` for the full design analysis. ## 3. TypedValue property-level `_setCase` tracking @@ -44,20 +45,20 @@ Decisions made during implementation that differ from or extend the original pla **Why**: The `x-api-key` header always carries the caller's valid key (for interceptor auth). The `CheckApiKey` RPC is designed for clients to test whether a *different* key is valid, so it must check the body key independently. **How to apply**: `ScadaGrpcService` receives `ApiKeyService` as an optional constructor parameter. -## 7. Write uses fire-and-forget (OnWriteComplete callback not delivered) +## 7. OnWriteComplete callback — resolved via STA message pump **Plan specified**: Wait for `OnWriteComplete` COM callback to confirm write success. -**Actual**: Write is confirmed by `_lmxProxy.Write()` returning without throwing. The `OnWriteComplete` callback is kept wired for diagnostic logging but never awaited. -**Why**: The MxAccess documentation (Write() Method, p.47) explicitly states: *"Upon completion of the write, your program receives notification of the success/failure status through the OnWriteComplete() event"* and *"that item should not be taken off advise or removed from the internal tables until the OnWriteComplete() event is received."* So `OnWriteComplete` **should** fire — the issue is COM event delivery, not MxAccess behavior. The MxAccess sample applications are all Windows Forms apps with a UI message loop (`Application.Run()`). COM event callbacks are delivered via the Windows message pump. Our v2 Host runs as a headless Topshelf Windows service with no message loop. `Write()` is called from a thread pool thread (`Task.Run`), and the `OnWriteComplete` callback needs to be marshaled back to the calling apartment — which can't happen without a message pump. `OnDataChange` works because MxAccess fires it proactively on its own internal thread whenever data changes. `OnWriteComplete` is a response to a specific `Write()` call and appears to require message-pump-based marshaling to deliver. -**Risk**: For simple supervisory writes, fire-and-forget is safe — if `Write()` returns without a COM exception, MxAccess accepted the write. However, for secured writes (error 1012) or verified writes (error 1013), `OnWriteComplete` is the only way to learn that the write was rejected and must be retried with `WriteSecured()`. If secured/verified writes are ever needed, this must be revisited — either by running a message pump on a dedicated thread or by using a polling-based confirmation. -**How to apply**: Do not await `OnWriteComplete` for write confirmation. The `Write()` COM call succeeding (not throwing a COM exception) is the confirmation. Clean up (UnAdvise + RemoveItem) happens immediately after the write in a finally block. Keep `OnWriteComplete` wired — if COM threading is ever fixed (e.g., dedicated STA thread with proper message pump), the callback could be re-enabled. +**History**: Initially implemented as fire-and-forget because `OnWriteComplete` never fired — the Host had no Windows message pump to deliver the COM callback. See `docs/sta_gap.md` for the full analysis. +**Resolution**: `StaComThread` (a dedicated STA thread with a Win32 `GetMessage`/`DispatchMessage` loop) was introduced, providing a proper message pump. All COM operations are now dispatched to this thread via `PostThreadMessage(WM_APP)`. The message pump delivers `OnWriteComplete` callbacks between work items. +**Current behavior**: Write dispatches `_lmxProxy.Write()` on the STA thread, registers a `TaskCompletionSource` in `_pendingWrites`, then awaits the callback with a timeout. `OnWriteComplete` resolves or rejects the TCS with `MxStatusMapper` error details. If the callback doesn't arrive within the write timeout, falls back to success (fire-and-forget safety net). Clean up (UnAdvise + RemoveItem) happens on the STA thread after the callback or timeout. +**How to apply**: Writes now get real confirmation from MxAccess. Secured write (1012) and verified write (1013) rejections are surfaced as exceptions via `OnWriteComplete`. The timeout fallback ensures writes don't hang if the callback is delayed. ## 8. SubscriptionManager must create MxAccess COM subscriptions **Plan specified**: SubscriptionManager manages per-client channels and routes updates from MxAccess. **Actual**: SubscriptionManager must also call `IScadaClient.SubscribeAsync()` to create the underlying COM subscriptions when a tag is first subscribed, and dispose them when the last client unsubscribes. **Why**: The Phase 2 implementation tracked client-to-tag routing in internal dictionaries but never called `MxAccessClient.SubscribeAsync()` to create the actual MxAccess COM subscriptions (`AddItem` + `AdviseSupervisory`). Without the COM subscription, `OnDataChange` never fired and no updates were delivered to clients. This caused the `Subscribe_ReceivesUpdates` integration test to receive 0 updates over 30 seconds. -**How to apply**: `SubscriptionManager.Subscribe()` collects newly-seen tags (those without an existing `TagSubscription`) and calls `_scadaClient.SubscribeAsync()` for them, passing `OnTagValueChanged` as the callback. The returned `IAsyncDisposable` handles are tracked in `_mxAccessHandles` per address and disposed in `UnsubscribeClient()` when the last client for a tag leaves. +**How to apply**: `SubscriptionManager.SubscribeAsync()` collects newly-seen tags (those without an existing `TagSubscription`) and **awaits** `_scadaClient.SubscribeAsync()` for them, passing `OnTagValueChanged` as the callback. The await ensures the COM subscription is fully established before the channel reader is returned — this prevents a race where the initial `OnDataChange` (first value delivery after `AdviseSupervisory`) fires before the gRPC stream handler starts reading. Previously this was fire-and-forget (`_ = CreateMxAccessSubscriptionsAsync()`), causing intermittent `Subscribe_ReceivesUpdates` test failures (0 updates in 30s). ---