From e2c204b62bab21b9af5937dd4bd41e534aa2ef26 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sun, 22 Mar 2026 04:38:30 -0400
Subject: [PATCH] docs(lmxproxy): add execution prompt to fix failing write integration tests

---
 lmxproxy/EXECUTE_FIX_WRITE_TESTS.md | 119 ++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 lmxproxy/EXECUTE_FIX_WRITE_TESTS.md

diff --git a/lmxproxy/EXECUTE_FIX_WRITE_TESTS.md b/lmxproxy/EXECUTE_FIX_WRITE_TESTS.md
new file mode 100644
index 0000000..5f1239a
--- /dev/null
+++ b/lmxproxy/EXECUTE_FIX_WRITE_TESTS.md
@@ -0,0 +1,119 @@
+# LmxProxy v2 — Fix Failing Write Integration Tests
+
+Run this prompt with Claude Code from the `lmxproxy/` directory.
+
+## Prompt
+
+You are debugging and fixing 2 failing integration tests for the LmxProxy v2 gRPC proxy service. The tests are `WriteAndReadBack` and `WriteBatchAndWait` in the integration test project.
+
+### Context
+
+Read these documents before starting:
+
+1. `CLAUDE.md` — project-level instructions and architecture
+2. `docs/deviations.md` — deviation #7 describes the failure (OnWriteComplete COM callback not firing)
+3. `mxaccess_documentation.md` — the official MxAccess Toolkit reference. Search for "OnWriteComplete", "Write() method", and "AdviseSupervisory" sections.
+
+Read these source files:
+
+4. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.ReadWrite.cs` — current v2 write implementation
+5. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.EventHandlers.cs` — OnWriteComplete callback handler
+6. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — COM event wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`)
+7. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.ReadWrite.cs` — v1 write implementation (for comparison)
+8. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.EventHandlers.cs` — v1 OnWriteComplete handler
+9. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs` — failing WriteAndReadBack test
+10. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs` — failing WriteBatchAndWait test
+
+### Problem Statement
+
+The `OnWriteComplete` COM callback from MxAccess never fires within the write timeout (default 5s). The write operation follows this flow:
+
+1. `AddItem()` — registers the tag with MxAccess
+2. `AdviseSupervisory()` — establishes a supervisory connection for the tag
+3. Store a `TaskCompletionSource` in `_pendingWrites[itemHandle]`
+4. `Write()` — sends the write to MxAccess
+5. Wait for `OnWriteComplete` callback to resolve the TCS — **this never fires, causing a timeout**
+
+The v1 code used the same pattern and presumably worked, so the issue is either:
+
+- (a) MxAccess completes the write synchronously and never fires `OnWriteComplete` for simple (non-secured, non-verified) writes. The documentation says: "The LMXProxyInterface triggers an event for OnWriteComplete when your program calls the Write() or WriteSecured() function." However, this may not be true in all configurations or for all attribute types.
+- (b) The COM event subscription wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`) is correct syntactically but the callback doesn't fire because the thread that called `Write()` (a thread pool thread via `Task.Run`) isn't pumping COM messages. Note: STA threading was abandoned (see deviation #2 in `docs/deviations.md`) because it caused other issues with `OnDataChange` callbacks.
+- (c) There's a difference in how v1 vs v2 initializes or interacts with the COM object that affects event delivery.
+
+### Investigation Steps
+
+These steps require SSH access to windev where the v2 Host is deployed.
+
+**Step 1: Check if the write actually succeeds despite no callback.**
+
+The `WriteAndReadBack` test writes a value and then reads it back. The test fails because `WriteAsync` throws a `TimeoutException` (OnWriteComplete never fires). But the write itself may have succeeded at the MxAccess level.
+
+SSH to windev (`ssh windev`) and:
+- Check the v2 Host service logs in `C:\publish-v2\logs\` for any write-related log entries
+- Look for "Write failed", "WriteComplete", or "timeout" messages
+
+**Step 2: Add a fire-and-forget write mode.**
+
+If the write succeeds at the MxAccess level but `OnWriteComplete` never fires, the simplest fix is to bypass the callback wait. Modify `MxAccessClient.ReadWrite.cs`:
+
+- After calling `_lmxProxy.Write()`, immediately resolve the TCS with success instead of waiting for the callback
+- Keep the `OnWriteComplete` handler wired up — if it does fire, it can log the result for diagnostics but shouldn't block the write path
+- Add a configuration option `WriteConfirmationMode` with values `FireAndForget` (default) and `WaitForCallback`, so the behavior can be switched if needed
+
+The rationale: the MxAccess documentation's sample application (Chapter 6) uses `OnWriteComplete` to detect whether a *secured* or *verified* write is needed, then retries with `WriteSecured()`. For simple supervisory writes (which is what LmxProxy does), the Write() call itself is the confirmation — if it doesn't throw, the write was accepted.
+
+**Step 3: Implement the fix.**
+
+In `MxAccessClient.ReadWrite.cs`, modify `SetupWriteOperationAsync`:
+
+```
+// After _lmxProxy.Write():
+// Immediately complete the write — OnWriteComplete may not fire for supervisory writes
+tcs.TrySetResult(true);
+```
+
+Remove the `_pendingWrites[itemHandle] = tcs` tracking since it's no longer needed for the default path. Keep `OnWriteComplete` wired for logging/diagnostics.
+
+Clean up `WaitForWriteCompletionAsync` — in fire-and-forget mode, the TCS is already completed so the await returns immediately. The cleanup (UnAdvise + RemoveItem) should still happen.
+
+**Step 4: Consider an alternative approach — poll-based write confirmation.**
+
+If fire-and-forget is too loose (we want to confirm the write took effect), consider a poll-based approach similar to `WriteBatchAndWaitAsync`: after writing, read the tag back and compare. This is more reliable than depending on a COM callback. However, this adds latency and may not be needed — the `WriteAndReadBack` test already does a read-back verification.
+
+**Step 5: Build and deploy.**
+
+After making changes:
+
+```bash
+ssh windev "cd C:\source\lmxproxy && git pull && dotnet build src\ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --no-self-contained -o C:\publish-v2"
+```
+
+Restart the v2 service:
+```bash
+ssh windev "net stop LmxProxyV2 && net start LmxProxyV2"
+```
+
+**Step 6: Run the integration tests.**
+
+From the Mac:
+```bash
+cd tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
+dotnet test --filter "WriteAndReadBack|WriteBatchAndWait" -v n
+```
+
+Both tests should pass. If they do, run the full integration test suite:
+```bash
+dotnet test -v n
+```
+
+**Step 7: Update deviations document.**
+
+If the fix works, update `docs/deviations.md` deviation #7 to reflect the resolution. Change the status from pending to resolved and document what was done.
+
+### Guardrails
+
+1. **Do not reintroduce STA threading.** The MTA/Task.Run approach works for `OnDataChange` callbacks and subscriptions. Do not change the threading model.
+2. **Do not modify the integration tests** unless they have a genuine bug (e.g., wrong assertion, wrong tag name). The tests define the expected behavior — fix the implementation to match.
+3. **Do not modify the proto file or gRPC contracts.** This is a Host-side implementation fix only.
+4. **Keep the OnWriteComplete handler wired.** Even if we don't wait on it, the callback provides diagnostic information (security errors, verified write requirements) that should be logged.
+5. **Commit with message:** `fix(lmxproxy): resolve write timeout — bypass OnWriteComplete callback for supervisory writes`
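
The two write modes proposed in Steps 2 and 3 of the prompt could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual `MxAccessClient` code: `IWriteProxy`, `SilentProxy`, `WriteSketch`, and the simplified `Write(int, object)` and `OnWriteComplete(int, bool)` signatures are all assumptions made for the example. Only `WriteConfirmationMode`, its two values, `_pendingWrites`, and the `OnWriteComplete` name come from the prompt itself.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Only WriteConfirmationMode, its two values, _pendingWrites and OnWriteComplete
// are named in the prompt; every other identifier here is hypothetical.
public enum WriteConfirmationMode { FireAndForget, WaitForCallback }

// Stand-in for the MxAccess COM object. The real Write() signature differs.
public interface IWriteProxy
{
    void Write(int itemHandle, object value);
}

// Test double that accepts writes but never raises a completion callback,
// mimicking the behaviour described in deviation #7.
public sealed class SilentProxy : IWriteProxy
{
    public int WriteCount { get; private set; }
    public void Write(int itemHandle, object value) { WriteCount++; }
}

public sealed class WriteSketch
{
    private readonly IWriteProxy _proxy;
    private readonly WriteConfirmationMode _mode;
    private readonly TimeSpan _timeout;
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    public WriteSketch(IWriteProxy proxy, WriteConfirmationMode mode, TimeSpan timeout)
    {
        _proxy = proxy;
        _mode = mode;
        _timeout = timeout;
    }

    public async Task WriteAsync(int itemHandle, object value)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Track the pending write only when a callback is expected to complete it.
        if (_mode == WriteConfirmationMode.WaitForCallback)
            _pendingWrites[itemHandle] = tcs;

        try
        {
            _proxy.Write(itemHandle, value); // an exception here means the write was rejected

            // Default path: treat a non-throwing Write() as the confirmation.
            if (_mode == WriteConfirmationMode.FireAndForget)
                tcs.TrySetResult(true);

            var finished = await Task.WhenAny(tcs.Task, Task.Delay(_timeout));
            if (finished != tcs.Task)
                throw new TimeoutException("OnWriteComplete did not fire for handle " + itemHandle);

            await tcs.Task; // rethrows a failure reported by the callback
        }
        finally
        {
            // The real client would also call UnAdvise and RemoveItem here.
            _pendingWrites.TryRemove(itemHandle, out _);
        }
    }

    // Stays wired in both modes: always logs, and completes a pending write if one exists.
    public void OnWriteComplete(int itemHandle, bool success)
    {
        Console.WriteLine("OnWriteComplete handle=" + itemHandle + " success=" + success);
        if (!_pendingWrites.TryGetValue(itemHandle, out var tcs)) return;
        if (success) tcs.TrySetResult(true);
        else tcs.TrySetException(new InvalidOperationException("Write failed for handle " + itemHandle));
    }
}
```

With a proxy that never calls back, `FireAndForget` returns as soon as `Write()` returns, while `WaitForCallback` reproduces the `TimeoutException` that the failing tests currently observe. Keeping the cleanup in `finally` means it runs on both paths, which matches the prompt's requirement that cleanup still happens in fire-and-forget mode.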