LmxProxy v2 — Fix Failing Write Integration Tests

Run this prompt with Claude Code from the lmxproxy/ directory.

Prompt

You are debugging and fixing 2 failing integration tests for the LmxProxy v2 gRPC proxy service. The tests are WriteAndReadBack and WriteBatchAndWait in the integration test project.

Context

Read these documents before starting:

  1. CLAUDE.md — project-level instructions and architecture
  2. docs/deviations.md — deviation #7 describes the failure (OnWriteComplete COM callback not firing)
  3. mxaccess_documentation.md — the official MxAccess Toolkit reference. Search for "OnWriteComplete", "Write() method", and "AdviseSupervisory" sections.

Read these source files:

  1. src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.ReadWrite.cs — current v2 write implementation
  2. src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.EventHandlers.cs — OnWriteComplete callback handler
  3. src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs — COM event wiring (_lmxProxy.OnWriteComplete += OnWriteComplete)
  4. src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.ReadWrite.cs — v1 write implementation (for comparison)
  5. src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.EventHandlers.cs — v1 OnWriteComplete handler
  6. tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs — failing WriteAndReadBack test
  7. tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs — failing WriteBatchAndWait test

Problem Statement

The OnWriteComplete COM callback from MxAccess never fires within the write timeout (default 5s). The write operation follows this flow:

  1. AddItem() — registers the tag with MxAccess
  2. AdviseSupervisory() — establishes a supervisory connection for the tag
  3. Store a TaskCompletionSource<bool> in _pendingWrites[itemHandle]
  4. Write() — sends the write to MxAccess
  5. Wait for OnWriteComplete callback to resolve the TCS — this never fires, causing a timeout
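The flow above can be sketched in isolation as follows. This is a minimal illustration, not the actual MxAccessClient code: PendingWriteTracker, WriteAsync, and CompleteWrite are hypothetical names, and the write delegate stands in for the _lmxProxy.Write() call. It shows why a callback that never arrives surfaces as a TimeoutException.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the pending-write bookkeeping described above.
public sealed class PendingWriteTracker
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    // Registers a TCS for the handle, runs the write, then waits for
    // CompleteWrite(itemHandle, ...) or the timeout, whichever comes first.
    public async Task<bool> WriteAsync(int itemHandle, Action write, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pendingWrites[itemHandle] = tcs;
        try
        {
            write(); // assumption: stands in for _lmxProxy.Write(...)

            var finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout)).ConfigureAwait(false);
            if (finished != tcs.Task)
                throw new TimeoutException(
                    $"No write-complete callback for handle {itemHandle} within {timeout}.");

            return await tcs.Task.ConfigureAwait(false);
        }
        finally
        {
            // Always drop the entry so a late callback cannot resolve a stale TCS.
            _pendingWrites.TryRemove(itemHandle, out _);
        }
    }

    // Intended to be called from the OnWriteComplete handler.
    // Returns false when no write is pending for the handle.
    public bool CompleteWrite(int itemHandle, bool success)
    {
        return _pendingWrites.TryRemove(itemHandle, out var tcs) && tcs.TrySetResult(success);
    }
}
```

If CompleteWrite is never invoked for the handle, WriteAsync can only end in the timeout branch, which matches the symptom seen in both failing tests.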

The v1 code used the same pattern and presumably worked, so the issue is one of the following:

  • (a) MxAccess completes the write synchronously and never fires OnWriteComplete for simple (non-secured, non-verified) writes. The documentation says: "The LMXProxyInterface triggers an event for OnWriteComplete when your program calls the Write() or WriteSecured() function." However, this may not be true in all configurations or for all attribute types.
  • (b) The COM event subscription wiring (_lmxProxy.OnWriteComplete += OnWriteComplete) is correct syntactically but the callback doesn't fire because the thread that called Write() (a thread pool thread via Task.Run) isn't pumping COM messages. Note: STA threading was abandoned (see deviation #2 in docs/deviations.md) because it caused other issues with OnDataChange callbacks.
  • (c) There's a difference in how v1 vs v2 initializes or interacts with the COM object that affects event delivery.

Investigation Steps

These steps require SSH access to windev where the v2 Host is deployed.

Step 1: Check if the write actually succeeds despite no callback.

The WriteAndReadBack test writes a value and then reads it back. The test fails because WriteAsync throws a TimeoutException (OnWriteComplete never fires). But the write itself may have succeeded at the MxAccess level.

SSH to windev (ssh windev) and:

  • Check the v2 Host service logs in C:\publish-v2\logs\ for any write-related log entries
  • Look for "Write failed", "WriteComplete", or "timeout" messages

Step 2: Add a fire-and-forget write mode.

If the write succeeds at the MxAccess level but OnWriteComplete never fires, the simplest fix is to bypass the callback wait. Modify MxAccessClient.ReadWrite.cs:

  • After calling _lmxProxy.Write(), immediately resolve the TCS with success instead of waiting for the callback
  • Keep the OnWriteComplete handler wired up — if it does fire, it can log the result for diagnostics but shouldn't block the write path
  • Add a configuration option WriteConfirmationMode with values FireAndForget (default) and WaitForCallback, so the behavior can be switched if needed

The rationale: the MxAccess documentation's sample application (Chapter 6) uses OnWriteComplete to detect whether a secured or verified write is needed, then retries with WriteSecured(). For simple supervisory writes (which is what LmxProxy does), the Write() call itself is the confirmation — if it doesn't throw, the write was accepted.
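One possible shape for the mode switch is sketched below. The enum values come from the option proposed above; the WriteConfirmation class, the ConfirmAsync method, and its parameters are hypothetical and only illustrate the branching, assuming the caller invokes it after Write() has returned without throwing.

```csharp
using System;
using System.Threading.Tasks;

// Values follow the proposed configuration option.
public enum WriteConfirmationMode
{
    FireAndForget,   // default: a non-throwing Write() counts as success
    WaitForCallback  // previous behavior: wait for OnWriteComplete
}

// Hypothetical helper; not part of the existing code base.
public static class WriteConfirmation
{
    // callbackTask completes when OnWriteComplete fires (it may never complete).
    public static async Task<bool> ConfirmAsync(
        WriteConfirmationMode mode, Task<bool> callbackTask, TimeSpan timeout)
    {
        if (mode == WriteConfirmationMode.FireAndForget)
            return true; // do not block the write path on the callback

        var finished = await Task.WhenAny(callbackTask, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != callbackTask)
            throw new TimeoutException($"OnWriteComplete not received within {timeout}.");

        return await callbackTask.ConfigureAwait(false);
    }
}
```

Keeping WaitForCallback available means the old behavior can be restored through configuration if a deployment turns out to deliver the callback reliably.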

Step 3: Implement the fix.

In MxAccessClient.ReadWrite.cs, modify SetupWriteOperationAsync:

// After _lmxProxy.Write():
// Immediately complete the write — OnWriteComplete may not fire for supervisory writes
tcs.TrySetResult(true);

In FireAndForget mode, skip the _pendingWrites[itemHandle] = tcs registration, since nothing waits on it in the default path; retain it only for WaitForCallback mode. Keep OnWriteComplete wired for logging/diagnostics.

Simplify WaitForWriteCompletionAsync accordingly: in fire-and-forget mode the TCS is already completed, so the await returns immediately. The cleanup (UnAdvise + RemoveItem) should still happen.
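A small sketch of how the cleanup can be guaranteed regardless of how the write ends. WriteCleanup and RunWithCleanupAsync are hypothetical names, and the two delegates stand in for the UnAdvise and RemoveItem calls; the real method structure may differ.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper showing the try/finally shape only.
public static class WriteCleanup
{
    public static async Task<bool> RunWithCleanupAsync(
        Func<Task<bool>> writeAndConfirm, // write plus optional confirmation wait
        Action unAdvise,                  // assumption: wraps the UnAdvise call
        Action removeItem)                // assumption: wraps the RemoveItem call
    {
        try
        {
            return await writeAndConfirm().ConfigureAwait(false);
        }
        finally
        {
            // Nested finally so RemoveItem still runs if UnAdvise throws.
            try { unAdvise(); }
            finally { removeItem(); }
        }
    }
}
```

With this shape, an immediately completed TCS, a timeout, and a thrown write error all pass through the same cleanup path.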

Step 4: Consider an alternative approach — poll-based write confirmation.

If fire-and-forget is too loose (we want to confirm the write took effect), consider a poll-based approach similar to WriteBatchAndWaitAsync: after writing, read the tag back and compare. This is more reliable than depending on a COM callback. However, this adds latency and may not be needed — the WriteAndReadBack test already does a read-back verification.
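The poll-based idea could look roughly like this. WriteVerification and WaitForValueAsync are hypothetical names, and the readAsync delegate stands in for whatever read path the Host already exposes; this is a sketch of the technique, not the existing WriteBatchAndWaitAsync implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: confirm a write by reading the tag back.
public static class WriteVerification
{
    // Polls readAsync until it returns a value equal to expected,
    // or until the timeout elapses. Returns true on a match.
    public static async Task<bool> WaitForValueAsync<T>(
        Func<Task<T>> readAsync, T expected, TimeSpan timeout, TimeSpan pollInterval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            var current = await readAsync().ConfigureAwait(false);
            if (EqualityComparer<T>.Default.Equals(current, expected))
                return true;

            if (clock.Elapsed >= timeout)
                return false;

            await Task.Delay(pollInterval).ConfigureAwait(false);
        }
    }
}
```

Exact equality is used here for simplicity; floating-point tags would likely need a tolerance-based comparison, and each poll adds at least one read round trip to the write latency.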

Step 5: Build and deploy.

After making changes:

ssh windev "cd C:\source\lmxproxy && git pull && dotnet build src\ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --no-self-contained -o C:\publish-v2"

Restart the v2 service:

ssh windev "net stop LmxProxyV2 && net start LmxProxyV2"

Step 6: Run the integration tests.

From the Mac:

cd tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
dotnet test --filter "WriteAndReadBack|WriteBatchAndWait" -v n

Both tests should pass. If they do, run the full integration test suite:

dotnet test -v n

Step 7: Update deviations document.

If the fix works, update docs/deviations.md deviation #7 to reflect the resolution. Change the status from pending to resolved and document what was done.

Guardrails

  1. Do not reintroduce STA threading. The MTA/Task.Run approach works for OnDataChange callbacks and subscriptions. Do not change the threading model.
  2. Do not modify the integration tests unless they have a genuine bug (e.g., wrong assertion, wrong tag name). The tests define the expected behavior — fix the implementation to match.
  3. Do not modify the proto file or gRPC contracts. This is a Host-side implementation fix only.
  4. Keep the OnWriteComplete handler wired. Even if we don't wait on it, the callback provides diagnostic information (security errors, verified write requirements) that should be logged.
  5. Commit with message: fix(lmxproxy): resolve write timeout — bypass OnWriteComplete callback for supervisory writes