docs(lmxproxy): add execution prompt to fix failing write integration tests

Joseph Doherty
2026-03-22 04:38:30 -04:00
parent 7079f6eed4
commit e2c204b62b


@@ -0,0 +1,119 @@
# LmxProxy v2 — Fix Failing Write Integration Tests
Run this prompt with Claude Code from the `lmxproxy/` directory.
## Prompt
You are debugging and fixing two failing integration tests for the LmxProxy v2 gRPC proxy service. The tests are `WriteAndReadBack` and `WriteBatchAndWait` in the integration test project.
### Context
Read these documents before starting:
1. `CLAUDE.md` — project-level instructions and architecture
2. `docs/deviations.md` — deviation #7 describes the failure (OnWriteComplete COM callback not firing)
3. `mxaccess_documentation.md` — the official MxAccess Toolkit reference. Search for the "OnWriteComplete", "Write() method", and "AdviseSupervisory" sections.
Read these source files:
4. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.ReadWrite.cs` — current v2 write implementation
5. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.EventHandlers.cs` — OnWriteComplete callback handler
6. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — COM event wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`)
7. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.ReadWrite.cs` — v1 write implementation (for comparison)
8. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.EventHandlers.cs` — v1 OnWriteComplete handler
9. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs` — failing WriteAndReadBack test
10. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs` — failing WriteBatchAndWait test
### Problem Statement
The `OnWriteComplete` COM callback from MxAccess never fires within the write timeout (default 5s). The write operation follows this flow:
1. `AddItem()` — registers the tag with MxAccess
2. `AdviseSupervisory()` — establishes a supervisory connection for the tag
3. Store a `TaskCompletionSource<bool>` in `_pendingWrites[itemHandle]`
4. `Write()` — sends the write to MxAccess
5. Wait for `OnWriteComplete` callback to resolve the TCS — **this never fires, causing a timeout**
The v1 code used the same pattern and presumably worked, so the issue is likely one of the following:
- (a) MxAccess completes the write synchronously and never fires `OnWriteComplete` for simple (non-secured, non-verified) writes. The documentation says: "The LMXProxyInterface triggers an event for OnWriteComplete when your program calls the Write() or WriteSecured() function." However, this may not be true in all configurations or for all attribute types.
- (b) The COM event subscription (`_lmxProxy.OnWriteComplete += OnWriteComplete`) is syntactically correct, but the callback doesn't fire because the thread that called `Write()` (a thread pool thread via `Task.Run`) isn't pumping COM messages. Note: STA threading was abandoned (see deviation #2 in `docs/deviations.md`) because it caused other issues with `OnDataChange` callbacks.
- (c) There's a difference in how v1 vs v2 initializes or interacts with the COM object that affects event delivery.
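The five-step flow above can be sketched in isolation, without MxAccess. This is a minimal illustration and not the LmxProxy implementation: `PendingWriteTracker` and its members are hypothetical names, and the `write` delegate stands in for the real `Write()` call. It shows why a callback that never arrives surfaces as a `TimeoutException`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the pending-write bookkeeping; not the actual Host code.
public sealed class PendingWriteTracker
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    // Registers a TCS for the handle, issues the write, then waits for the
    // callback to resolve the TCS or for the timeout to elapse.
    public async Task<bool> WriteAsync(int itemHandle, Action write, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pendingWrites[itemHandle] = tcs;
        try
        {
            write(); // stands in for the real Write() call

            Task finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout)).ConfigureAwait(false);
            if (finished != tcs.Task)
                throw new TimeoutException("OnWriteComplete not received for handle " + itemHandle + ".");

            return await tcs.Task.ConfigureAwait(false);
        }
        finally
        {
            // Always drop the bookkeeping entry, whether the write completed or timed out.
            _pendingWrites.TryRemove(itemHandle, out _);
        }
    }

    // What the COM event handler would call if the callback arrived.
    public void OnWriteComplete(int itemHandle, bool success)
    {
        if (_pendingWrites.TryGetValue(itemHandle, out var tcs))
            tcs.TrySetResult(success);
    }
}
```

If `OnWriteComplete` is never invoked for a handle, the only way out of `WriteAsync` is the timeout branch, which matches the failure the two tests report.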
### Investigation Steps
These steps require SSH access to windev where the v2 Host is deployed.
**Step 1: Check if the write actually succeeds despite no callback.**
The `WriteAndReadBack` test writes a value and then reads it back. The test fails because `WriteAsync` throws a `TimeoutException` (OnWriteComplete never fires). But the write itself may have succeeded at the MxAccess level.
SSH to windev (`ssh windev`) and:
- Check the v2 Host service logs in `C:\publish-v2\logs\` for any write-related log entries
- Look for "Write failed", "WriteComplete", or "timeout" messages
**Step 2: Add a fire-and-forget write mode.**
If the write succeeds at the MxAccess level but `OnWriteComplete` never fires, the simplest fix is to bypass the callback wait. Modify `MxAccessClient.ReadWrite.cs`:
- After calling `_lmxProxy.Write()`, immediately resolve the TCS with success instead of waiting for the callback
- Keep the `OnWriteComplete` handler wired up — if it does fire, it can log the result for diagnostics but shouldn't block the write path
- Add a configuration option `WriteConfirmationMode` with values `FireAndForget` (default) and `WaitForCallback`, so the behavior can be switched if needed
The rationale: the MxAccess documentation's sample application (Chapter 6) uses `OnWriteComplete` to detect whether a *secured* or *verified* write is needed, then retries with `WriteSecured()`. For simple supervisory writes (which is what LmxProxy does), the Write() call itself is the confirmation — if it doesn't throw, the write was accepted.
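One possible shape for the `WriteConfirmationMode` option is sketched below. Only the enum name and its two values come from this prompt; the `WriteConfirmation` helper, its signature, and how the mode is read from configuration are assumptions for illustration.

```csharp
using System;
using System.Threading.Tasks;

public enum WriteConfirmationMode
{
    FireAndForget,   // default: a Write() call that does not throw counts as success
    WaitForCallback  // wait for OnWriteComplete, up to the write timeout
}

// Hypothetical helper; the real logic would live in MxAccessClient.ReadWrite.cs.
public static class WriteConfirmation
{
    // Call this after Write() has returned without throwing.
    public static async Task<bool> CompleteAsync(
        WriteConfirmationMode mode,
        TaskCompletionSource<bool> tcs,
        TimeSpan timeout)
    {
        if (mode == WriteConfirmationMode.FireAndForget)
        {
            // Resolve immediately. TrySetResult is a no-op if a callback already
            // completed the TCS, so an early callback result is preserved.
            tcs.TrySetResult(true);
            return await tcs.Task.ConfigureAwait(false);
        }

        Task finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != tcs.Task)
            throw new TimeoutException("OnWriteComplete callback not received.");

        return await tcs.Task.ConfigureAwait(false);
    }
}
```

Using `TrySetResult` rather than `SetResult` keeps the two paths safe to combine: a late or early callback can never throw because the TCS was already completed.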
**Step 3: Implement the fix.**
In `MxAccessClient.ReadWrite.cs`, modify `SetupWriteOperationAsync`:
```csharp
// After _lmxProxy.Write():
// Immediately complete the write — OnWriteComplete may not fire for supervisory writes
tcs.TrySetResult(true);
```
In the default `FireAndForget` path, skip the `_pendingWrites[itemHandle] = tcs` registration, since nothing waits on it; keep the registration only for `WaitForCallback` mode. Keep `OnWriteComplete` wired for logging/diagnostics.
Clean up `WaitForWriteCompletionAsync` — in fire-and-forget mode, the TCS is already completed so the await returns immediately. The cleanup (UnAdvise + RemoveItem) should still happen.
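The requirement that cleanup still runs can be expressed with a `try`/`finally` around the await. The sketch below is an assumption about structure, not the existing method: `WriteCleanup` is a hypothetical name, and the two `Action` delegates stand in for the real UnAdvise and RemoveItem calls on the COM object.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper showing the ordering only; not the actual WaitForWriteCompletionAsync.
public static class WriteCleanup
{
    public static async Task<bool> AwaitThenCleanUpAsync(
        Task<bool> completion,
        Action unAdvise,
        Action removeItem)
    {
        try
        {
            // In fire-and-forget mode the task is already completed, so this returns immediately.
            return await completion.ConfigureAwait(false);
        }
        finally
        {
            // Best-effort cleanup in both success and failure cases. Real code
            // would log these exceptions rather than discard them.
            try { unAdvise(); } catch (Exception) { }
            try { removeItem(); } catch (Exception) { }
        }
    }
}
```

Swallowing cleanup failures here is a design choice for the sketch: a failed UnAdvise should not turn a successful write into an error for the caller.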
**Step 4: Consider an alternative approach — poll-based write confirmation.**
If fire-and-forget is too loose (we want to confirm the write took effect), consider a poll-based approach similar to `WriteBatchAndWaitAsync`: after writing, read the tag back and compare. This is more reliable than depending on a COM callback. However, this adds latency and may not be needed — the `WriteAndReadBack` test already does a read-back verification.
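A poll-based confirmation could look roughly like the following. This is a sketch under stated assumptions: `WriteVerifier` and `ConfirmByReadBackAsync` are hypothetical names, and `readAsync` is a placeholder delegate for whatever read call the Host already exposes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of the existing LmxProxy code.
public static class WriteVerifier
{
    // Polls readAsync until it returns the expected value or the timeout elapses.
    // Returns true when the value was observed, false when time ran out.
    public static async Task<bool> ConfirmByReadBackAsync<T>(
        Func<Task<T>> readAsync,
        T expected,
        TimeSpan pollInterval,
        TimeSpan timeout,
        CancellationToken ct = default(CancellationToken))
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            ct.ThrowIfCancellationRequested();

            T actual = await readAsync().ConfigureAwait(false);
            if (EqualityComparer<T>.Default.Equals(actual, expected))
                return true;

            // Stop if another poll interval would overrun the timeout.
            if (clock.Elapsed + pollInterval > timeout)
                return false;

            await Task.Delay(pollInterval, ct).ConfigureAwait(false);
        }
    }
}
```

Exact equality is used for simplicity; floating-point tags would likely need a tolerance, and each poll adds at least one read round trip to the write latency.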
**Step 5: Build and deploy.**
After making changes:
```bash
ssh windev "cd C:\source\lmxproxy && git pull && dotnet build src\ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --no-self-contained -o C:\publish-v2"
```
Restart the v2 service:
```bash
ssh windev "net stop LmxProxyV2 && net start LmxProxyV2"
```
**Step 6: Run the integration tests.**
From the Mac:
```bash
cd tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
dotnet test --filter "WriteAndReadBack|WriteBatchAndWait" -v n
```
Both tests should pass. If they do, run the full integration test suite:
```bash
dotnet test -v n
```
**Step 7: Update deviations document.**
If the fix works, update `docs/deviations.md` deviation #7 to reflect the resolution. Change the status from pending to resolved and document what was done.
### Guardrails
1. **Do not reintroduce STA threading.** The MTA/Task.Run approach works for `OnDataChange` callbacks and subscriptions. Do not change the threading model.
2. **Do not modify the integration tests** unless they have a genuine bug (e.g., wrong assertion, wrong tag name). The tests define the expected behavior — fix the implementation to match.
3. **Do not modify the proto file or gRPC contracts.** This is a Host-side implementation fix only.
4. **Keep the OnWriteComplete handler wired.** Even if we don't wait on it, the callback provides diagnostic information (security errors, verified write requirements) that should be logged.
5. **Commit with message:** `fix(lmxproxy): resolve write timeout — bypass OnWriteComplete callback for supervisory writes`