docs(lmxproxy): add execution prompt to fix failing write integration tests
119
lmxproxy/EXECUTE_FIX_WRITE_TESTS.md
Normal file
@@ -0,0 +1,119 @@
# LmxProxy v2: Fix Failing Write Integration Tests

Run this prompt with Claude Code from the `lmxproxy/` directory.

## Prompt

You are debugging and fixing 2 failing integration tests for the LmxProxy v2 gRPC proxy service. The tests are `WriteAndReadBack` and `WriteBatchAndWait` in the integration test project.

### Context

Read these documents before starting:

1. `CLAUDE.md`: project-level instructions and architecture
2. `docs/deviations.md`: deviation #7 describes the failure (`OnWriteComplete` COM callback not firing)
3. `mxaccess_documentation.md`: the official MxAccess Toolkit reference. Search for the "OnWriteComplete", "Write() method", and "AdviseSupervisory" sections.

Read these source files:

4. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.ReadWrite.cs`: current v2 write implementation
5. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.EventHandlers.cs`: `OnWriteComplete` callback handler
6. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs`: COM event wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`)
7. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.ReadWrite.cs`: v1 write implementation (for comparison)
8. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.EventHandlers.cs`: v1 `OnWriteComplete` handler
9. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs`: failing `WriteAndReadBack` test
10. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs`: failing `WriteBatchAndWait` test

### Problem Statement

The `OnWriteComplete` COM callback from MxAccess never fires within the write timeout (default 5s). The write operation follows this flow:

1. `AddItem()`: registers the tag with MxAccess
2. `AdviseSupervisory()`: establishes a supervisory connection for the tag
3. Store a `TaskCompletionSource<bool>` in `_pendingWrites[itemHandle]`
4. `Write()`: sends the write to MxAccess
5. Wait for the `OnWriteComplete` callback to resolve the TCS. **The callback never fires, causing a timeout.**

The v1 code used the same pattern and presumably worked, so the cause is likely one of the following:

- (a) MxAccess completes the write synchronously and never fires `OnWriteComplete` for simple (non-secured, non-verified) writes. The documentation says: "The LMXProxyInterface triggers an event for OnWriteComplete when your program calls the Write() or WriteSecured() function." However, this may not be true in all configurations or for all attribute types.
- (b) The COM event subscription wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`) is syntactically correct, but the callback doesn't fire because the thread that called `Write()` (a thread pool thread via `Task.Run`) isn't pumping COM messages. Note: STA threading was abandoned (see deviation #2 in `docs/deviations.md`) because it caused other issues with `OnDataChange` callbacks.
- (c) There's a difference in how v1 and v2 initialize or interact with the COM object that affects event delivery.
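
For orientation, the write flow described in this section can be reduced to a minimal sketch. It assumes a stand-in `SilentProxy` class that accepts writes but never raises its completion event on its own. `SilentProxy`, `CallbackWriter`, and the `WriteCompleted` event are hypothetical names used only for illustration; they are not the MxAccess API and not the project's actual code. Under those assumptions the awaited `TaskCompletionSource<bool>` is never resolved, so the write surfaces as a `TimeoutException`, which matches the reported symptom.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the COM proxy (NOT the MxAccess API).
// It accepts writes but never raises its completion event by itself.
public sealed class SilentProxy
{
    public event Action<int> WriteCompleted;

    public void Write(int itemHandle, object value)
    {
        // Write accepted; no callback is raised.
    }

    // Lets a caller simulate the callback arriving.
    public void RaiseWriteCompleted(int itemHandle) => WriteCompleted?.Invoke(itemHandle);
}

// Hypothetical writer that mirrors the "store TCS, write, wait for callback" flow.
public sealed class CallbackWriter
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();
    private readonly SilentProxy _proxy;

    public CallbackWriter(SilentProxy proxy)
    {
        _proxy = proxy;
        // The completion handler resolves the pending TCS for the handle, if any.
        _proxy.WriteCompleted += handle =>
        {
            if (_pendingWrites.TryRemove(handle, out var tcs))
            {
                tcs.TrySetResult(true);
            }
        };
    }

    public async Task WriteAsync(int itemHandle, object value, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pendingWrites[itemHandle] = tcs;
        try
        {
            _proxy.Write(itemHandle, value);

            // Wait for either the callback or the timeout, whichever comes first.
            var finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout));
            if (finished != tcs.Task)
            {
                throw new TimeoutException($"No write-complete callback for handle {itemHandle}.");
            }
        }
        finally
        {
            // Always drop the pending entry so a late callback finds nothing to resolve.
            _pendingWrites.TryRemove(itemHandle, out _);
        }
    }
}
```

The sketch only reproduces the symptom; it does not distinguish between hypotheses (a), (b), and (c), which still have to be checked against the real Host.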

### Investigation Steps

These steps require SSH access to windev where the v2 Host is deployed.

**Step 1: Check if the write actually succeeds despite no callback.**

The `WriteAndReadBack` test writes a value and then reads it back. The test fails because `WriteAsync` throws a `TimeoutException` (`OnWriteComplete` never fires). But the write itself may have succeeded at the MxAccess level.

SSH to windev (`ssh windev`) and:

- Check the v2 Host service logs in `C:\publish-v2\logs\` for any write-related log entries
- Look for "Write failed", "WriteComplete", or "timeout" messages

**Step 2: Add a fire-and-forget write mode.**

If the write succeeds at the MxAccess level but `OnWriteComplete` never fires, the simplest fix is to bypass the callback wait. Modify `MxAccessClient.ReadWrite.cs`:

- After calling `_lmxProxy.Write()`, immediately resolve the TCS with success instead of waiting for the callback
- Keep the `OnWriteComplete` handler wired up. If it does fire, it can log the result for diagnostics, but it shouldn't block the write path
- Add a configuration option `WriteConfirmationMode` with values `FireAndForget` (default) and `WaitForCallback`, so the behavior can be switched if needed

The rationale: the MxAccess documentation's sample application (Chapter 6) uses `OnWriteComplete` to detect whether a *secured* or *verified* write is needed, then retries with `WriteSecured()`. For simple supervisory writes (which is what LmxProxy does), the `Write()` call itself is the confirmation: if it doesn't throw, the write was accepted.
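
One possible shape for the mode switch is sketched below. Only the option name `WriteConfirmationMode` and its two values come from this step; the `WriteCompletion.ConfirmAsync` helper, its parameters, and the `write` delegate (a stand-in for the synchronous proxy call) are assumptions made for illustration and do not reflect the Host's real method signatures.

```csharp
using System;
using System.Threading.Tasks;

// Option from Step 2. FireAndForget is listed first so it is the default enum value.
public enum WriteConfirmationMode
{
    FireAndForget,
    WaitForCallback
}

// Hypothetical helper; the real logic would live in MxAccessClient.ReadWrite.cs.
public static class WriteCompletion
{
    // Performs the write and returns once it counts as confirmed under the given mode.
    // `write` stands in for the synchronous proxy call.
    // `callback` is the TCS that a completion handler would resolve.
    public static async Task ConfirmAsync(
        WriteConfirmationMode mode,
        Action write,
        TaskCompletionSource<bool> callback,
        TimeSpan timeout)
    {
        // An exception thrown here propagates as a failed write in either mode.
        write();

        if (mode == WriteConfirmationMode.FireAndForget)
        {
            // Treat "did not throw" as success and do not wait for the callback.
            callback.TrySetResult(true);
            return;
        }

        // WaitForCallback: keep the original behavior, bounded by the timeout.
        var finished = await Task.WhenAny(callback.Task, Task.Delay(timeout));
        if (finished != callback.Task)
        {
            throw new TimeoutException("Write-complete callback did not arrive in time.");
        }

        // Observe the callback result so a faulted TCS rethrows here.
        await callback.Task;
    }
}
```

Keeping both branches in one place makes it cheap to switch back to `WaitForCallback` if a later investigation shows the callback can be made to fire.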

**Step 3: Implement the fix.**

In `MxAccessClient.ReadWrite.cs`, modify `SetupWriteOperationAsync`:

```csharp
// After _lmxProxy.Write() returns without throwing:
// complete the write immediately, because OnWriteComplete may not fire for supervisory writes.
tcs.TrySetResult(true);
```

On the default `FireAndForget` path, skip the `_pendingWrites[itemHandle] = tcs` tracking since nothing waits on it; it is only needed if `WaitForCallback` mode is retained. Keep `OnWriteComplete` wired for logging/diagnostics.

Clean up `WaitForWriteCompletionAsync`. In fire-and-forget mode the TCS is already completed, so the await returns immediately. The cleanup (UnAdvise + RemoveItem) should still happen.

**Step 4: Consider an alternative approach (poll-based write confirmation).**

If fire-and-forget is too loose (we want to confirm the write took effect), consider a poll-based approach similar to `WriteBatchAndWaitAsync`: after writing, read the tag back and compare. This is more reliable than depending on a COM callback. However, this adds latency and may not be needed, because the `WriteAndReadBack` test already does a read-back verification.
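
A read-back poll of this kind could look roughly like the sketch below. `ReadBackConfirmation.WaitForValueAsync` and its `readAsync` delegate are hypothetical: the delegate stands in for however the Host reads a tag, and the comparison uses plain `object.Equals`, which may be too strict if the written and read values differ in numeric type.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper illustrating poll-based write confirmation.
public static class ReadBackConfirmation
{
    // Polls `readAsync` until it returns `expected` or the timeout elapses.
    // Returns true if the expected value was observed, false otherwise.
    public static async Task<bool> WaitForValueAsync(
        Func<Task<object>> readAsync,
        object expected,
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            var current = await readAsync();
            if (object.Equals(current, expected))
            {
                return true;
            }

            if (clock.Elapsed >= timeout)
            {
                return false;
            }

            // Back off before the next read to avoid hammering the source.
            await Task.Delay(pollInterval);
        }
    }
}
```

The cost is at least one extra read per write, plus up to one poll interval of added latency whenever the first read still returns the old value.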

**Step 5: Build and deploy.**

After making changes:

```bash
ssh windev "cd C:\source\lmxproxy && git pull && dotnet build src\ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --no-self-contained -o C:\publish-v2"
```

Restart the v2 service:

```bash
ssh windev "net stop LmxProxyV2 && net start LmxProxyV2"
```

**Step 6: Run the integration tests.**

From the Mac:

```bash
cd tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
dotnet test --filter "WriteAndReadBack|WriteBatchAndWait" -v n
```

Both tests should pass. If they do, run the full integration test suite:

```bash
dotnet test -v n
```

**Step 7: Update deviations document.**

If the fix works, update `docs/deviations.md` deviation #7 to reflect the resolution. Change the status from pending to resolved and document what was done.

### Guardrails

1. **Do not reintroduce STA threading.** The MTA/Task.Run approach works for `OnDataChange` callbacks and subscriptions. Do not change the threading model.
2. **Do not modify the integration tests** unless they have a genuine bug (e.g., wrong assertion, wrong tag name). The tests define the expected behavior; fix the implementation to match.
3. **Do not modify the proto file or gRPC contracts.** This is a Host-side implementation fix only.
4. **Keep the OnWriteComplete handler wired.** Even if we don't wait on it, the callback provides diagnostic information (security errors, verified write requirements) that should be logged.
5. **Commit with message:** `fix(lmxproxy): resolve write timeout — bypass OnWriteComplete callback for supervisory writes`