chore(lmxproxy): switch health probe tag to DevAppEngine.Scheduler.ScanTime, remove temp prompts
AppEngine built-in tag is always present and constantly updating (~1s), making it a more reliable probe than a user-deployed TestChildObject tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,119 +0,0 @@
# LmxProxy v2 — Fix Failing Write Integration Tests

Run this prompt with Claude Code from the `lmxproxy/` directory.

## Prompt

You are debugging and fixing 2 failing integration tests for the LmxProxy v2 gRPC proxy service. The tests are `WriteAndReadBack` and `WriteBatchAndWait` in the integration test project.

### Context

Read these documents before starting:

1. `CLAUDE.md` — project-level instructions and architecture
2. `docs/deviations.md` — deviation #7 describes the failure (OnWriteComplete COM callback not firing)
3. `mxaccess_documentation.md` — the official MxAccess Toolkit reference. Search for "OnWriteComplete", "Write() method", and "AdviseSupervisory" sections.

Read these source files:

4. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.ReadWrite.cs` — current v2 write implementation
5. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.EventHandlers.cs` — OnWriteComplete callback handler
6. `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — COM event wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`)
7. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.ReadWrite.cs` — v1 write implementation (for comparison)
8. `src-reference/ZB.MOM.WW.LmxProxy.Host/Implementation/MxAccessClient.EventHandlers.cs` — v1 OnWriteComplete handler
9. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs` — failing WriteAndReadBack test
10. `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs` — failing WriteBatchAndWait test

### Problem Statement

The `OnWriteComplete` COM callback from MxAccess never fires within the write timeout (default 5s). The write operation follows this flow:

1. `AddItem()` — registers the tag with MxAccess
2. `AdviseSupervisory()` — establishes a supervisory connection for the tag
3. Store a `TaskCompletionSource<bool>` in `_pendingWrites[itemHandle]`
4. `Write()` — sends the write to MxAccess
5. Wait for `OnWriteComplete` callback to resolve the TCS — **this never fires, causing a timeout**
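
Steps 3 to 5 amount to racing a `TaskCompletionSource<bool>` against the write timeout. The sketch below illustrates that pattern in isolation and is not the v2 code: `PendingWriteRegistry` and its methods are hypothetical, item handles are assumed to be `int`, and `Task.WhenAny` with `Task.Delay` is used because `Task.WaitAsync` is not available on .NET Framework 4.8, which the Host targets. If `Complete` is never called (the callback never fires), `WaitAsync` throws `TimeoutException`, which matches the symptom the tests report.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the _pendingWrites bookkeeping: one TCS per item handle.
public sealed class PendingWriteRegistry
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    // Step 3: store a TCS for the item handle before Write() is called.
    public Task<bool> Register(int itemHandle)
    {
        // Run continuations asynchronously so the callback thread does not execute awaiting code inline.
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pendingWrites[itemHandle] = tcs;
        return tcs.Task;
    }

    // What an OnWriteComplete handler would call; returns false if nobody is waiting any more.
    public bool Complete(int itemHandle, bool success)
    {
        TaskCompletionSource<bool> tcs;
        return _pendingWrites.TryRemove(itemHandle, out tcs) && tcs.TrySetResult(success);
    }

    // Step 5: wait for the callback, or throw TimeoutException once the timeout elapses.
    public async Task<bool> WaitAsync(int itemHandle, Task<bool> callback, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(callback, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != callback)
        {
            // The callback never arrived: forget the handle so a late callback is ignored.
            TaskCompletionSource<bool> ignored;
            _pendingWrites.TryRemove(itemHandle, out ignored);
            throw new TimeoutException("OnWriteComplete did not fire within " + timeout);
        }
        return await callback.ConfigureAwait(false);
    }
}
```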

The v1 code used the same pattern and presumably worked, so the cause is likely one of the following:

- (a) MxAccess completes the write synchronously and never fires `OnWriteComplete` for simple (non-secured, non-verified) writes. The documentation says: "The LMXProxyInterface triggers an event for OnWriteComplete when your program calls the Write() or WriteSecured() function." However, this may not hold in all configurations or for all attribute types.
- (b) The COM event subscription wiring (`_lmxProxy.OnWriteComplete += OnWriteComplete`) is syntactically correct, but the callback doesn't fire because the thread that called `Write()` (a thread pool thread via `Task.Run`) isn't pumping COM messages. Note: STA threading was abandoned (see deviation #2 in `docs/deviations.md`) because it caused other issues with `OnDataChange` callbacks.
- (c) There's a difference in how v1 and v2 initialize or interact with the COM object that affects event delivery.

### Investigation Steps

These steps require SSH access to windev, where the v2 Host is deployed.

**Step 1: Check if the write actually succeeds despite no callback.**

The `WriteAndReadBack` test writes a value and then reads it back. The test fails because `WriteAsync` throws a `TimeoutException` (OnWriteComplete never fires). But the write itself may have succeeded at the MxAccess level.

SSH to windev (`ssh windev`) and:
- Check the v2 Host service logs in `C:\publish-v2\logs\` for any write-related log entries
- Look for "Write failed", "WriteComplete", or "timeout" messages

**Step 2: Add a fire-and-forget write mode.**

If the write succeeds at the MxAccess level but `OnWriteComplete` never fires, the simplest fix is to bypass the callback wait. Modify `MxAccessClient.ReadWrite.cs`:

- After calling `_lmxProxy.Write()`, immediately resolve the TCS with success instead of waiting for the callback
- Keep the `OnWriteComplete` handler wired up; if it does fire, it can log the result for diagnostics but shouldn't block the write path
- Add a configuration option `WriteConfirmationMode` with values `FireAndForget` (default) and `WaitForCallback`, so the behavior can be switched if needed

The rationale: the MxAccess documentation's sample application (Chapter 6) uses `OnWriteComplete` to detect whether a *secured* or *verified* write is needed, then retries with `WriteSecured()`. For simple supervisory writes (which is what LmxProxy does), the `Write()` call itself is the confirmation: if it doesn't throw, the write was accepted.
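
A minimal sketch of the proposed setting, assuming it is bound into an options class. Only the option name `WriteConfirmationMode` and its two values come from Step 2; `WriteOptions`, `WriteTimeoutMs`, and `ParseMode` are illustrative names, and the 5000 ms default mirrors the write timeout mentioned in the problem statement.

```csharp
using System;

// The two values proposed in Step 2; FireAndForget is the default.
public enum WriteConfirmationMode
{
    FireAndForget,   // Write() returning without an exception is treated as success
    WaitForCallback  // wait for OnWriteComplete, failing after the write timeout
}

// Hypothetical options class; the real configuration section may differ.
public class WriteOptions
{
    public WriteConfirmationMode WriteConfirmationMode { get; set; } = WriteConfirmationMode.FireAndForget;

    // Mirrors the 5 s default write timeout from the problem statement.
    public int WriteTimeoutMs { get; set; } = 5000;

    // Case-insensitive parse of a raw setting; unrecognised or undefined values fall back to the default.
    public static WriteConfirmationMode ParseMode(string raw)
    {
        WriteConfirmationMode mode;
        if (!string.IsNullOrWhiteSpace(raw)
            && Enum.TryParse(raw.Trim(), true, out mode)
            && Enum.IsDefined(typeof(WriteConfirmationMode), mode))
        {
            return mode;
        }
        return WriteConfirmationMode.FireAndForget;
    }
}
```

Falling back to `FireAndForget` on a bad value keeps the write path working if the setting is mistyped; a stricter design would reject the configuration at startup instead.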

**Step 3: Implement the fix.**

In `MxAccessClient.ReadWrite.cs`, modify `SetupWriteOperationAsync`:

```csharp
// After _lmxProxy.Write():
// Immediately complete the write; OnWriteComplete may not fire for supervisory writes
tcs.TrySetResult(true);
```

In the default `FireAndForget` path, drop the `_pendingWrites[itemHandle] = tcs` registration, since nothing waits on the callback; keep it only for `WaitForCallback` mode. Keep `OnWriteComplete` wired for logging/diagnostics.

Clean up `WaitForWriteCompletionAsync`: in fire-and-forget mode, the TCS is already completed, so the await returns immediately. The cleanup (UnAdvise + RemoveItem) should still happen.
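
The write sequence with both modes and unconditional cleanup might be structured as follows. This is a sketch, not the `SetupWriteOperationAsync`/`WaitForWriteCompletionAsync` code: the COM object is hidden behind a hypothetical `IMxWriteTarget` interface whose method names mirror the calls above but whose signatures are assumed, `registerCallbackWait` stands in for the `_pendingWrites` bookkeeping, and `RecordingTarget` is a fake for running the flow without MxAccess.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical abstraction over the MxAccess COM calls; the real COM signatures differ.
public interface IMxWriteTarget
{
    int AddItem(string tagAddress);
    void AdviseSupervisory(int itemHandle);
    void Write(int itemHandle, object value);
    void UnAdvise(int itemHandle);
    void RemoveItem(int itemHandle);
}

public static class SupervisoryWriter
{
    // waitForCallback == false models FireAndForget; true models WaitForCallback.
    // registerCallbackWait stands in for storing a TCS in _pendingWrites and returning its Task.
    public static async Task<bool> WriteAsync(
        IMxWriteTarget target, string tagAddress, object value,
        bool waitForCallback, Func<int, Task<bool>> registerCallbackWait)
    {
        int itemHandle = target.AddItem(tagAddress);
        try
        {
            target.AdviseSupervisory(itemHandle);

            // Register before Write() so a fast callback cannot be missed.
            Task<bool> callback = waitForCallback ? registerCallbackWait(itemHandle) : null;

            // An exception from Write() propagates to the caller; the finally block still runs.
            target.Write(itemHandle, value);

            // FireAndForget: Write() returning normally counts as acceptance.
            if (callback == null)
                return true;
            return await callback.ConfigureAwait(false);
        }
        finally
        {
            // UnAdvise + RemoveItem run whether the write succeeds, throws, or the wait faults.
            target.UnAdvise(itemHandle);
            target.RemoveItem(itemHandle);
        }
    }
}

// In-memory fake that records the call order, for running the flow without COM.
public sealed class RecordingTarget : IMxWriteTarget
{
    public readonly List<string> Calls = new List<string>();
    public int AddItem(string tagAddress) { Calls.Add("AddItem"); return 1; }
    public void AdviseSupervisory(int itemHandle) { Calls.Add("AdviseSupervisory"); }
    public void Write(int itemHandle, object value) { Calls.Add("Write"); }
    public void UnAdvise(int itemHandle) { Calls.Add("UnAdvise"); }
    public void RemoveItem(int itemHandle) { Calls.Add("RemoveItem"); }
}
```

The ordering inside `try` is the main design point: in `WaitForCallback` mode the wait must be registered before `Write()`, otherwise a callback that arrives immediately has no TCS to complete. If `AdviseSupervisory` itself throws, this sketch still calls `UnAdvise`; real code may need to track whether the advise succeeded.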

**Step 4: Consider an alternative approach — poll-based write confirmation.**

If fire-and-forget is too loose (we want to confirm the write took effect), consider a poll-based approach similar to `WriteBatchAndWaitAsync`: after writing, read the tag back and compare. This is more reliable than depending on a COM callback. However, this adds latency and may not be needed — the `WriteAndReadBack` test already does a read-back verification.
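
A poll-based confirmation could be sketched as below: write, then read back until the value matches or a deadline passes. `writeAsync` and `readAsync` are hypothetical delegates standing in for the Host's write and read paths, the timeout and poll interval are caller-supplied rather than taken from configuration, and `WriteBatchAndWaitAsync` is not necessarily implemented this way.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PollingWriteConfirmation
{
    // Writes `expected`, then reads back until the value matches (true) or `timeout` passes (false).
    public static async Task<bool> WriteAndConfirmAsync(
        Func<object, Task> writeAsync,
        Func<Task<object>> readAsync,
        object expected,
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        await writeAsync(expected).ConfigureAwait(false);

        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            object current = await readAsync().ConfigureAwait(false);

            // object.Equals is type-sensitive: a boxed 1 (int) does not equal a boxed 1.0 (double).
            if (Equals(current, expected))
                return true;

            if (clock.Elapsed >= timeout)
                return false;

            await Task.Delay(pollInterval).ConfigureAwait(false);
        }
    }
}
```

The added latency mentioned above is at least one read round-trip per write, plus up to `pollInterval` for each retry.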

**Step 5: Build and deploy.**

After making changes, commit and push them so windev can `git pull` them. The running v2 service keeps the published assemblies locked, so stop it before building into `C:\publish-v2`:

```bash
ssh windev "net stop LmxProxyV2"
ssh windev "cd C:\source\lmxproxy && git pull && dotnet build src\ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --no-self-contained -o C:\publish-v2"
```

Then start the v2 service again:

```bash
ssh windev "net start LmxProxyV2"
```

**Step 6: Run the integration tests.**

From the Mac:

```bash
cd tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
dotnet test --filter "WriteAndReadBack|WriteBatchAndWait" -v n
```

Both tests should pass. If they do, run the full integration test suite:

```bash
dotnet test -v n
```

**Step 7: Update deviations document.**

If the fix works, update deviation #7 in `docs/deviations.md` to reflect the resolution: change its status from pending to resolved and document what was done.

### Guardrails

1. **Do not reintroduce STA threading.** The MTA/Task.Run approach works for `OnDataChange` callbacks and subscriptions. Do not change the threading model.
2. **Do not modify the integration tests** unless they have a genuine bug (e.g., wrong assertion, wrong tag name). The tests define the expected behavior; fix the implementation to match.
3. **Do not modify the proto file or gRPC contracts.** This is a Host-side implementation fix only.
4. **Keep the OnWriteComplete handler wired.** Even if we don't wait on it, the callback provides diagnostic information (security errors, verified write requirements) that should be logged.
5. **Commit with message:** `fix(lmxproxy): resolve write timeout — bypass OnWriteComplete callback for supervisory writes`
@@ -1,98 +0,0 @@
# LmxProxy v2 Rebuild — Execution Prompt

Run this prompt with Claude Code from the `lmxproxy/` directory to execute all 7 phases of the rebuild autonomously.

## Prompt

You are executing a pre-approved implementation plan for rebuilding the LmxProxy gRPC proxy service. All design decisions have been made and documented. You do NOT need to ask for approval — execute each phase completely, then move to the next.

### Context

Read these documents in order before starting:

1. `docs/plans/2026-03-21-lmxproxy-v2-rebuild-design.md` — the approved design
2. `CLAUDE.md` — project-level instructions
3. `docs/requirements/HighLevelReqs.md` — high-level requirements
4. `docs/requirements/Component-*.md` — all component requirements (10 files)
5. `docs/lmxproxy_updates.md` — authoritative v2 protocol specification

### Execution Order

Execute phases in this exact order. Each phase has a detailed plan in `docs/plans/`:

1. **Phase 1**: `docs/plans/phase-1-protocol-domain-types.md`
2. **Phase 2**: `docs/plans/phase-2-host-core.md`
3. **Phase 3**: `docs/plans/phase-3-host-grpc-security-config.md`
4. **Phase 4**: `docs/plans/phase-4-host-health-metrics.md`
5. **Phase 5**: `docs/plans/phase-5-client-core.md`
6. **Phase 6**: `docs/plans/phase-6-client-extras.md`
7. **Phase 7**: `docs/plans/phase-7-integration-deployment.md`

### How to Execute Each Phase

For each phase:

1. Read the phase plan document completely before writing any code.
2. Read any referenced requirements documents for that phase.
3. Execute each step in the plan in order.
4. After all steps, run `dotnet build` and `dotnet test` to verify.
5. If build or tests fail, fix the issues before proceeding.
6. Commit the phase with message: `feat(lmxproxy): phase N — <description>`
7. Push to remote: `git push`
8. Move to the next phase.

### Guardrails (MUST follow)

1. **Proto is the source of truth** — any wire format question is resolved by reading `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Protos/scada.proto`, not the code-first contracts.
2. **No v1 code in the new build** — the `src-reference/` directory is for reading only. Do not copy-paste and modify; write fresh code guided by the plan.
3. **Cross-stack tests in Phase 1** — Host proto serialize to Client code-first deserialize (and vice versa) must pass before any business logic.
4. **COM calls only on STA dispatch thread** — no `Task.Run` for COM operations. All go through the `StaDispatchThread` dispatch queue.
5. **status_code is canonical for quality** — `symbolic_name` is always derived from lookup, never independently set.
6. **Unit tests before integration** — every phase includes unit tests. Integration tests are Phase 7 only.
7. **Each phase must compile and pass tests** before the next phase begins. Do not skip failing tests.
8. **No string serialization heuristics** — v2 uses native TypedValue. No `double.TryParse` or `bool.TryParse` on values.
9. **Do not modify requirements or design docs** — if you find a conflict, follow the design doc's resolution (section 11).
10. **Do not ask for user approval** — all decisions are pre-approved in the design document.
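
Guardrail 5 can be illustrated by a quality type that stores only the status code and computes the symbolic name on demand. This sketch assumes the v2 quality codes follow the OPC UA StatusCode layout, where the top two bits give the severity (00 Good, 01 Uncertain, 10 Bad); the `Quality` type and its five-entry table are hypothetical and are not taken from `docs/lmxproxy_updates.md`.

```csharp
using System.Collections.Generic;

// Hypothetical quality type: status_code is stored, symbolic_name is only ever derived.
public struct Quality
{
    // Small illustrative subset of standard OPC UA status codes.
    private static readonly Dictionary<uint, string> KnownCodes = new Dictionary<uint, string>
    {
        { 0x00000000u, "Good" },
        { 0x40900000u, "UncertainLastUsableValue" },
        { 0x80050000u, "BadCommunicationError" },
        { 0x800A0000u, "BadTimeout" },
        { 0x80340000u, "BadNodeIdUnknown" },
    };

    public Quality(uint statusCode)
    {
        StatusCode = statusCode;
    }

    public uint StatusCode { get; }

    // Severity comes from the top two bits: 00 Good, 01 Uncertain, 10 (and reserved 11) Bad.
    public bool IsGood => (StatusCode >> 30) == 0;
    public bool IsUncertain => (StatusCode >> 30) == 1;
    public bool IsBad => (StatusCode >> 30) >= 2;

    // No setter: the name is looked up from the code, falling back to the severity class.
    public string SymbolicName
    {
        get
        {
            string name;
            if (KnownCodes.TryGetValue(StatusCode, out name))
                return name;
            return IsGood ? "Good" : IsUncertain ? "Uncertain" : "Bad";
        }
    }
}
```

Because `SymbolicName` has no setter, the name cannot drift from the code it describes.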

### Error Recovery

- If a build fails, read the error messages carefully, fix the code, and rebuild.
- If a test fails, fix the implementation (not the test) unless the test has a clear bug.
- If a step in the plan is ambiguous, consult the requirements document for that component.
- If the requirements are ambiguous, consult the design document's resolution table (section 11).
- If you cannot resolve an issue after 3 attempts, skip that step, leave a `// TODO: <description>` comment, and continue.

### Phase 7 Special Instructions

Phase 7 requires SSH access to windev (10.100.0.48). See `windev.md` in the repo root for connection details:
- SSH: `ssh windev` (passwordless)
- Default shell: cmd.exe, use `powershell -Command` for PowerShell
- Git and .NET SDK 10 are installed
- The existing v1 LmxProxy service is at `C:\publish\` on port 50051

For Veeam backups, SSH to the Veeam server:
- SSH: `ssh dohertj2@10.100.0.30` (passwordless)
- Use `Add-PSSnapin VeeamPSSnapin` for Veeam PowerShell

### Commit Messages

Use this format for each phase commit:

- Phase 1: `feat(lmxproxy): phase 1 — v2 protocol types and domain model`
- Phase 2: `feat(lmxproxy): phase 2 — host core (MxAccessClient, SessionManager, SubscriptionManager)`
- Phase 3: `feat(lmxproxy): phase 3 — host gRPC server, security, configuration, service hosting`
- Phase 4: `feat(lmxproxy): phase 4 — host health monitoring, metrics, status web server`
- Phase 5: `feat(lmxproxy): phase 5 — client core (ILmxProxyClient, connection, read/write/subscribe)`
- Phase 6: `feat(lmxproxy): phase 6 — client extras (builder, factory, DI, streaming extensions)`
- Phase 7: `feat(lmxproxy): phase 7 — integration tests, deployment to windev, v1 cutover`

### After All Phases

When all 7 phases are complete:

1. Run `dotnet build ZB.MOM.WW.LmxProxy.slnx` to verify the full solution builds.
2. Run `dotnet test` to verify all unit tests pass.
3. Verify the integration tests passed in Phase 7.
4. Create a final commit if any cleanup was needed.
5. Push all changes.
6. Report: total files created, total tests, build status, integration test results.
@@ -1,95 +0,0 @@
# LmxProxy Requirements Documentation Prompt

Use this prompt with Claude Code to generate the requirements documentation for the LmxProxy project. Run from the `lmxproxy/` directory.

## Prompt

Create requirements documentation for the LmxProxy project in `docs/requirements/`. Follow the same structure used in the ScadaLink project (`docs/requirements/` in the parent repo) — a high-level requirements doc and per-component breakout documents.

### Context

LmxProxy is a gRPC proxy service that bridges SCADA clients to industrial automation systems (primarily AVEVA/Wonderware System Platform via ArchestrA.MXAccess). It consists of two projects:

1. **ZB.MOM.WW.LmxProxy.Host** — A .NET Framework 4.8 Windows service (Topshelf) that runs on the same machine as System Platform. It connects to MXAccess (COM interop, x86) and exposes a gRPC server for remote SCADA operations (read, write, subscribe, batch operations). It handles session management, API key authentication, TLS, health checks, performance metrics, and subscription management.

2. **ZB.MOM.WW.LmxProxy.Client** — A .NET 10 class library providing a typed gRPC client for consuming the LmxProxy service. It uses protobuf-net.Grpc (code-first, no .proto files). It includes connection management, retry policies, TLS support, streaming extensions, DI integration, and a builder pattern for configuration.

### What to Generate

**1. `docs/requirements/HighLevelReqs.md`** — High-level requirements covering:
- System purpose and architecture (proxy pattern, why it exists)
- Deployment model (runs on System Platform machine, clients connect remotely)
- Communication protocol (gRPC, HTTP/2, code-first and proto-based)
- Session lifecycle (connect, session ID, disconnect, no idle timeout)
- Authentication model (API key via metadata header, configurable enforcement)
- TLS/security model (optional TLS, mutual TLS support, certificate validation)
- Data model (VTQ — Value/Timestamp/Quality, OPC-style quality codes)
- Operations (read, read batch, write, write batch, write-and-wait, subscribe)
- Subscription model (server-streaming, tag-based, sampling interval)
- Health monitoring and metrics
- Service hosting (Topshelf Windows service, service recovery)
- Configuration (appsettings.json sections)
- Scale considerations
- Protocol versioning (v1 string-based, v2 OPC UA-aligned typed values)

**2. Component documents** — One `Component-<Name>.md` for each logical component:

- **Component-GrpcServer.md** — The gRPC service implementation (ScadaGrpcService). Session validation, request routing to MxAccessClient, subscription lifecycle, error handling, proto-based serialization.

- **Component-MxAccessClient.md** — The MXAccess COM interop wrapper. Connection lifecycle (Become/Stash-like state machine), tag registration, read/write operations, subscription via advise callbacks, event handling, x86/COM threading constraints. This is the core component.

- **Component-SessionManager.md** — Client session tracking, session creation/destruction, session-to-client mapping, concurrent session limits.

- **Component-Security.md** — API key authentication (ApiKeyService, ApiKeyInterceptor), key file management, role-based permissions (ReadOnly/ReadWrite), TLS certificate management.

- **Component-SubscriptionManager.md** — Tag subscription lifecycle, channel-based update delivery, sampling intervals, backpressure (channel full modes), subscription cleanup on disconnect.

- **Component-Configuration.md** — appsettings.json structure, configuration validation, TLS configuration, service recovery configuration, connection timeouts, retry policies.

- **Component-HealthAndMetrics.md** — Health check service (test tag reads, stale data detection), performance metrics (operation counts, latencies, percentiles), status web server (HTTP status endpoint).

- **Component-ServiceHost.md** — Topshelf service hosting, Program.cs entry point, Serilog logging setup, service install/uninstall, service recovery (Windows SCM restart policies).

- **Component-Client.md** — The LmxProxyClient library. Builder pattern, connection management, retry with Polly, keep-alive pings, streaming extensions, DI registration (ServiceCollectionExtensions), factory pattern, TLS configuration.

- **Component-Protocol.md** — The gRPC protocol specification. Proto definition, code-first contracts (IScadaService), message schemas, VTQ format, quality codes, v1 vs v2 differences.

### Document Structure (per component)

Each component doc must follow this structure exactly:
```
# Component: <Name>

## Purpose
<1-2 sentence description>

## Location
<Which project(s) and key files>

## Responsibilities
<Bulleted list of what this component does>

## <Detail sections>
<Numbered or named sections with specific design details>

## Dependencies
<What this component depends on>

## Interactions
<How this component interacts with others>
```

### Sources

Derive requirements from:
- The source code in `src/ZB.MOM.WW.LmxProxy.Host/` and `src/ZB.MOM.WW.LmxProxy.Client/`
- The protocol docs in `docs/lmxproxy_protocol.md` and `docs/lmxproxy_updates.md`
- The appsettings.json configuration files

### Rules

- Write requirements as design decisions, not aspirational statements. Describe what the system **does**, not what it **should** do.
- Include specific values from configuration (ports, timeouts, intervals, limits).
- Cross-reference between documents using component names.
- Keep the high-level doc focused on system-wide concerns; push implementation details to component docs.
- Do not invent features not present in the source code.
@@ -31,8 +31,8 @@ namespace ZB.MOM.WW.LmxProxy.Host.Configuration
     /// <summary>Health check / probe configuration.</summary>
     public class HealthCheckConfiguration
     {
-        /// <summary>Tag address to probe for connection liveness. Default: TestChildObject.TestBool.</summary>
-        public string TestTagAddress { get; set; } = "TestChildObject.TestBool";
+        /// <summary>Tag address to probe for connection liveness. Default: DevAppEngine.Scheduler.ScanTime.</summary>
+        public string TestTagAddress { get; set; } = "DevAppEngine.Scheduler.ScanTime";
 
         /// <summary>Probe timeout in milliseconds. Default: 5000.</summary>
         public int ProbeTimeoutMs { get; set; } = 5000;
@@ -20,7 +20,7 @@ namespace ZB.MOM.WW.LmxProxy.Host.Health
 
         public DetailedHealthCheckService(
             IScadaClient scadaClient,
-            string testTagAddress = "TestChildObject.TestBool")
+            string testTagAddress = "DevAppEngine.Scheduler.ScanTime")
         {
             _scadaClient = scadaClient;
             _testTagAddress = testTagAddress;
@@ -33,7 +33,7 @@
   },
 
   "HealthCheck": {
-    "TestTagAddress": "TestChildObject.TestBool",
+    "TestTagAddress": "DevAppEngine.Scheduler.ScanTime",
     "ProbeTimeoutMs": 5000,
     "MaxConsecutiveTransportFailures": 3,
     "DegradedProbeIntervalMs": 30000
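
One way to use the new probe tag, sketched under assumptions: a probe is a read of `TestTagAddress` bounded by `ProbeTimeoutMs`, a failed read counts toward `MaxConsecutiveTransportFailures`, and because `DevAppEngine.Scheduler.ScanTime` updates about once a second, a successful read with an old timestamp is reported as stale data. `HealthProbeEvaluator`, `ProbeOutcome`, and the staleness threshold are hypothetical; how `DetailedHealthCheckService` actually applies these settings (including `DegradedProbeIntervalMs`) is not shown in this diff.

```csharp
using System;

public enum ProbeOutcome { Healthy, Stale, TransportFailure }

// Hypothetical probe bookkeeping; not the DetailedHealthCheckService implementation.
public sealed class HealthProbeEvaluator
{
    private readonly int _maxConsecutiveTransportFailures;
    private readonly TimeSpan _maxSampleAge;
    private int _consecutiveTransportFailures;

    public HealthProbeEvaluator(int maxConsecutiveTransportFailures, TimeSpan maxSampleAge)
    {
        _maxConsecutiveTransportFailures = maxConsecutiveTransportFailures;
        _maxSampleAge = maxSampleAge;
    }

    // True once consecutive failed reads reach the limit (MaxConsecutiveTransportFailures).
    public bool ConnectionLost => _consecutiveTransportFailures >= _maxConsecutiveTransportFailures;

    // readSucceeded: the probe read of TestTagAddress completed within ProbeTimeoutMs.
    // sampleTimestampUtc: timestamp of the value read; ignored when the read failed.
    public ProbeOutcome Record(bool readSucceeded, DateTime sampleTimestampUtc, DateTime nowUtc)
    {
        if (!readSucceeded)
        {
            _consecutiveTransportFailures++;
            return ProbeOutcome.TransportFailure;
        }

        _consecutiveTransportFailures = 0;

        // ScanTime changes roughly every second, so an old timestamp signals stale data.
        // With a rarely-changing user tag an old timestamp would be normal, so this check would not work.
        return nowUtc - sampleTimestampUtc > _maxSampleAge ? ProbeOutcome.Stale : ProbeOutcome.Healthy;
    }
}
```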