deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/

LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL
adapter files, and related docs to deprecated/. Removed LmxProxy registration
from DataConnectionFactory, project reference from DCL, protocol option from
UI, and cleaned up all requirement docs.
Joseph Doherty
2026-04-08 15:56:23 -04:00
parent 8423915ba1
commit 9dccf8e72f
220 changed files with 25 additions and 132 deletions
@@ -0,0 +1,210 @@
# LmxProxy v2 Rebuild — Design Document
**Date**: 2026-03-21
**Status**: Approved
**Scope**: Complete rebuild of LmxProxy Host and Client with v2 protocol
## 1. Overview
Rebuild the LmxProxy gRPC proxy service from scratch, implementing the v2 protocol (TypedValue + QualityCode) as defined in `docs/lmxproxy_updates.md`. The existing code in `src/` is retained as reference only. No backward compatibility with v1.
## 2. Key Design Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| gRPC server for Host | Grpc.Core (C-core) | Only option for .NET Framework 4.8 server-side |
| Service hosting | Topshelf | Proven, already deployed, simple install/uninstall |
| Protocol version | v2 only, clean break | Small controlled client count, no value in v1 compat |
| Shared code between projects | None — fully independent | Different .NET runtimes (.NET Fx 4.8 vs .NET 10), wire compat is the contract |
| Client retry library | Polly v8+ | Building fresh on .NET 10, modern API |
| Testing strategy | Unit tests during implementation, integration tests after Client functional | Phased approach, real hardware validation on windev |
## 3. Architecture
### 3.1 Host (.NET Framework 4.8, x86)
```
Program.cs (Topshelf entry point)
  └── LmxProxyService (lifecycle manager)
        ├── Configuration (appsettings.json binding + validation)
        ├── MxAccessClient (COM interop, STA dispatch thread)
        │   ├── Connection state machine
        │   ├── Read/Write with semaphore concurrency
        │   ├── Subscription storage for reconnect replay
        │   └── Auto-reconnect loop (5s interval)
        ├── SessionManager (ConcurrentDictionary, 5-min inactivity scavenging)
        ├── SubscriptionManager (per-client channels, shared MxAccess subscriptions)
        ├── ApiKeyService (JSON file, FileSystemWatcher hot-reload)
        ├── ScadaGrpcService (proto-generated, all 10 RPCs)
        │   └── ApiKeyInterceptor (x-api-key header enforcement)
        ├── PerformanceMetrics (per-op tracking, p95, 60s log)
        ├── HealthCheckService (basic + detailed with test tag)
        └── StatusWebServer (HTML dashboard, JSON status, health endpoint)
```
### 3.2 Client (.NET 10, AnyCPU)
```
ILmxProxyClient (public interface)
  └── LmxProxyClient (partial class)
        ├── Connection (GrpcChannel, protobuf-net.Grpc, 30s keep-alive)
        ├── Read/Write/Subscribe operations
        ├── CodeFirstSubscription (IAsyncEnumerable streaming)
        ├── ClientMetrics (p95/p99, 1000-sample buffer)
        └── Disposal (session disconnect, channel cleanup)
LmxProxyClientBuilder (fluent builder, Polly v8 resilience pipeline)
ILmxProxyClientFactory + LmxProxyClientFactory (config-based creation)
ServiceCollectionExtensions (DI registrations)
StreamingExtensions (batched reads/writes, parallel processing)
Domain/
  ├── ScadaContracts.cs (IScadaService + all DataContract messages)
  ├── Quality.cs, QualityExtensions.cs
  ├── Vtq.cs
  └── ConnectionState.cs
```
### 3.3 Wire Compatibility
The `.proto` file is the single source of truth for the wire format. Host generates server stubs from it. Client implements code-first contracts (`[DataContract]`/`[ServiceContract]`) that mirror the proto exactly — same field numbers, names, nesting, and streaming shapes. Cross-stack serialization tests verify compatibility.
## 4. Protocol (v2)
### 4.1 TypedValue System
Protobuf `oneof` carrying native types:
| Case | Proto Type | .NET Type |
|------|-----------|-----------|
| bool_value | bool | bool |
| int32_value | int32 | int |
| int64_value | int64 | long |
| float_value | float | float |
| double_value | double | double |
| string_value | string | string |
| bytes_value | bytes | byte[] |
| datetime_value | int64 (UTC Ticks) | DateTime |
| array_value | ArrayValue | typed arrays |
Unset `oneof` = null. No string serialization heuristics.
### 4.2 COM Variant Coercion Table
| COM Variant Type | TypedValue Case | Notes |
|-----------------|-----------------|-------|
| VT_BOOL | bool_value | |
| VT_I2 (short) | int32_value | Widened |
| VT_I4 (int) | int32_value | |
| VT_I8 (long) | int64_value | |
| VT_UI2 (ushort) | int32_value | Widened |
| VT_UI4 (uint) | int64_value | Widened to avoid sign issues |
| VT_UI8 (ulong) | int64_value | Truncation risk logged if > long.MaxValue |
| VT_R4 (float) | float_value | |
| VT_R8 (double) | double_value | |
| VT_BSTR (string) | string_value | |
| VT_DATE (DateTime) | datetime_value | Converted to UTC Ticks |
| VT_DECIMAL | double_value | Precision loss logged |
| VT_CY (Currency) | double_value | |
| VT_NULL, VT_EMPTY, DBNull | unset oneof | Represents null |
| VT_ARRAY | array_value | Element type determines ArrayValue field |
| VT_UNKNOWN | string_value | ToString() fallback, logged as warning |
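
The coercion table can be sketched as a single type switch over the boxed value that COM interop hands back. This is only an illustration under stated assumptions: `TypedValueCase` and `VariantCoercion` are hypothetical names, the result is a plain tuple rather than the generated protobuf message, and the logging called for in the Notes column is left out. It also assumes that both VT_DECIMAL and VT_CY arrive as `decimal` under default interop marshalling.

```csharp
using System;

// Hypothetical stand-in for the generated TypedValue oneof cases.
public enum TypedValueCase
{
    None, BoolValue, Int32Value, Int64Value, FloatValue,
    DoubleValue, StringValue, DatetimeValue, ArrayValue
}

public static class VariantCoercion
{
    // Maps a boxed value to the oneof case from the table, with the payload
    // already widened or converted. Logging is omitted in this sketch.
    public static (TypedValueCase Case, object Value) Coerce(object value)
    {
        switch (value)
        {
            case null:
            case DBNull _:
                return (TypedValueCase.None, null);               // unset oneof = null
            case bool b:      return (TypedValueCase.BoolValue, b);
            case short s:     return (TypedValueCase.Int32Value, (int)s);    // widened
            case ushort us:   return (TypedValueCase.Int32Value, (int)us);   // widened
            case int i:       return (TypedValueCase.Int32Value, i);
            case uint ui:     return (TypedValueCase.Int64Value, (long)ui);  // avoids sign issues
            case long l:      return (TypedValueCase.Int64Value, l);
            case ulong ul:    return (TypedValueCase.Int64Value, unchecked((long)ul)); // wraps above long.MaxValue
            case float f:     return (TypedValueCase.FloatValue, f);
            case double d:    return (TypedValueCase.DoubleValue, d);
            case decimal m:   return (TypedValueCase.DoubleValue, (double)m); // precision loss possible
            case string str:  return (TypedValueCase.StringValue, str);
            case DateTime dt: return (TypedValueCase.DatetimeValue, dt.ToUniversalTime().Ticks);
            case Array a:     return (TypedValueCase.ArrayValue, a);
            default:          return (TypedValueCase.StringValue, value.ToString()); // VT_UNKNOWN fallback
        }
    }
}
```

Note that `DateTime.ToUniversalTime()` treats a value of unspecified kind as local time, so a real implementation would need to decide how MxAccess timestamps are interpreted before converting.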
### 4.3 QualityCode System
`status_code` (uint32, OPC UA-compatible) is canonical. `symbolic_name` is derived from a lookup table, never set independently.
Category derived from high bits:
- `0x00xxxxxx` = Good
- `0x40xxxxxx` = Uncertain
- `0x80xxxxxx` = Bad
Domain `Quality` enum uses byte values for the low-order byte, with extension methods `IsGood()`, `IsBad()`, `IsUncertain()`.
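
A minimal sketch of the category derivation, assuming the category lives in the top two bits of `status_code` as it does in OPC UA status codes. `QualityCategory` and `QualityCodes` are hypothetical names, and treating the unused bit pattern `11` as Bad is an assumption of this sketch, not something the design states.

```csharp
using System;

public enum QualityCategory { Good, Uncertain, Bad }

public static class QualityCodes
{
    // Derives the category from the two most significant bits of status_code:
    // 00 = Good (0x00xxxxxx), 01 = Uncertain (0x40xxxxxx), 10 = Bad (0x80xxxxxx).
    public static QualityCategory CategoryOf(uint statusCode)
    {
        switch (statusCode >> 30)
        {
            case 0: return QualityCategory.Good;
            case 1: return QualityCategory.Uncertain;
            default: return QualityCategory.Bad; // 10, and 11 by assumption
        }
    }
}
```

Because the category is computed from `status_code` alone, it cannot drift from the canonical value in the way an independently stored field could.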
### 4.4 Error Model
| Error Type | Mechanism | Examples |
|-----------|-----------|----------|
| Infrastructure | gRPC StatusCode | Unauthenticated (bad API key), PermissionDenied (ReadOnly write), InvalidArgument (bad session), Unavailable (MxAccess down) |
| Business outcome | Payload `success`/`message` fields | Tag read failure, write type mismatch, batch partial failure, WriteBatchAndWait flag timeout |
| Subscription | gRPC StatusCode on stream | Unauthenticated (invalid session), Internal (unexpected error) |
## 5. COM Threading Model
MxAccess is an STA COM component. All COM operations execute on a **dedicated STA thread** with a `BlockingCollection<Action>` dispatch queue:
- `MxAccessClient` creates a single STA thread at construction
- All COM calls (connect, read, write, subscribe, disconnect) are dispatched to this thread via the queue
- Callers await a `TaskCompletionSource<T>` that the STA thread completes after the COM call
- The STA thread runs a message pump loop (`Application.Run` or manual `MSG` pump)
- On disposal, a sentinel is enqueued and the thread joins with a 10-second timeout
This replaces the fragile `Task.Run` + `SemaphoreSlim` pattern in the reference code.
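
The dispatch pattern described above might look roughly like the following. It is a sketch, not the Host's implementation: `StaDispatcher` is a hypothetical name, `CompleteAdding()` stands in for the sentinel, and the loop does not pump Windows messages, which a real MxAccess host would need for COM callbacks. `SetApartmentState` is only supported on Windows.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs every queued delegate on one dedicated STA thread.
public sealed class StaDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public StaDispatcher()
    {
        _thread = new Thread(Run) { IsBackground = true, Name = "MxAccess-STA" };
        _thread.SetApartmentState(ApartmentState.STA); // Windows only
        _thread.Start();
    }

    // Queues work for the STA thread; the returned task completes after it ran there.
    public Task<T> InvokeAsync<T>(Func<T> work)
    {
        // RunContinuationsAsynchronously keeps caller continuations off the STA thread.
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    private void Run()
    {
        // Blocks until work arrives; ends once CompleteAdding() has been called
        // and the queue is drained. No message pump in this sketch.
        foreach (var action in _queue.GetConsumingEnumerable())
            action();
    }

    public void Dispose()
    {
        _queue.CompleteAdding();                     // plays the role of the sentinel
        if (_thread.Join(TimeSpan.FromSeconds(10)))  // 10-second join timeout
            _queue.Dispose();
    }
}
```

Callers never touch the COM object directly; they pass a delegate to `InvokeAsync` and await the result, so the apartment rule is enforced in one place.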
## 6. Session Lifecycle
- Sessions created on `Connect` with GUID "N" format (32-char hex)
- Tracked in `ConcurrentDictionary<string, SessionInfo>`
- **Inactivity scavenging**: sessions not accessed for 5 minutes are automatically terminated. Client keep-alive pings (30s) keep legitimate sessions alive.
- On termination: subscriptions cleaned up, session removed from dictionary
- All sessions lost on service restart (in-memory only)
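
The lifecycle rules above can be sketched with a dictionary of last-access timestamps. `SessionRegistry` is a hypothetical name and the real `SessionInfo` presumably carries more than a timestamp; time is passed in explicitly here so the scavenging rule can be exercised without waiting.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sketch: tracks only the last access time per session.
public sealed class SessionRegistry
{
    private readonly ConcurrentDictionary<string, DateTime> _lastSeenUtc =
        new ConcurrentDictionary<string, DateTime>();
    private readonly TimeSpan _timeout;

    public SessionRegistry(TimeSpan inactivityTimeout) { _timeout = inactivityTimeout; }

    public string Connect(DateTime nowUtc)
    {
        var id = Guid.NewGuid().ToString("N"); // 32 hex characters, no dashes
        _lastSeenUtc[id] = nowUtc;
        return id;
    }

    // Refreshes the timestamp; returns false for unknown or scavenged sessions.
    public bool Touch(string sessionId, DateTime nowUtc)
    {
        while (_lastSeenUtc.TryGetValue(sessionId, out var previous))
        {
            if (_lastSeenUtc.TryUpdate(sessionId, nowUtc, previous)) return true;
        }
        return false;
    }

    // Removes sessions idle for at least the timeout; returns how many were removed.
    public int Scavenge(DateTime nowUtc)
    {
        var removed = 0;
        foreach (var pair in _lastSeenUtc)
        {
            if (nowUtc - pair.Value < _timeout) continue;
            // Removes only if the timestamp is still the stale one just observed,
            // so a concurrent Touch wins over the scavenger.
            if (((ICollection<KeyValuePair<string, DateTime>>)_lastSeenUtc).Remove(pair))
                removed++;
        }
        return removed;
    }
}
```

In the Host, subscription cleanup would run for each removed session; that step is outside this sketch.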
## 7. Subscription Semantics
- **Shared MxAccess subscriptions**: first client to subscribe creates the underlying MxAccess subscription. Last to unsubscribe disposes it. Ref-counted.
- **Sampling rate**: when multiple clients subscribe to the same tag with different `sampling_ms`, the fastest (lowest non-zero) rate is used for the MxAccess subscription. All clients receive updates at this rate.
- **Per-client channels**: each client gets an independent `BoundedChannel<VtqMessage>` (capacity 1000, DropOldest). One slow consumer's drops do not affect other clients.
- **MxAccess disconnect**: all subscribed clients receive a bad-quality notification for all their subscribed tags.
- **Session termination**: all subscriptions for that session are cleaned up.
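
The sharing and sampling rules can be illustrated with a small bookkeeping class. `SharedTagSubscriptions` is a hypothetical name, and returning 0 when no client requested a non-zero rate is an assumption of this sketch; the design does not say what a zero `sampling_ms` means.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the per-tag bookkeeping: who subscribes, and at what rate.
public sealed class SharedTagSubscriptions
{
    // tag -> (clientId -> requested sampling_ms)
    private readonly Dictionary<string, Dictionary<string, int>> _tags =
        new Dictionary<string, Dictionary<string, int>>(StringComparer.OrdinalIgnoreCase);
    private readonly object _gate = new object();

    // Returns true for the first subscriber: the MxAccess subscription must be created.
    public bool Add(string tag, string clientId, int samplingMs)
    {
        lock (_gate)
        {
            if (!_tags.TryGetValue(tag, out var clients))
            {
                clients = new Dictionary<string, int>();
                _tags[tag] = clients;
            }
            var first = clients.Count == 0;
            clients[clientId] = samplingMs;
            return first;
        }
    }

    // Returns true when the last subscriber left: the MxAccess subscription can be disposed.
    public bool Remove(string tag, string clientId)
    {
        lock (_gate)
        {
            if (!_tags.TryGetValue(tag, out var clients) || !clients.Remove(clientId)) return false;
            if (clients.Count > 0) return false;
            _tags.Remove(tag);
            return true;
        }
    }

    // Fastest (lowest non-zero) requested rate; 0 if nobody asked for a non-zero rate.
    public int EffectiveSamplingMs(string tag)
    {
        lock (_gate)
        {
            if (!_tags.TryGetValue(tag, out var clients)) return 0;
            var nonZero = clients.Values.Where(ms => ms > 0).ToList();
            return nonZero.Count == 0 ? 0 : nonZero.Min();
        }
    }
}
```

For the per-client queue, `System.Threading.Channels` offers `Channel.CreateBounded<T>` with `BoundedChannelOptions.FullMode` set to `BoundedChannelFullMode.DropOldest`, which matches the capacity and drop behavior described above.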
## 8. Authentication
- `x-api-key` gRPC metadata header is the authoritative authentication mechanism
- `ConnectRequest.api_key` is accepted but the interceptor is the enforcement point
- API keys loaded from JSON file with FileSystemWatcher hot-reload (1-second debounce)
- Auto-generates default file with two random keys (ReadOnly + ReadWrite) if missing
- Write-protected RPCs: Write, WriteBatch, WriteBatchAndWait
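
The enforcement decision itself is small enough to sketch without any gRPC types. `ApiKeyRole`, `WriteProtection`, and the use of short RPC names are assumptions for illustration; a real interceptor would read the `x-api-key` metadata entry and the full method path from the call context and translate the outcome into a gRPC status.

```csharp
using System;
using System.Collections.Generic;

public enum ApiKeyRole { ReadOnly, ReadWrite }

// Hypothetical decision helper for an API key interceptor.
public static class WriteProtection
{
    private static readonly HashSet<string> WriteRpcs =
        new HashSet<string>(StringComparer.Ordinal) { "Write", "WriteBatch", "WriteBatchAndWait" };

    // Returns "OK", "Unauthenticated", or "PermissionDenied",
    // mirroring the status names in the error model table.
    public static string Authorize(
        IReadOnlyDictionary<string, ApiKeyRole> keys, string apiKeyHeader, string rpcName)
    {
        if (string.IsNullOrEmpty(apiKeyHeader) || !keys.TryGetValue(apiKeyHeader, out var role))
            return "Unauthenticated";                 // missing or unknown key
        if (role == ApiKeyRole.ReadOnly && WriteRpcs.Contains(rpcName))
            return "PermissionDenied";                // ReadOnly key on a write RPC
        return "OK";
    }
}
```

Keeping the decision in a pure function makes it straightforward to unit test separately from the interceptor plumbing and the hot-reloaded key file.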
## 9. Phasing
| Phase | Scope | Depends On |
|-------|-------|------------|
| 1 | Protocol & Domain Types | — |
| 2 | Host Core (MxAccessClient, SessionManager, SubscriptionManager) | Phase 1 |
| 3 | Host gRPC Server, Security, Configuration, Service Hosting | Phase 2 |
| 4 | Host Health, Metrics, Status Server | Phase 3 |
| 5 | Client Core | Phase 1 |
| 6 | Client Extras (Builder, Factory, DI, Streaming) | Phase 5 |
| 7 | Integration Tests & Deployment | Phases 4 + 6 |
Phases 2-4 (Host) and 5-6 (Client) can proceed in parallel after Phase 1.
## 10. Guardrails
1. **Proto is the source of truth** — any wire format question is resolved by reading `scada.proto`, not the code-first contracts.
2. **No v1 code in the new build** — reference only. Do not copy-paste and modify; write fresh.
3. **Cross-stack tests in Phase 1** — Host proto serialize → Client code-first deserialize (and vice versa) before any business logic.
4. **COM calls only on STA thread** — no `Task.Run` for COM operations. All go through the dispatch queue.
5. **status_code is canonical for quality**: `symbolic_name` is always derived, never independently set.
6. **Unit tests before integration** — every phase includes unit tests. Integration tests are Phase 7 only.
7. **Each phase must compile and pass tests** before the next phase begins.
8. **No string serialization heuristics** — v2 uses native TypedValue. No `double.TryParse` or `bool.TryParse` on values.
## 11. Resolved Conflicts
| Conflict | Resolution |
|----------|-----------|
| WriteBatchAndWait signature (MxAccessClient vs Protocol) | Follow Protocol spec: write items, poll flagTag for flagValue. IScadaClient interface matches protocol semantics. |
| Builder default port 5050 vs Host 50051 | Standardize builder default to 50051 |
| Auth in metadata vs payload | x-api-key header is authoritative; ConnectRequest.api_key accepted but interceptor enforces |
## 12. Reference Code
The existing `src/` code is retained as `src-reference/` for consultation:
- `src-reference/ZB.MOM.WW.LmxProxy.Host/` — v1 Host implementation
- `src-reference/ZB.MOM.WW.LmxProxy.Client/` — v1 Client implementation
Key reference files for COM interop patterns:
- `Implementation/MxAccessClient.Connection.cs` — COM object lifecycle
- `Implementation/MxAccessClient.EventHandlers.cs` — MxAccess callbacks
- `Implementation/MxAccessClient.Subscription.cs` — Advise/Unadvise patterns
@@ -0,0 +1,673 @@
# Gap 1 & Gap 2: Active Health Probing + Subscription Handle Cleanup
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Fix two reconnect-related gaps: (1) the monitor loop cannot detect a silently-dead MxAccess connection, and (2) SubscriptionManager holds stale IAsyncDisposable handles after reconnect.
**Architecture:** Add a domain-level connection probe to `MxAccessClient` that classifies results as Healthy/TransportFailure/DataDegraded. The monitor loop uses this to decide reconnect vs degrade-and-backoff. Separately, remove `SubscriptionManager._mxAccessHandles` entirely and switch to address-based unsubscribe through `IScadaClient`, making `MxAccessClient` the sole owner of COM subscription lifecycle.
**Tech Stack:** .NET Framework 4.8, C#, MxAccess COM interop, Serilog
---
## Task 0: Add `ProbeResult` domain type
**Files:**
- Create: `src/ZB.MOM.WW.LmxProxy.Host/Domain/ProbeResult.cs`
**Step 1: Create the ProbeResult type**
```csharp
using System;

namespace ZB.MOM.WW.LmxProxy.Host.Domain
{
    public enum ProbeStatus
    {
        Healthy,
        TransportFailure,
        DataDegraded
    }

    public sealed class ProbeResult
    {
        public ProbeStatus Status { get; }
        public Quality? Quality { get; }
        public DateTime? Timestamp { get; }
        public string? Message { get; }
        public Exception? Exception { get; }

        private ProbeResult(ProbeStatus status, Quality? quality, DateTime? timestamp,
            string? message, Exception? exception)
        {
            Status = status;
            Quality = quality;
            Timestamp = timestamp;
            Message = message;
            Exception = exception;
        }

        public static ProbeResult Healthy(Quality quality, DateTime timestamp)
            => new ProbeResult(ProbeStatus.Healthy, quality, timestamp, null, null);

        public static ProbeResult Degraded(Quality quality, DateTime timestamp, string message)
            => new ProbeResult(ProbeStatus.DataDegraded, quality, timestamp, message, null);

        public static ProbeResult TransportFailed(string message, Exception? ex = null)
            => new ProbeResult(ProbeStatus.TransportFailure, null, null, message, ex);
    }
}
```
**Step 2: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Domain/ProbeResult.cs
git commit -m "feat: add ProbeResult domain type for connection health classification"
```
---
## Task 1: Add `ProbeConnectionAsync` to `MxAccessClient`
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs` — add `ProbeConnectionAsync` to interface
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — implement probe method
**Step 1: Add to IScadaClient interface**
In `IScadaClient.cs`, add after the `DisconnectAsync` method:
```csharp
/// <summary>
/// Probes connection health by reading a test tag.
/// Returns a classified result: Healthy, TransportFailure, or DataDegraded.
/// </summary>
Task<ProbeResult> ProbeConnectionAsync(string testTagAddress, int timeoutMs, CancellationToken ct = default);
```
**Step 2: Implement in MxAccessClient.Connection.cs**
Add before `MonitorConnectionAsync`:
```csharp
/// <summary>
/// Probes the connection by reading a test tag with a timeout.
/// Classifies the result as transport failure vs data degraded.
/// </summary>
public async Task<ProbeResult> ProbeConnectionAsync(string testTagAddress, int timeoutMs,
    CancellationToken ct = default)
{
    if (!IsConnected)
        return ProbeResult.TransportFailed("Not connected");

    try
    {
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(ct))
        {
            cts.CancelAfter(timeoutMs);

            Vtq vtq;
            try
            {
                vtq = await ReadAsync(testTagAddress, cts.Token);
            }
            catch (OperationCanceledException) when (!ct.IsCancellationRequested)
            {
                // Our timeout fired, not the caller's: treat as transport failure
                return ProbeResult.TransportFailed("Probe read timed out after " + timeoutMs + "ms");
            }

            if (vtq.Quality == Domain.Quality.Bad_NotConnected ||
                vtq.Quality == Domain.Quality.Bad_CommFailure)
            {
                return ProbeResult.TransportFailed("Probe returned " + vtq.Quality);
            }

            if (!vtq.Quality.IsGood())
            {
                return ProbeResult.Degraded(vtq.Quality, vtq.Timestamp,
                    "Probe quality: " + vtq.Quality);
            }

            if (DateTime.UtcNow - vtq.Timestamp > TimeSpan.FromMinutes(5))
            {
                return ProbeResult.Degraded(vtq.Quality, vtq.Timestamp,
                    "Probe data stale (>5min)");
            }

            return ProbeResult.Healthy(vtq.Quality, vtq.Timestamp);
        }
    }
    catch (System.Runtime.InteropServices.COMException ex)
    {
        return ProbeResult.TransportFailed("COM exception: " + ex.Message, ex);
    }
    catch (InvalidOperationException ex) when (ex.Message.Contains("Not connected"))
    {
        return ProbeResult.TransportFailed(ex.Message, ex);
    }
    catch (Exception ex)
    {
        return ProbeResult.TransportFailed("Probe failed: " + ex.Message, ex);
    }
}
```
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs
git commit -m "feat: add ProbeConnectionAsync to MxAccessClient for active health probing"
```
---
## Task 2: Add health check configuration
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs` — add HealthCheckConfiguration class and property
**Step 1: Add HealthCheckConfiguration**
Add a new class in the Configuration namespace (can be in the same file or a new file — keep it simple, same file):
```csharp
/// <summary>Health check / probe configuration.</summary>
public class HealthCheckConfiguration
{
    /// <summary>Tag address to probe for connection liveness. Default: TestChildObject.TestBool.</summary>
    public string TestTagAddress { get; set; } = "TestChildObject.TestBool";

    /// <summary>Probe timeout in milliseconds. Default: 5000.</summary>
    public int ProbeTimeoutMs { get; set; } = 5000;

    /// <summary>Consecutive transport failures before forced reconnect. Default: 3.</summary>
    public int MaxConsecutiveTransportFailures { get; set; } = 3;

    /// <summary>Probe interval while in degraded state (ms). Default: 30000 (30s).</summary>
    public int DegradedProbeIntervalMs { get; set; } = 30000;
}
```
Add to `LmxProxyConfiguration`:
```csharp
/// <summary>Health check / active probe settings.</summary>
public HealthCheckConfiguration HealthCheck { get; set; } = new HealthCheckConfiguration();
```
**Step 2: Add to appsettings.json**
In the existing `appsettings.json`, add the `HealthCheck` section:
```json
"HealthCheck": {
"TestTagAddress": "TestChildObject.TestBool",
"ProbeTimeoutMs": 5000,
"MaxConsecutiveTransportFailures": 3,
"DegradedProbeIntervalMs": 30000
}
```
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs
git add src/ZB.MOM.WW.LmxProxy.Host/appsettings.json
git commit -m "feat: add HealthCheck configuration section for active connection probing"
```
---
## Task 3: Rewrite `MonitorConnectionAsync` with active probing
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs` — add probe state fields
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — rewrite monitor loop
The monitor needs configuration passed in. The simplest approach: add constructor parameters for the probe settings alongside the existing ones.
**Step 1: Add probe fields to MxAccessClient.cs**
Add fields after the existing reconnect fields (around line 42):
```csharp
// Probe configuration
private readonly string? _probeTestTagAddress;
private readonly int _probeTimeoutMs;
private readonly int _maxConsecutiveTransportFailures;
private readonly int _degradedProbeIntervalMs;
// Probe state
private int _consecutiveTransportFailures;
private bool _isDegraded;
```
Extend the constructor signature with the probe parameters:
```csharp
public MxAccessClient(
    int maxConcurrentOperations = 10,
    int readTimeoutSeconds = 5,
    int writeTimeoutSeconds = 5,
    int monitorIntervalSeconds = 5,
    bool autoReconnect = true,
    string? nodeName = null,
    string? galaxyName = null,
    string? probeTestTagAddress = null,
    int probeTimeoutMs = 5000,
    int maxConsecutiveTransportFailures = 3,
    int degradedProbeIntervalMs = 30000)
```
And in the constructor body, after the existing `_galaxyName = galaxyName;` line, add the assignments:
```csharp
_probeTestTagAddress = probeTestTagAddress;
_probeTimeoutMs = probeTimeoutMs;
_maxConsecutiveTransportFailures = maxConsecutiveTransportFailures;
_degradedProbeIntervalMs = degradedProbeIntervalMs;
```
**Step 2: Rewrite MonitorConnectionAsync in MxAccessClient.Connection.cs**
Replace the existing `MonitorConnectionAsync` (lines 177-213) with:
```csharp
/// <summary>
/// Auto-reconnect monitor loop with active health probing.
/// - If IsConnected is false: immediate reconnect (existing behavior).
/// - If IsConnected is true and probe configured: read test tag each interval.
///   - TransportFailure for N consecutive probes → forced disconnect + reconnect.
///   - DataDegraded → stay connected, back off probe interval, report degraded.
///   - Healthy → reset counters and resume normal interval.
/// </summary>
private async Task MonitorConnectionAsync(CancellationToken ct)
{
    Log.Information("Connection monitor loop started (interval={IntervalMs}ms, probe={ProbeEnabled})",
        _monitorIntervalMs, _probeTestTagAddress != null);

    while (!ct.IsCancellationRequested)
    {
        var interval = _isDegraded ? _degradedProbeIntervalMs : _monitorIntervalMs;
        try
        {
            await Task.Delay(interval, ct);
        }
        catch (OperationCanceledException)
        {
            break;
        }

        // ── Case 1: Already disconnected ──
        if (!IsConnected)
        {
            _isDegraded = false;
            _consecutiveTransportFailures = 0;
            await AttemptReconnectAsync(ct);
            continue;
        }

        // ── Case 2: Connected, no probe configured (legacy behavior) ──
        if (_probeTestTagAddress == null)
            continue;

        // ── Case 3: Connected, probe configured (active health check) ──
        var probe = await ProbeConnectionAsync(_probeTestTagAddress, _probeTimeoutMs, ct);

        switch (probe.Status)
        {
            case ProbeStatus.Healthy:
                if (_isDegraded)
                {
                    Log.Information("Probe healthy — exiting degraded mode");
                    _isDegraded = false;
                }
                _consecutiveTransportFailures = 0;
                break;

            case ProbeStatus.DataDegraded:
                _consecutiveTransportFailures = 0;
                if (!_isDegraded)
                {
                    Log.Warning("Probe degraded: {Message} — entering degraded mode (probe interval {IntervalMs}ms)",
                        probe.Message, _degradedProbeIntervalMs);
                    _isDegraded = true;
                }
                break;

            case ProbeStatus.TransportFailure:
                _isDegraded = false;
                _consecutiveTransportFailures++;
                Log.Warning("Probe transport failure ({Count}/{Max}): {Message}",
                    _consecutiveTransportFailures, _maxConsecutiveTransportFailures, probe.Message);

                if (_consecutiveTransportFailures >= _maxConsecutiveTransportFailures)
                {
                    Log.Warning("Max consecutive transport failures reached — forcing reconnect");
                    _consecutiveTransportFailures = 0;
                    try
                    {
                        await DisconnectAsync(ct);
                    }
                    catch (Exception ex)
                    {
                        Log.Warning(ex, "Error during forced disconnect before reconnect");
                        // DisconnectAsync already calls CleanupComObjectsAsync on error path
                    }
                    await AttemptReconnectAsync(ct);
                }
                break;
        }
    }

    Log.Information("Connection monitor loop exited");
}

private async Task AttemptReconnectAsync(CancellationToken ct)
{
    Log.Information("Attempting reconnect...");
    SetState(ConnectionState.Reconnecting);
    try
    {
        await ConnectAsync(ct);
        Log.Information("Reconnected to MxAccess successfully");
    }
    catch (OperationCanceledException)
    {
        // Let the outer loop handle cancellation
    }
    catch (Exception ex)
    {
        Log.Warning(ex, "Reconnect attempt failed, will retry at next interval");
    }
}
```
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs
git commit -m "feat: rewrite monitor loop with active probing, transport vs degraded classification"
```
---
## Task 4: Wire probe config through `LmxProxyService.Start()`
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — pass HealthCheck config to MxAccessClient constructor
**Step 1: Update MxAccessClient construction**
In `LmxProxyService.Start()`, update the MxAccessClient creation (around line 62) to pass the new parameters:
```csharp
_mxAccessClient = new MxAccessClient(
    maxConcurrentOperations: _config.Connection.MaxConcurrentOperations,
    readTimeoutSeconds: _config.Connection.ReadTimeoutSeconds,
    writeTimeoutSeconds: _config.Connection.WriteTimeoutSeconds,
    monitorIntervalSeconds: _config.Connection.MonitorIntervalSeconds,
    autoReconnect: _config.Connection.AutoReconnect,
    nodeName: _config.Connection.NodeName,
    galaxyName: _config.Connection.GalaxyName,
    probeTestTagAddress: _config.HealthCheck.TestTagAddress,
    probeTimeoutMs: _config.HealthCheck.ProbeTimeoutMs,
    maxConsecutiveTransportFailures: _config.HealthCheck.MaxConsecutiveTransportFailures,
    degradedProbeIntervalMs: _config.HealthCheck.DegradedProbeIntervalMs);
```
**Step 2: Update DetailedHealthCheckService to use shared probe**
In `LmxProxyService.Start()`, update the DetailedHealthCheckService construction (around line 114) to pass the test tag address from config:
```csharp
_detailedHealthCheckService = new DetailedHealthCheckService(
    _mxAccessClient, _config.HealthCheck.TestTagAddress);
```
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs
git commit -m "feat: wire HealthCheck config to MxAccessClient and DetailedHealthCheckService"
```
---
## Task 5: Add `UnsubscribeByAddressAsync` to `IScadaClient` and `MxAccessClient`
This is the foundation for removing handle-based unsubscribe from SubscriptionManager.
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs` — add `UnsubscribeByAddressAsync`
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Subscription.cs` — implement, change `UnsubscribeAsync` visibility
**Step 1: Add to IScadaClient**
After `SubscribeAsync`:
```csharp
/// <summary>
/// Unsubscribes specific tag addresses. Removes from stored subscriptions
/// and COM state. Safe to call after reconnect — uses current handle mappings.
/// </summary>
Task UnsubscribeByAddressAsync(IEnumerable<string> addresses);
```
**Step 2: Implement in MxAccessClient.Subscription.cs**
The existing `UnsubscribeAsync` (line 53) already does exactly this — it's just `internal`. Rename it or add a public wrapper:
```csharp
/// <summary>
/// Unsubscribes specific addresses by address name.
/// Removes from both COM state and stored subscriptions (no reconnect replay).
/// </summary>
public async Task UnsubscribeByAddressAsync(IEnumerable<string> addresses)
{
    await UnsubscribeAsync(addresses);
}
```
This keeps the existing `internal UnsubscribeAsync` unchanged (it's still used by `SubscriptionHandle.DisposeAsync`).
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Subscription.cs
git commit -m "feat: add UnsubscribeByAddressAsync to IScadaClient for address-based unsubscribe"
```
---
## Task 6: Remove `_mxAccessHandles` from `SubscriptionManager`
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs`
**Step 1: Remove `_mxAccessHandles` field**
Delete lines 34-35:
```csharp
// REMOVE:
private readonly ConcurrentDictionary<string, IAsyncDisposable> _mxAccessHandles
    = new ConcurrentDictionary<string, IAsyncDisposable>(StringComparer.OrdinalIgnoreCase);
```
**Step 2: Rewrite `CreateMxAccessSubscriptionsAsync`**
The method no longer stores handles. It just calls `SubscribeAsync` to create the COM subscriptions. `MxAccessClient` stores them in `_storedSubscriptions` internally.
```csharp
private async Task CreateMxAccessSubscriptionsAsync(List<string> addresses)
{
    try
    {
        await _scadaClient.SubscribeAsync(
            addresses,
            (address, vtq) => OnTagValueChanged(address, vtq));
    }
    catch (Exception ex)
    {
        Log.Error(ex, "Failed to create MxAccess subscriptions for {Count} tags", addresses.Count);
    }
}
```
**Step 3: Rewrite unsubscribe logic in `UnsubscribeClient`**
Replace the handle disposal section (lines 198-212) with address-based unsubscribe:
```csharp
// Unsubscribe tags with no remaining clients via address-based API
if (tagsToDispose.Count > 0)
{
    try
    {
        _scadaClient.UnsubscribeByAddressAsync(tagsToDispose).GetAwaiter().GetResult();
    }
    catch (Exception ex)
    {
        Log.Warning(ex, "Error unsubscribing {Count} tags from MxAccess", tagsToDispose.Count);
    }
}
```
**Step 4: Verify build**
```bash
dotnet build src/ZB.MOM.WW.LmxProxy.Host
```
Expected: Build succeeds. No references to `_mxAccessHandles` remain.
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs
git commit -m "fix: remove _mxAccessHandles from SubscriptionManager, use address-based unsubscribe"
```
---
## Task 7: Wire `ConnectionStateChanged` for reconnect notification in `SubscriptionManager`
After reconnect, `RecreateStoredSubscriptionsAsync` recreates COM subscriptions, and `SubscriptionManager` continues to receive `OnTagValueChanged` callbacks because the callback references are preserved in `_storedSubscriptions`. No per-client notification is needed, since clients see quality restored through the resumed updates. The reconnect should still be logged, with client and tag counts, for observability.
**Files:**
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs` — add `NotifyReconnection` method
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — wire Connected state to SubscriptionManager
**Step 1: Add `NotifyReconnection` to SubscriptionManager**
```csharp
/// <summary>
/// Logs reconnection for observability. Data flow resumes automatically
/// via MxAccessClient.RecreateStoredSubscriptionsAsync callbacks.
/// </summary>
public void NotifyReconnection()
{
    Log.Information("MxAccess reconnected — subscriptions recreated, " +
        "data flow will resume via OnDataChange callbacks " +
        "({ClientCount} clients, {TagCount} tags)",
        _clientSubscriptions.Count, _tagSubscriptions.Count);
}
```
**Step 2: Wire in LmxProxyService.Start()**
Extend the existing `ConnectionStateChanged` handler (around line 97):
```csharp
_mxAccessClient.ConnectionStateChanged += (sender, e) =>
{
    if (e.CurrentState == Domain.ConnectionState.Disconnected ||
        e.CurrentState == Domain.ConnectionState.Error)
    {
        _subscriptionManager.NotifyDisconnection();
    }
    else if (e.CurrentState == Domain.ConnectionState.Connected &&
             e.PreviousState == Domain.ConnectionState.Reconnecting)
    {
        _subscriptionManager.NotifyReconnection();
    }
};
```
**Step 3: Commit**
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs
git add src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs
git commit -m "feat: wire reconnection notification to SubscriptionManager for observability"
```
---
## Task 8: Build, deploy to windev, test
**Files:**
- No code changes — build and deployment task.
**Step 1: Build the solution**
```bash
dotnet build ZB.MOM.WW.LmxProxy.slnx
```
Expected: Clean build, no errors.
**Step 2: Deploy to windev**
Follow existing deployment procedure per `docker/README.md` or manual copy to windev.
**Step 3: Manual test — Gap 1 (active probing)**
1. Start the v2 service on windev. Verify logs show: `Connection monitor loop started (interval=5000ms, probe=True)`.
2. Verify probe runs: logs should show no warnings while platform is healthy.
3. Kill aaBootstrap on windev. Within about 15s if each probe fails fast (3 failures at 5s intervals), or up to roughly 30s if each probe first runs into the 5s `ProbeTimeoutMs`, logs should show:
- `Probe transport failure (1/3): Probe returned Bad_CommFailure` (or similar)
- `Probe transport failure (2/3): ...`
- `Probe transport failure (3/3): ...`
- `Max consecutive transport failures reached — forcing reconnect`
- `Attempting reconnect...`
4. After platform restart (but objects still stopped): Logs should show `Probe degraded` and `entering degraded mode`, then probe backs off to 30s interval. No reconnect churn.
5. After objects restart via SMC: Logs should show `Probe healthy — exiting degraded mode`.
**Step 4: Manual test — Gap 2 (subscription cleanup)**
1. Connect a gRPC client, subscribe to tags.
2. Kill aaBootstrap → client receives `Bad_NotConnected` quality.
3. Restart platform + objects. Verify client starts receiving Good quality updates again (via `RecreateStoredSubscriptionsAsync`).
4. Disconnect the client. Verify logs show `Unsubscribed from N tags` (address-based) with no handle disposal errors.
---
## Design Rationale
### Why two failure modes in the probe?
| Failure Mode | Cause | Correct Response |
|---|---|---|
| **Transport failure** | COM object dead, platform process crashed, MxAccess unreachable | Force disconnect + reconnect |
| **Data degraded** | COM session alive, AVEVA objects stopped, all reads return Bad quality | Stay connected, report degraded, back off probes |
Reconnecting on DataDegraded would churn COM objects with no benefit — the platform objects are stopped regardless of connection state. Observed: 40+ minutes of Bad quality after aaBootstrap crash until manual SMC restart.
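The two-mode decision can also be expressed as a pure state transition, which makes the counter and back-off rules testable without COM. This is a sketch only: `MonitorState` and `MonitorTransitions` are hypothetical names, and `ProbeStatus` is redeclared here just to keep the block self-contained.

```csharp
using System;

public enum ProbeStatus { Healthy, TransportFailure, DataDegraded }

// Hypothetical value type holding the two pieces of monitor state.
public struct MonitorState
{
    public int ConsecutiveTransportFailures;
    public bool IsDegraded;
}

public static class MonitorTransitions
{
    // Applies one probe result. forceReconnect is true only when the
    // transport-failure threshold is reached; DataDegraded never triggers it.
    public static MonitorState Next(MonitorState state, ProbeStatus probe,
        int maxFailures, out bool forceReconnect)
    {
        forceReconnect = false;
        switch (probe)
        {
            case ProbeStatus.Healthy:
                state.IsDegraded = false;
                state.ConsecutiveTransportFailures = 0;
                break;
            case ProbeStatus.DataDegraded:
                state.IsDegraded = true;            // back off, stay connected
                state.ConsecutiveTransportFailures = 0;
                break;
            case ProbeStatus.TransportFailure:
                state.IsDegraded = false;
                state.ConsecutiveTransportFailures++;
                if (state.ConsecutiveTransportFailures >= maxFailures)
                {
                    state.ConsecutiveTransportFailures = 0;
                    forceReconnect = true;          // disconnect + reconnect
                }
                break;
        }
        return state;
    }
}
```

With `maxFailures` set to 3, three consecutive `TransportFailure` results yield a single forced reconnect, while any number of `DataDegraded` results only keep the monitor in degraded mode.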
### Why remove `_mxAccessHandles`?
1. **Batch handle bug**: `CreateMxAccessSubscriptionsAsync` stored the same `IAsyncDisposable` handle for every address in a batch. Disposing any one address disposed the entire batch, silently removing unrelated subscriptions from `_storedSubscriptions`.
2. **Stale after reconnect**: `RecreateStoredSubscriptionsAsync` recreates COM subscriptions but doesn't produce new `SubscriptionManager` handles. Old handles point to disposed COM state.
3. **Ownership violation**: `MxAccessClient` already owns subscription lifecycle via `_storedSubscriptions` and `_addressToHandle`. Duplicating ownership in `SubscriptionManager._mxAccessHandles` is a leaky abstraction.
The fix: `SubscriptionManager` owns client routing and ref counts only. `MxAccessClient` owns COM subscription lifecycle. Unsubscribe is by address, not by opaque handle.
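One way to picture "ref counts only, unsubscribe by address" is the sketch below. `TagRefCounter` is a hypothetical type, not part of the plan; it assumes the caller forwards the returned address lists to the COM layer (for example to an address-based unsubscribe such as `UnsubscribeByAddressAsync`).

```csharp
using System.Collections.Generic;

// Hypothetical sketch: TagRefCounter is not a type from the plan.
public class TagRefCounter
{
    private readonly object _lock = new object();

    // address -> client IDs currently subscribed to it
    private readonly Dictionary<string, HashSet<string>> _clientsByAddress =
        new Dictionary<string, HashSet<string>>();

    // Returns the addresses that had no subscriber before this call,
    // i.e. the ones that would need a new COM subscription.
    public List<string> Subscribe(string clientId, IEnumerable<string> addresses)
    {
        var newlyNeeded = new List<string>();
        lock (_lock)
        {
            foreach (var address in addresses)
            {
                HashSet<string> clients;
                if (!_clientsByAddress.TryGetValue(address, out clients))
                {
                    clients = new HashSet<string>();
                    _clientsByAddress[address] = clients;
                    newlyNeeded.Add(address);
                }
                clients.Add(clientId);
            }
        }
        return newlyNeeded;
    }

    // Returns the addresses whose last subscriber just left; the caller
    // would unsubscribe exactly these addresses from the COM layer.
    public List<string> UnsubscribeClient(string clientId)
    {
        var orphaned = new List<string>();
        lock (_lock)
        {
            foreach (var pair in _clientsByAddress)
            {
                if (pair.Value.Remove(clientId) && pair.Value.Count == 0)
                {
                    orphaned.Add(pair.Key);
                }
            }
            foreach (var address in orphaned)
            {
                _clientsByAddress.Remove(address);
            }
        }
        return orphaned;
    }
}
```

Because no opaque handle is stored, a reconnect that recreates COM subscriptions cannot leave this structure pointing at disposed state.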
@@ -0,0 +1,15 @@
{
"planPath": "lmxproxy/docs/plans/2026-03-22-gap1-gap2-reconnect-subscriptions.md",
"tasks": [
{"id": 0, "subject": "Task 0: Add ProbeResult domain type", "status": "pending"},
{"id": 1, "subject": "Task 1: Add ProbeConnectionAsync to MxAccessClient", "status": "pending", "blockedBy": [0]},
{"id": 2, "subject": "Task 2: Add health check configuration", "status": "pending"},
{"id": 3, "subject": "Task 3: Rewrite MonitorConnectionAsync with active probing", "status": "pending", "blockedBy": [1, 2]},
{"id": 4, "subject": "Task 4: Wire probe config through LmxProxyService.Start()", "status": "pending", "blockedBy": [2, 3]},
{"id": 5, "subject": "Task 5: Add UnsubscribeByAddressAsync to IScadaClient", "status": "pending"},
{"id": 6, "subject": "Task 6: Remove _mxAccessHandles from SubscriptionManager", "status": "pending", "blockedBy": [5]},
{"id": 7, "subject": "Task 7: Wire ConnectionStateChanged for reconnect notification", "status": "pending", "blockedBy": [6]},
{"id": 8, "subject": "Task 8: Build, deploy to windev, test", "status": "pending", "blockedBy": [4, 7]}
],
"lastUpdated": "2026-03-22T00:00:00Z"
}
@@ -0,0 +1,185 @@
# LmxProxy Stale Session Subscription Leak Fix
## Problem
When a gRPC client disconnects abruptly, Grpc.Core (the C-core library used by the .NET Framework 4.8 server) does not reliably fire the `ServerCallContext.CancellationToken`. This means:
1. The `Subscribe` RPC in `ScadaGrpcService` blocks forever on `reader.WaitToReadAsync(context.CancellationToken)` (line 368)
2. The `finally` block with `_subscriptionManager.UnsubscribeClient(request.SessionId)` never runs
3. The `ct.Register(() => UnsubscribeClient(clientId))` in `SubscriptionManager.SubscribeAsync` also never fires (same token)
4. The old session's subscriptions leak in `SubscriptionManager._clientSubscriptions` and `_tagSubscriptions`
When the client reconnects with a new session ID, it creates duplicate subscriptions. Tags aren't cleaned up because they still have a ref-count from the leaked old session. Over time, client count grows and tag subscriptions accumulate.
The `SessionManager` does scavenge inactive sessions after 5 minutes, but it only removes the session from its own dictionary — it doesn't notify `SubscriptionManager` to clean up subscriptions.
## Fix
Bridge `SessionManager` scavenging to `SubscriptionManager` cleanup. When a session is scavenged due to inactivity, also call `SubscriptionManager.UnsubscribeClient()`.
### Step 1: Add cleanup callback to SessionManager
File: `src/ZB.MOM.WW.LmxProxy.Host/Sessions/SessionManager.cs`
Add a callback field and expose it:
```csharp
// Add after the _inactivityTimeout field (line 22)
private Action<string>? _onSessionScavenged;
/// <summary>
/// Register a callback invoked when a session is scavenged due to inactivity.
/// The callback receives the session ID.
/// </summary>
public void OnSessionScavenged(Action<string> callback)
{
    _onSessionScavenged = callback;
}
```
Then in `ScavengeInactiveSessions`, invoke the callback for each scavenged session:
```csharp
// In ScavengeInactiveSessions (line 103-118), change the foreach to:
foreach (var kvp in expired)
{
    if (_sessions.TryRemove(kvp.Key, out _))
    {
        Log.Information("Session {SessionId} scavenged (inactive since {LastActivity})",
            kvp.Key, kvp.Value.LastActivity);
        // Notify subscriber cleanup
        try
        {
            _onSessionScavenged?.Invoke(kvp.Key);
        }
        catch (Exception ex)
        {
            Log.Warning(ex, "Error in session scavenge callback for {SessionId}", kvp.Key);
        }
    }
}
```
### Step 2: Wire up the callback in LmxProxyService
File: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs`
After both `SessionManager` and `SubscriptionManager` are created, register the callback:
```csharp
// Add after SubscriptionManager creation:
_sessionManager.OnSessionScavenged(sessionId =>
{
    Log.Information("Cleaning up subscriptions for scavenged session {SessionId}", sessionId);
    _subscriptionManager.UnsubscribeClient(sessionId);
});
```
Find where `_sessionManager` and `_subscriptionManager` are both initialized and add this registration immediately after.
### Step 3: Also clean up on explicit Disconnect
This is already handled — `ScadaGrpcService.Disconnect()` (line 86) calls `_subscriptionManager.UnsubscribeClient(request.SessionId)` before terminating the session. No change needed.
### Step 4: Add proactive stream timeout (belt-and-suspenders)
The scavenger runs every 60 seconds with a 5-minute timeout. This means a leaked session could take up to 6 minutes to clean up. For faster detection, add a secondary timeout in the Subscribe RPC itself.
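The 6-minute figure follows from adding the two intervals: a session that expires just after a scavenge pass waits one more full pass. A minimal worked sketch, with `CleanupLatency` as a hypothetical helper name:

```csharp
using System;

// Hypothetical helper, shown only to make the arithmetic explicit.
public static class CleanupLatency
{
    // Worst case: the inactivity timeout elapses right after a scavenge pass,
    // so cleanup is delayed by one additional scavenge interval.
    public static TimeSpan WorstCase(TimeSpan inactivityTimeout, TimeSpan scavengeInterval)
    {
        return inactivityTimeout + scavengeInterval;
    }
}
```

With the values quoted above, `CleanupLatency.WorstCase(TimeSpan.FromMinutes(5), TimeSpan.FromSeconds(60))` evaluates to six minutes.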
File: `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Services/ScadaGrpcService.cs`
In the `Subscribe` method, replace the simple `context.CancellationToken` with a combined token that also expires if the session becomes invalid:
```csharp
// Replace the Subscribe method (lines 353-390) with:
public override async Task Subscribe(
    Scada.SubscribeRequest request,
    IServerStreamWriter<Scada.VtqMessage> responseStream,
    ServerCallContext context)
{
    if (!_sessionManager.ValidateSession(request.SessionId))
    {
        throw new RpcException(new GrpcStatus(StatusCode.Unauthenticated, "Invalid session"));
    }
    var reader = await _subscriptionManager.SubscribeAsync(
        request.SessionId, request.Tags, context.CancellationToken);
    try
    {
        // Use a combined approach: check both the gRPC cancellation token AND
        // periodic session validity. This works around Grpc.Core not reliably
        // firing CancellationToken on client disconnect.
        while (true)
        {
            // Wait for data with a timeout so we can periodically check session validity
            using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
            using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
                context.CancellationToken, timeoutCts.Token);
            bool hasData;
            try
            {
                hasData = await reader.WaitToReadAsync(linkedCts.Token);
            }
            catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested
                && !context.CancellationToken.IsCancellationRequested)
            {
                // Timeout expired, not a client disconnect: check if session is still valid
                if (!_sessionManager.ValidateSession(request.SessionId))
                {
                    Log.Information("Subscribe stream ending — session {SessionId} no longer valid",
                        request.SessionId);
                    break;
                }
                continue; // Session still valid, keep waiting
            }
            if (!hasData) break; // Channel completed
            while (reader.TryRead(out var item))
            {
                var protoVtq = ConvertToProtoVtq(item.address, item.vtq);
                await responseStream.WriteAsync(protoVtq);
            }
        }
    }
    catch (OperationCanceledException)
    {
        // Client disconnected -- normal
    }
    catch (Exception ex)
    {
        Log.Error(ex, "Subscribe stream error for session {SessionId}", request.SessionId);
        throw new RpcException(new GrpcStatus(StatusCode.Internal, ex.Message));
    }
    finally
    {
        _subscriptionManager.UnsubscribeClient(request.SessionId);
    }
}
```
This adds a 30-second periodic check: if no data arrives for 30 seconds, it checks whether the session is still valid. If the session was scavenged (client disconnected, 5-min timeout), the stream exits cleanly and runs the `finally` cleanup.
## Summary of Changes
| File | Change |
|------|--------|
| `Sessions/SessionManager.cs` | Add `_onSessionScavenged` callback, invoke during `ScavengeInactiveSessions` |
| `LmxProxyService.cs` | Wire `_sessionManager.OnSessionScavenged` to `_subscriptionManager.UnsubscribeClient` |
| `Grpc/Services/ScadaGrpcService.cs` | Add 30-second periodic session validity check in `Subscribe` loop |
## Testing
1. Start LmxProxy server
2. Connect a client and subscribe to tags
3. Kill the client process abruptly (not a clean disconnect)
4. Check status page — client count should still show the old session
5. Wait up to 6 minutes (5-minute inactivity timeout plus the 60-second scavenge interval): the session should be scavenged and the subscription count should drop
6. Reconnect client — should get a clean new session, no duplicate subscriptions
7. Verify tag subscription counts match expected (no leaked refs)
## Optional: Reduce scavenge timeout for faster cleanup
In `LmxProxyService.cs` where `SessionManager` is constructed, consider reducing `inactivityTimeoutMinutes` from 5 to 2, since the Subscribe RPC now has its own 30-second validity check. The 5-minute timeout was the only cleanup path before; now it's belt-and-suspenders.
@@ -0,0 +1,666 @@
# Phase 4: Host Health, Metrics & Status Server — Implementation Plan
**Date**: 2026-03-21
**Prerequisites**: Phase 3 complete and passing (gRPC server, Security, Configuration, Service Hosting all functional)
**Working Directory**: The lmxproxy repo is on windev at `C:\src\lmxproxy`
## Guardrails
1. **This is a v2 rebuild** — do not copy code from the v1 reference in `src-reference/`. Write fresh implementations guided by the design docs and the reference code's structure.
2. **Host targets .NET Framework 4.8, x86**: Host code may use language features up to C# 9.0 only (`LangVersion` is `9.0` in the csproj). No file-scoped namespaces, no `required` keyword, no collection expressions in Host code.
3. **No new NuGet packages** — all required packages are already in the Host `.csproj` (`Microsoft.Extensions.Diagnostics.HealthChecks`, `Serilog`, `System.Threading.Channels`, `System.Text.Json` via framework).
4. **Namespace**: `ZB.MOM.WW.LmxProxy.Host` with sub-namespaces matching folder structure (e.g., `ZB.MOM.WW.LmxProxy.Host.Health`, `ZB.MOM.WW.LmxProxy.Host.Metrics`, `ZB.MOM.WW.LmxProxy.Host.Status`).
5. **All COM operations are on the STA thread** — health checks that read test tags must go through `MxAccessClient.ReadAsync()`, never directly touching COM objects.
6. **Build must pass after each step**: `dotnet build src/ZB.MOM.WW.LmxProxy.Host -p:Platform=x86`
7. **Tests run on windev**: `dotnet test tests/ZB.MOM.WW.LmxProxy.Host.Tests -p:Platform=x86`
## Step 1: Create PerformanceMetrics
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs`
Create the `PerformanceMetrics` class in namespace `ZB.MOM.WW.LmxProxy.Host.Metrics`.
### 1.1 OperationMetrics (nested or separate class in same file)
```csharp
namespace ZB.MOM.WW.LmxProxy.Host.Metrics
{
public class OperationMetrics
{
private readonly List<double> _durations = new List<double>();
private readonly object _lock = new object();
private long _totalCount;
private long _successCount;
private double _totalMilliseconds;
private double _minMilliseconds = double.MaxValue;
private double _maxMilliseconds;
public void Record(TimeSpan duration, bool success) { ... }
public MetricsStatistics GetStatistics() { ... }
}
}
```
Implementation details:
- `Record(TimeSpan duration, bool success)`: Inside `lock (_lock)`, increment `_totalCount`, conditionally increment `_successCount`, add `duration.TotalMilliseconds` to `_durations` list, update `_totalMilliseconds`, `_minMilliseconds`, `_maxMilliseconds`. If `_durations.Count > 1000`, call `_durations.RemoveAt(0)` to maintain rolling buffer.
- `GetStatistics()`: Inside `lock (_lock)`, return early with empty `MetricsStatistics` if `_totalCount == 0`. Otherwise sort a copy of `_durations` (`sortedDurations`), so the rolling buffer keeps its insertion order for `RemoveAt(0)`, compute the p95 index as `(int)Math.Ceiling(sortedDurations.Count * 0.95) - 1`, and clamp it with `Math.Max(0, p95Index)`.
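The two bullets above could be realized roughly as follows. This is a sketch, not the planned class: `RollingOperationStats` is a hypothetical stand-in for `OperationMetrics`, it omits min/max/average, and it sorts a copy of the samples so that `RemoveAt(0)` keeps evicting the oldest entry rather than the smallest.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: RollingOperationStats is a hypothetical stand-in for OperationMetrics.
public class RollingOperationStats
{
    private const int MaxSamples = 1000; // rolling buffer size from the plan
    private readonly List<double> _durations = new List<double>();
    private readonly object _lock = new object();
    private long _totalCount;
    private long _successCount;

    public void Record(TimeSpan duration, bool success)
    {
        lock (_lock)
        {
            _totalCount++;
            if (success) _successCount++;
            _durations.Add(duration.TotalMilliseconds);
            // Drop the oldest sample once the buffer exceeds its capacity.
            if (_durations.Count > MaxSamples) _durations.RemoveAt(0);
        }
    }

    public long TotalCount
    {
        get { lock (_lock) { return _totalCount; } }
    }

    public double SuccessRate
    {
        get
        {
            lock (_lock)
            {
                return _totalCount == 0 ? 0.0 : (double)_successCount / _totalCount;
            }
        }
    }

    // 95th percentile of the retained samples, or 0 when nothing has been recorded.
    public double Percentile95Milliseconds()
    {
        lock (_lock)
        {
            if (_durations.Count == 0) return 0.0;
            // Sort a copy so the buffer itself stays in insertion order.
            var sorted = new List<double>(_durations);
            sorted.Sort();
            int p95Index = (int)Math.Ceiling(sorted.Count * 0.95) - 1;
            return sorted[Math.Max(0, p95Index)];
        }
    }
}
```

Note that `TotalCount` keeps counting past 1000 while the percentile only reflects the most recent 1000 samples, which is the behavior the rolling-buffer test in Step 6 relies on.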
### 1.2 MetricsStatistics
```csharp
public class MetricsStatistics
{
public long TotalCount { get; set; }
public long SuccessCount { get; set; }
public double SuccessRate { get; set; }
public double AverageMilliseconds { get; set; }
public double MinMilliseconds { get; set; }
public double MaxMilliseconds { get; set; }
public double Percentile95Milliseconds { get; set; }
}
```
### 1.3 ITimingScope interface and TimingScope implementation
```csharp
public interface ITimingScope : IDisposable
{
void SetSuccess(bool success);
}
```
`TimingScope` is a private nested class inside `PerformanceMetrics`:
- Constructor takes `PerformanceMetrics metrics, string operationName`, starts a `Stopwatch`.
- `SetSuccess(bool success)` stores the flag (default `true`).
- `Dispose()`: stops stopwatch, calls `_metrics.RecordOperation(_operationName, _stopwatch.Elapsed, _success)`. Guard against double-dispose with `_disposed` flag.
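A possible shape for the scope described above is sketched here. To keep the example self-contained, an `Action<string, TimeSpan, bool>` delegate stands in for `PerformanceMetrics.RecordOperation`; that delegate and the name `TimingScopeSketch` are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch: the Action-based recorder stands in for
// PerformanceMetrics.RecordOperation so the example compiles on its own.
public sealed class TimingScopeSketch : IDisposable
{
    private readonly Action<string, TimeSpan, bool> _record;
    private readonly string _operationName;
    private readonly Stopwatch _stopwatch;
    private bool _success = true; // success is the default outcome
    private bool _disposed;

    public TimingScopeSketch(Action<string, TimeSpan, bool> record, string operationName)
    {
        _record = record;
        _operationName = operationName;
        _stopwatch = Stopwatch.StartNew();
    }

    public void SetSuccess(bool success)
    {
        _success = success;
    }

    public void Dispose()
    {
        if (_disposed) return; // guard against double-dispose
        _disposed = true;
        _stopwatch.Stop();
        _record(_operationName, _stopwatch.Elapsed, _success);
    }
}
```

In an RPC handler this would typically sit in a `using` statement, with `SetSuccess(false)` called from the failure path before the scope is disposed.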
### 1.4 PerformanceMetrics class
```csharp
public class PerformanceMetrics : IDisposable
{
private static readonly ILogger Logger = Log.ForContext<PerformanceMetrics>();
private readonly ConcurrentDictionary<string, OperationMetrics> _metrics = new ConcurrentDictionary<string, OperationMetrics>();
private readonly Timer _reportingTimer;
private bool _disposed;
public PerformanceMetrics()
{
_reportingTimer = new Timer(ReportMetrics, null, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(60));
}
public void RecordOperation(string operationName, TimeSpan duration, bool success = true) { ... }
public ITimingScope BeginOperation(string operationName) => new TimingScope(this, operationName);
public OperationMetrics? GetMetrics(string operationName) { ... }
public IReadOnlyDictionary<string, OperationMetrics> GetAllMetrics() { ... }
public Dictionary<string, MetricsStatistics> GetStatistics() { ... }
private void ReportMetrics(object? state) { ... } // Log each operation's stats at Information level
public void Dispose() { ... } // Dispose timer, call ReportMetrics one final time
}
```
`ReportMetrics` iterates `_metrics`, calls `GetStatistics()` on each, logs via Serilog structured logging with properties: `Operation`, `Count`, `SuccessRate`, `AverageMs`, `MinMs`, `MaxMs`, `P95Ms`.
### 1.5 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host -p:Platform=x86"
```
## Step 2: Create HealthCheckService
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs`
Namespace: `ZB.MOM.WW.LmxProxy.Host.Health`
### 2.1 Basic HealthCheckService
```csharp
public class HealthCheckService : IHealthCheck
{
private static readonly ILogger Logger = Log.ForContext<HealthCheckService>();
private readonly IScadaClient _scadaClient;
private readonly SubscriptionManager _subscriptionManager;
private readonly PerformanceMetrics _performanceMetrics;
public HealthCheckService(
IScadaClient scadaClient,
SubscriptionManager subscriptionManager,
PerformanceMetrics performanceMetrics) { ... }
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default) { ... }
}
```
Dependencies imported:
- `ZB.MOM.WW.LmxProxy.Host.Domain` for `IScadaClient`, `ConnectionState`
- `ZB.MOM.WW.LmxProxy.Host.Services` for `SubscriptionManager` (if still in that namespace after Phase 2/3; adjust import to match actual location)
- `ZB.MOM.WW.LmxProxy.Host.Metrics` for `PerformanceMetrics`
- `Microsoft.Extensions.Diagnostics.HealthChecks` for `IHealthCheck`, `HealthCheckResult`, `HealthCheckContext`
`CheckHealthAsync` logic:
1. Create `Dictionary<string, object> data`.
2. Read `_scadaClient.IsConnected` and `_scadaClient.ConnectionState` into `data["scada_connected"]` and `data["scada_connection_state"]`.
3. Get subscription stats via `_subscriptionManager.GetSubscriptionStats()` — store `TotalClients`, `TotalTags` in data.
4. Iterate `_performanceMetrics.GetAllMetrics()` to compute `totalOperations` and `averageSuccessRate`.
5. Store `total_operations` and `average_success_rate` in data.
6. Decision tree:
- If `!isConnected` → `HealthCheckResult.Unhealthy("SCADA client is not connected", data: data)`
- If `averageSuccessRate < 0.5 && totalOperations > 100` → `HealthCheckResult.Degraded(...)`
- If `subscriptionStats.TotalClients > 100` → `HealthCheckResult.Degraded(...)`
- Otherwise → `HealthCheckResult.Healthy("LmxProxy is healthy", data)`
7. Wrap everything in try/catch — on exception return `Unhealthy` with exception details.
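The decision tree in step 6 is easiest to test when isolated as a pure function. The sketch below assumes hypothetical names (`HealthLevel`, `BasicHealthPolicy`) and returns a plain enum; the planned service would instead return `HealthCheckResult` values carrying the `data` dictionary.

```csharp
// Hypothetical names: HealthLevel and BasicHealthPolicy are illustrative only.
public enum HealthLevel { Healthy, Degraded, Unhealthy }

public static class BasicHealthPolicy
{
    // Applies the checks in the order given by the decision tree:
    // connection first, then success rate, then client count.
    public static HealthLevel Evaluate(
        bool isConnected,
        double averageSuccessRate,
        long totalOperations,
        int totalClients)
    {
        if (!isConnected) return HealthLevel.Unhealthy;

        // A low success rate only counts once enough operations were recorded.
        if (averageSuccessRate < 0.5 && totalOperations > 100) return HealthLevel.Degraded;

        if (totalClients > 100) return HealthLevel.Degraded;

        return HealthLevel.Healthy;
    }
}
```

For example, `BasicHealthPolicy.Evaluate(true, 0.0, 50, 5)` stays `Healthy` because the operation count is below the threshold, matching the `DoesNotFlagLowSuccessRate_Under100Operations` test case.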
### 2.2 DetailedHealthCheckService
In the same file or a separate file `src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs`:
```csharp
public class DetailedHealthCheckService : IHealthCheck
{
private static readonly ILogger Logger = Log.ForContext<DetailedHealthCheckService>();
private readonly IScadaClient _scadaClient;
private readonly string _testTagAddress;
public DetailedHealthCheckService(IScadaClient scadaClient, string testTagAddress = "TestChildObject.TestBool") { ... }
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default) { ... }
}
```
`CheckHealthAsync` logic:
1. If `!_scadaClient.IsConnected` → return `Unhealthy`.
2. Try `Vtq vtq = await _scadaClient.ReadAsync(_testTagAddress, cancellationToken)`.
3. If `vtq.Quality != Quality.Good` → return `Degraded` with quality info.
4. If `DateTime.UtcNow - vtq.Timestamp > TimeSpan.FromMinutes(5)` → return `Degraded` (stale data).
5. Otherwise → `Healthy`.
6. Catch read exceptions → return `Degraded("Could not read test tag")`.
7. Catch all exceptions → return `Unhealthy`.
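The same approach works for the detailed check. In this sketch the tag read is passed in as a delegate returning a quality flag and a UTC timestamp; `TagHealth`, `TestTagPolicy`, and the delegate shape are assumptions made so the example runs without `IScadaClient`.

```csharp
using System;

// Hypothetical names: TagHealth and TestTagPolicy are illustrative only.
public enum TagHealth { Healthy, Degraded, Unhealthy }

public static class TestTagPolicy
{
    public static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(5);

    // readTag returns (isGoodQuality, timestampUtc) and may throw if the read fails.
    public static TagHealth Evaluate(
        bool isConnected,
        Func<Tuple<bool, DateTime>> readTag,
        DateTime utcNow)
    {
        if (!isConnected) return TagHealth.Unhealthy;

        Tuple<bool, DateTime> sample;
        try
        {
            sample = readTag();
        }
        catch (Exception)
        {
            // Connected, but the test tag could not be read.
            return TagHealth.Degraded;
        }

        if (!sample.Item1) return TagHealth.Degraded;                  // quality not Good
        if (utcNow - sample.Item2 > MaxAge) return TagHealth.Degraded; // stale data
        return TagHealth.Healthy;
    }
}
```

Passing `utcNow` in explicitly keeps the staleness rule deterministic in tests, instead of depending on `DateTime.UtcNow` at call time.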
### 2.3 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host -p:Platform=x86"
```
## Step 3: Create StatusReportService
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs`
Namespace: `ZB.MOM.WW.LmxProxy.Host.Status`
### 3.1 Data model classes
Define in the same file (or a separate `StatusModels.cs` in the same folder):
```csharp
public class StatusData
{
public DateTime Timestamp { get; set; }
public string ServiceName { get; set; } = "";
public string Version { get; set; } = "";
public ConnectionStatus Connection { get; set; } = new ConnectionStatus();
public SubscriptionStatus Subscriptions { get; set; } = new SubscriptionStatus();
public PerformanceStatus Performance { get; set; } = new PerformanceStatus();
public HealthInfo Health { get; set; } = new HealthInfo();
public HealthInfo? DetailedHealth { get; set; }
}
public class ConnectionStatus
{
public bool IsConnected { get; set; }
public string State { get; set; } = "";
public string NodeName { get; set; } = "";
public string GalaxyName { get; set; } = "";
}
public class SubscriptionStatus
{
public int TotalClients { get; set; }
public int TotalTags { get; set; }
public int ActiveSubscriptions { get; set; }
}
public class PerformanceStatus
{
public long TotalOperations { get; set; }
public double AverageSuccessRate { get; set; }
public Dictionary<string, OperationStatus> Operations { get; set; } = new Dictionary<string, OperationStatus>();
}
public class OperationStatus
{
public long TotalCount { get; set; }
public double SuccessRate { get; set; }
public double AverageMilliseconds { get; set; }
public double MinMilliseconds { get; set; }
public double MaxMilliseconds { get; set; }
public double Percentile95Milliseconds { get; set; }
}
public class HealthInfo
{
public string Status { get; set; } = "";
public string Description { get; set; } = "";
public Dictionary<string, string> Data { get; set; } = new Dictionary<string, string>();
}
```
### 3.2 StatusReportService
```csharp
public class StatusReportService
{
private static readonly ILogger Logger = Log.ForContext<StatusReportService>();
private readonly IScadaClient _scadaClient;
private readonly SubscriptionManager _subscriptionManager;
private readonly PerformanceMetrics _performanceMetrics;
private readonly HealthCheckService _healthCheckService;
private readonly DetailedHealthCheckService? _detailedHealthCheckService;
public StatusReportService(
IScadaClient scadaClient,
SubscriptionManager subscriptionManager,
PerformanceMetrics performanceMetrics,
HealthCheckService healthCheckService,
DetailedHealthCheckService? detailedHealthCheckService = null) { ... }
public async Task<string> GenerateHtmlReportAsync() { ... }
public async Task<string> GenerateJsonReportAsync() { ... }
public async Task<bool> IsHealthyAsync() { ... }
private async Task<StatusData> CollectStatusDataAsync() { ... }
private static string GenerateHtmlFromStatusData(StatusData statusData) { ... }
private static string GenerateErrorHtml(Exception ex) { ... }
}
```
`CollectStatusDataAsync`:
- Populate `StatusData.Timestamp = DateTime.UtcNow`, `ServiceName = "ZB.MOM.WW.LmxProxy.Host"`, `Version` from `Assembly.GetExecutingAssembly().GetName().Version`.
- Connection info from `_scadaClient.IsConnected`, `_scadaClient.ConnectionState`.
- Subscription stats from `_subscriptionManager.GetSubscriptionStats()`.
- Performance stats from `_performanceMetrics.GetStatistics()` — include P95 in the `OperationStatus`.
- Health from `_healthCheckService.CheckHealthAsync(new HealthCheckContext())`.
- Detailed health from `_detailedHealthCheckService?.CheckHealthAsync(new HealthCheckContext())` if not null.
`GenerateJsonReportAsync`:
- Use `System.Text.Json.JsonSerializer.Serialize(statusData, new JsonSerializerOptions { WriteIndented = true, PropertyNamingPolicy = JsonNamingPolicy.CamelCase })`.
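As a quick illustration of what those serializer options produce, the sketch below serializes a trimmed-down model. `MiniStatus` and `StatusJson` are hypothetical; the planned `StatusData` has more properties. The options instance is cached in a static field, which is the usual recommendation for `JsonSerializerOptions`.

```csharp
using System.Text.Json;

// Hypothetical trimmed-down model; the plan's StatusData has more properties.
public class MiniStatus
{
    public string ServiceName { get; set; } = "";
    public bool IsConnected { get; set; }
}

public static class StatusJson
{
    // Reuse one options instance instead of allocating it per call.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        WriteIndented = true,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    public static string Serialize(MiniStatus status)
    {
        return JsonSerializer.Serialize(status, Options);
    }
}
```

With the camel-case policy, the `ServiceName` and `IsConnected` properties are emitted as `serviceName` and `isConnected`, which is what the JSON unit test in Step 6 checks for.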
`GenerateHtmlFromStatusData`:
- Use `StringBuilder` to generate self-contained HTML.
- Include inline CSS (Bootstrap-like grid, status cards with color-coded left borders).
- Color coding: green (#28a745) for Healthy/Connected, yellow (#ffc107) for Degraded, red (#dc3545) for Unhealthy/Disconnected.
- Operations table with columns: Operation, Count, Success Rate, Avg (ms), Min (ms), Max (ms), P95 (ms).
- `<meta http-equiv="refresh" content="30">` for auto-refresh.
- Last updated timestamp at the bottom.
`IsHealthyAsync`:
- Run basic health check, return `result.Status == HealthStatus.Healthy`.
### 3.3 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host -p:Platform=x86"
```
## Step 4: Create StatusWebServer
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs`
Namespace: `ZB.MOM.WW.LmxProxy.Host.Status`
```csharp
public class StatusWebServer : IDisposable
{
private static readonly ILogger Logger = Log.ForContext<StatusWebServer>();
private readonly WebServerConfiguration _configuration;
private readonly StatusReportService _statusReportService;
private HttpListener? _httpListener;
private CancellationTokenSource? _cancellationTokenSource;
private Task? _listenerTask;
private bool _disposed;
public StatusWebServer(WebServerConfiguration configuration, StatusReportService statusReportService) { ... }
public bool Start() { ... }
public bool Stop() { ... }
public void Dispose() { ... }
private async Task HandleRequestsAsync(CancellationToken cancellationToken) { ... }
private async Task HandleRequestAsync(HttpListenerContext context) { ... }
private async Task HandleStatusPageAsync(HttpListenerResponse response) { ... }
private async Task HandleStatusApiAsync(HttpListenerResponse response) { ... }
private async Task HandleHealthApiAsync(HttpListenerResponse response) { ... }
private static async Task WriteResponseAsync(HttpListenerResponse response, string content, string contentType) { ... }
}
```
### 4.1 Start()
1. If `!_configuration.Enabled`, log info and return `true`.
2. Create `HttpListener`, add prefix `_configuration.Prefix ?? $"http://+:{_configuration.Port}/"` (ensure trailing `/`).
3. Call `_httpListener.Start()`.
4. Create `_cancellationTokenSource = new CancellationTokenSource()`.
5. Start `_listenerTask = Task.Run(() => HandleRequestsAsync(_cancellationTokenSource.Token))`.
6. On exception, log error and return `false`.
### 4.2 Stop()
1. If not enabled or listener is null, return `true`.
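2. Cancel `_cancellationTokenSource`.
3. Stop `_httpListener` so that the pending `GetContextAsync()` call is aborted (it does not observe the cancellation token).
4. Wait for `_listenerTask` with a 5-second timeout, then close `_httpListener`.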
### 4.3 HandleRequestsAsync
- Loop while not cancelled and listener is listening.
- `await _httpListener.GetContextAsync()` — on success, spawn `Task.Run` to handle.
- Catch `ObjectDisposedException` and `HttpListenerException` with `ErrorCode == 995` (operation aborted) as expected shutdown signals.
- On other errors, log and delay 1 second before continuing.
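The "expected shutdown" test can be factored into a small helper so the accept loop stays readable. `ListenerShutdown` is a hypothetical name; the check itself uses only `HttpListenerException.ErrorCode` and the Win32 code 995 (`ERROR_OPERATION_ABORTED`).

```csharp
using System;
using System.Net;

// Hypothetical helper for classifying exceptions thrown by the accept loop.
public static class ListenerShutdown
{
    // 995 is ERROR_OPERATION_ABORTED, raised when a pending GetContextAsync
    // is aborted because the listener was stopped.
    private const int OperationAborted = 995;

    public static bool IsExpected(Exception ex)
    {
        if (ex is ObjectDisposedException) return true;

        var listenerEx = ex as HttpListenerException;
        return listenerEx != null && listenerEx.ErrorCode == OperationAborted;
    }
}
```

In the loop this could be used as an exception filter, for example `catch (Exception ex) when (ListenerShutdown.IsExpected(ex)) { break; }`, leaving the generic handler to log and delay.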
### 4.4 HandleRequestAsync routing
| Path (lowercased) or condition | Handler or response |
|---|---|
| `/` | `HandleStatusPageAsync` — calls `_statusReportService.GenerateHtmlReportAsync()`, content type `text/html; charset=utf-8` |
| `/api/status` | `HandleStatusApiAsync` — calls `_statusReportService.GenerateJsonReportAsync()`, content type `application/json; charset=utf-8` |
| `/api/health` | `HandleHealthApiAsync` — calls `_statusReportService.IsHealthyAsync()`, returns `"OK"` (200) or `"UNHEALTHY"` (503) as `text/plain` |
| Non-GET method | Return 405 Method Not Allowed |
| Unknown path | Return 404 Not Found |
| Exception | Return 500 Internal Server Error |
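The routing table can be sketched as a pure function that is testable without an `HttpListener`. `StatusRouter` is a hypothetical name, and the returned status code describes only the routing outcome: the health handler may still answer 503, and the 500 case belongs to the surrounding try/catch.

```csharp
using System;

// Hypothetical sketch of the routing table as a pure function.
public static class StatusRouter
{
    // Returns the routing status code and the name of the handler to invoke
    // (null when the request is rejected before reaching a handler).
    public static Tuple<int, string> Route(string httpMethod, string absolutePath)
    {
        if (!string.Equals(httpMethod, "GET", StringComparison.OrdinalIgnoreCase))
        {
            return Tuple.Create(405, (string)null); // Method Not Allowed
        }

        switch ((absolutePath ?? "").ToLowerInvariant())
        {
            case "/":
                return Tuple.Create(200, "HandleStatusPageAsync");
            case "/api/status":
                return Tuple.Create(200, "HandleStatusApiAsync");
            case "/api/health":
                return Tuple.Create(200, "HandleHealthApiAsync");
            default:
                return Tuple.Create(404, (string)null); // Not Found
        }
    }
}
```

In `HandleRequestAsync` the inputs would come from `context.Request.HttpMethod` and `context.Request.Url.AbsolutePath`.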
### 4.5 WriteResponseAsync
- Set `Content-Type`, add `Cache-Control: no-cache, no-store, must-revalidate`, `Pragma: no-cache`, `Expires: 0`.
- Convert content to UTF-8 bytes, set `ContentLength64`, write to `response.OutputStream`.
### 4.6 Dispose
- Guard with `_disposed` flag. Call `Stop()`. Dispose `_cancellationTokenSource` and close `_httpListener`.
### 4.7 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host -p:Platform=x86"
```
## Step 5: Wire into LmxProxyService
**File**: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs`
This file already exists. Modify the `Start()` method to create and wire the new components. The v2 rebuild should create these fresh, but the wiring pattern follows the same order as the reference.
### 5.1 Add using directives
```csharp
using ZB.MOM.WW.LmxProxy.Host.Health;
using ZB.MOM.WW.LmxProxy.Host.Metrics;
using ZB.MOM.WW.LmxProxy.Host.Status;
```
### 5.2 Add fields
```csharp
private PerformanceMetrics? _performanceMetrics;
private HealthCheckService? _healthCheckService;
private DetailedHealthCheckService? _detailedHealthCheckService;
private StatusReportService? _statusReportService;
private StatusWebServer? _statusWebServer;
```
### 5.3 In Start(), after SessionManager and SubscriptionManager creation
```csharp
// Create performance metrics
_performanceMetrics = new PerformanceMetrics();
// Create health check services
_healthCheckService = new HealthCheckService(_scadaClient, _subscriptionManager, _performanceMetrics);
_detailedHealthCheckService = new DetailedHealthCheckService(_scadaClient);
// Create status report service
_statusReportService = new StatusReportService(
_scadaClient, _subscriptionManager, _performanceMetrics,
_healthCheckService, _detailedHealthCheckService);
// Start status web server
_statusWebServer = new StatusWebServer(_configuration.WebServer, _statusReportService);
if (!_statusWebServer.Start())
{
Logger.Warning("Status web server failed to start — continuing without it");
}
```
### 5.4 In Stop(), before gRPC server shutdown
```csharp
// Stop status web server
_statusWebServer?.Stop();
// Dispose performance metrics
_performanceMetrics?.Dispose();
```
### 5.5 Pass _performanceMetrics to ScadaGrpcService constructor
Ensure `ScadaGrpcService` receives `_performanceMetrics` so it can record timings on each RPC call. The gRPC service should call `_performanceMetrics.BeginOperation("Read")` (etc.) and dispose the timing scope at the end of each RPC handler.
### 5.6 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host -p:Platform=x86"
```
## Step 6: Unit Tests
**Project**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/`
If this project does not exist yet, create it:
```bash
ssh windev "cd C:\src\lmxproxy && dotnet new xunit -n ZB.MOM.WW.LmxProxy.Host.Tests -o tests/ZB.MOM.WW.LmxProxy.Host.Tests"
```
**Csproj adjustments** for `tests/ZB.MOM.WW.LmxProxy.Host.Tests/ZB.MOM.WW.LmxProxy.Host.Tests.csproj`:
- `<TargetFramework>net48</TargetFramework>`
- `<PlatformTarget>x86</PlatformTarget>`
- `<LangVersion>9.0</LangVersion>`
- Add `<ProjectReference Include="..\..\src\ZB.MOM.WW.LmxProxy.Host\ZB.MOM.WW.LmxProxy.Host.csproj" />`
- Add `<PackageReference Include="xunit" Version="2.9.3" />`
- Add `<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />`
- Add `<PackageReference Include="NSubstitute" Version="5.3.0" />` (for mocking IScadaClient)
- Add `<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />`
**Also add to solution** in `ZB.MOM.WW.LmxProxy.slnx`:
```xml
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.LmxProxy.Host.Tests/ZB.MOM.WW.LmxProxy.Host.Tests.csproj" />
</Folder>
```
### 6.1 PerformanceMetrics Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/Metrics/PerformanceMetricsTests.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Host.Tests.Metrics
{
public class PerformanceMetricsTests
{
[Fact]
public void RecordOperation_TracksCountAndDuration()
// Record 5 operations, verify GetStatistics returns TotalCount=5
[Fact]
public void RecordOperation_TracksSuccessAndFailure()
// Record 3 success + 2 failure, verify SuccessRate == 0.6
[Fact]
public void GetStatistics_CalculatesP95Correctly()
// Record 100 operations with known durations (1ms through 100ms)
// Verify P95 is approximately 95ms
[Fact]
public void RollingBuffer_CapsAt1000Samples()
// Record 1500 operations, verify _durations list doesn't exceed 1000
// (test via GetStatistics behavior — TotalCount is 1500 but percentile computed from 1000)
[Fact]
public async Task BeginOperation_RecordsDurationOnDispose()
// Use BeginOperation, await Task.Delay(50), dispose scope
// Verify recorded duration is roughly 50ms or more (allow a few ms of timer tolerance)
[Fact]
public void TimingScope_DefaultsToSuccess()
// BeginOperation + dispose without calling SetSuccess
// Verify SuccessCount == 1
[Fact]
public void TimingScope_RespectsSetSuccessFalse()
// BeginOperation, SetSuccess(false), dispose
// Verify SuccessCount == 0, TotalCount == 1
[Fact]
public void GetMetrics_ReturnsNullForUnknownOperation()
[Fact]
public void GetAllMetrics_ReturnsAllTrackedOperations()
}
}
```
### 6.2 HealthCheckService Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/Health/HealthCheckServiceTests.cs`
Use NSubstitute to mock `IScadaClient`. Create a real `PerformanceMetrics` instance and a real or mock `SubscriptionManager` (depends on Phase 2/3 implementation — if `SubscriptionManager` has an interface, mock it; if not, use the `GetSubscriptionStats()` approach with a concrete instance).
```csharp
namespace ZB.MOM.WW.LmxProxy.Host.Tests.Health
{
public class HealthCheckServiceTests
{
[Fact]
public async Task ReturnsHealthy_WhenConnectedAndNormalMetrics()
// Mock: IsConnected=true, ConnectionState=Connected
// SubscriptionStats: TotalClients=5, TotalTags=10
// PerformanceMetrics: record some successes
// Assert: HealthStatus.Healthy
[Fact]
public async Task ReturnsUnhealthy_WhenNotConnected()
// Mock: IsConnected=false
// Assert: HealthStatus.Unhealthy, description contains "not connected"
[Fact]
public async Task ReturnsDegraded_WhenSuccessRateBelow50Percent()
// Mock: IsConnected=true
// Record 200 operations with 40% success rate
// Assert: HealthStatus.Degraded
[Fact]
public async Task ReturnsDegraded_WhenClientCountOver100()
// Mock: IsConnected=true, SubscriptionStats.TotalClients=150
// Assert: HealthStatus.Degraded
[Fact]
public async Task DoesNotFlagLowSuccessRate_Under100Operations()
// Record 50 operations with 0% success rate
// Assert: still Healthy (threshold is > 100 total ops)
}
public class DetailedHealthCheckServiceTests
{
[Fact]
public async Task ReturnsUnhealthy_WhenNotConnected()
[Fact]
public async Task ReturnsHealthy_WhenTestTagGoodAndRecent()
// Mock ReadAsync returns Good quality with recent timestamp
// Assert: Healthy
[Fact]
public async Task ReturnsDegraded_WhenTestTagQualityNotGood()
// Mock ReadAsync returns Uncertain quality
// Assert: Degraded
[Fact]
public async Task ReturnsDegraded_WhenTestTagTimestampStale()
// Mock ReadAsync returns Good quality but timestamp 10 minutes ago
// Assert: Degraded
[Fact]
public async Task ReturnsDegraded_WhenTestTagReadThrows()
// Mock ReadAsync throws exception
// Assert: Degraded
}
}
```
### 6.3 StatusReportService Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/Status/StatusReportServiceTests.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Host.Tests.Status
{
public class StatusReportServiceTests
{
[Fact]
public async Task GenerateJsonReportAsync_ReturnsCamelCaseJson()
// Verify JSON contains "serviceName", "connection", "isConnected" (camelCase)
[Fact]
public async Task GenerateHtmlReportAsync_ContainsAutoRefresh()
// Verify HTML contains <meta http-equiv="refresh" content="30">
[Fact]
public async Task IsHealthyAsync_ReturnsTrueWhenHealthy()
[Fact]
public async Task IsHealthyAsync_ReturnsFalseWhenUnhealthy()
[Fact]
public async Task GenerateJsonReportAsync_IncludesPerformanceMetrics()
// Record some operations, verify JSON includes operation names and stats
}
}
```
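To make the camelCase expectation concrete, here is a small sketch using `System.Text.Json`. This is an assumption: the plan does not say which serializer `StatusReportService` uses, and the `StatusReport` and `ConnectionInfo` records below are hypothetical stand-ins for the real data model.

```csharp
using System;
using System.Text.Json;

var report = new StatusReport("LmxProxy", new ConnectionInfo(true));
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
string json = JsonSerializer.Serialize(report, options);
Console.WriteLine(json); // property names come out as serviceName, connection, isConnected

// Hypothetical report shapes, PascalCase in C# and camelCase on the wire.
public record ConnectionInfo(bool IsConnected);
public record StatusReport(string ServiceName, ConnectionInfo Connection);
```

A test along these lines only needs to assert that the camelCase property names appear in the output string.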
### 6.4 Run tests
```bash
ssh windev "cd C:\src\lmxproxy && dotnet test tests/ZB.MOM.WW.LmxProxy.Host.Tests -p:Platform=x86 --verbosity normal"
```
## Step 7: Build Verification
Run full solution build and tests:
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx && dotnet test --verbosity normal"
```
If the test project is .NET 4.8 x86, you may need:
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx -p:Platform=x86 && dotnet test tests/ZB.MOM.WW.LmxProxy.Host.Tests -p:Platform=x86"
```
## Completion Criteria
- [ ] `PerformanceMetrics` class with `OperationMetrics`, `MetricsStatistics`, `ITimingScope` in `src/ZB.MOM.WW.LmxProxy.Host/Metrics/`
- [ ] `HealthCheckService` and `DetailedHealthCheckService` in `src/ZB.MOM.WW.LmxProxy.Host/Health/`
- [ ] `StatusReportService` with data model classes in `src/ZB.MOM.WW.LmxProxy.Host/Status/`
- [ ] `StatusWebServer` with HTML dashboard, JSON status, and health endpoints in `src/ZB.MOM.WW.LmxProxy.Host/Status/`
- [ ] All components wired into `LmxProxyService.Start()` / `Stop()`
- [ ] `ScadaGrpcService` uses `PerformanceMetrics.BeginOperation()` for Read, ReadBatch, Write, WriteBatch RPCs
- [ ] Unit tests for PerformanceMetrics (recording, percentile, rolling buffer, timing scope)
- [ ] Unit tests for HealthCheckService (healthy, unhealthy, degraded transitions)
- [ ] Unit tests for DetailedHealthCheckService (connected, quality, staleness)
- [ ] Unit tests for StatusReportService (JSON format, HTML format, health aggregation)
- [ ] Solution builds without errors: `dotnet build ZB.MOM.WW.LmxProxy.slnx`
- [ ] All tests pass: `dotnet test`
@@ -0,0 +1,852 @@
# Phase 5: Client Core — Implementation Plan
**Date**: 2026-03-21
**Prerequisites**: Phase 1 complete and passing (Protocol & Domain Types — `ScadaContracts.cs` with v2 `TypedValue`/`QualityCode` messages, `Quality.cs`, `QualityExtensions.cs`, `Vtq.cs`, `ConnectionState.cs` all exist and cross-stack serialization tests pass)
**Working Directory**: The lmxproxy repo is on windev at `C:\src\lmxproxy`
## Guardrails
1. **Client targets .NET 10, AnyCPU** — use latest C# features freely. The csproj `<TargetFramework>` is `net10.0`, `<LangVersion>latest</LangVersion>`.
2. **Code-first gRPC only** — the Client uses `protobuf-net.Grpc` with `[ServiceContract]`/`[DataContract]` attributes. Never reference proto files or `Grpc.Tools`.
3. **No string serialization heuristics** — v2 uses native `TypedValue`. Do not write `double.TryParse`, `bool.TryParse`, or any string-to-value parsing on tag values.
4. **`status_code` is canonical for quality** — `symbolic_name` is derived. Never set `symbolic_name` independently.
5. **Polly v8 API** — the Client csproj already has `<PackageReference Include="Polly" Version="8.5.2" />`. Use the v8 `ResiliencePipeline` API, not the legacy v7 `IAsyncPolicy` API.
6. **No new NuGet packages** — all needed packages are already in `src/ZB.MOM.WW.LmxProxy.Client/ZB.MOM.WW.LmxProxy.Client.csproj`.
7. **Build command**: `dotnet build src/ZB.MOM.WW.LmxProxy.Client`
8. **Test command**: `dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests`
9. **Namespace root**: `ZB.MOM.WW.LmxProxy.Client`
## Step 1: ClientTlsConfiguration
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ClientTlsConfiguration.cs`
This file already exists with the correct shape. Verify it has all these properties (from Component-Client.md):
```csharp
namespace ZB.MOM.WW.LmxProxy.Client;
public class ClientTlsConfiguration
{
public bool UseTls { get; set; } = false;
public string? ClientCertificatePath { get; set; }
public string? ClientKeyPath { get; set; }
public string? ServerCaCertificatePath { get; set; }
public string? ServerNameOverride { get; set; }
public bool ValidateServerCertificate { get; set; } = true;
public bool AllowSelfSignedCertificates { get; set; } = false;
public bool IgnoreAllCertificateErrors { get; set; } = false;
}
```
If it matches, no changes needed. If any properties are missing, add them.
## Step 2: Security/GrpcChannelFactory
**File**: `src/ZB.MOM.WW.LmxProxy.Client/Security/GrpcChannelFactory.cs`
This file already exists. Verify the implementation covers:
1. `CreateChannel(Uri address, ClientTlsConfiguration? tlsConfiguration, ILogger logger)` — returns `GrpcChannel`.
2. Creates `SocketsHttpHandler` with `EnableMultipleHttp2Connections = true`.
3. For TLS: sets `SslProtocols = Tls12 | Tls13`, configures `ServerNameOverride` as `TargetHost`, loads client certificate from PEM files for mTLS.
4. Certificate validation callback handles: `IgnoreAllCertificateErrors`, `!ValidateServerCertificate`, custom CA trust store via `ServerCaCertificatePath`, `AllowSelfSignedCertificates`.
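5. Static constructor enables the `System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport` switch via `AppContext.SetSwitch(..., true)` for non-TLS channels. It is an AppContext switch, not a property on `SocketsHttpHandler`.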
The existing implementation matches. No changes expected unless Phase 1 introduced breaking changes.
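As a reference while verifying items 2, 3, and 5, the handler setup can be sketched with BCL types only. `HandlerSketch` and the host name are hypothetical. The real `GrpcChannelFactory` also loads client certificates and installs the validation callback, which are omitted here.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Security;
using System.Security.Authentication;

SocketsHttpHandler handler = HandlerSketch.Create(useTls: true, serverNameOverride: "lmxproxy.example");
Console.WriteLine(handler.SslOptions.EnabledSslProtocols);

public static class HandlerSketch
{
    static HandlerSketch()
    {
        // Opts in to HTTP/2 without TLS for plaintext channels.
        AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);
    }

    public static SocketsHttpHandler Create(bool useTls, string? serverNameOverride)
    {
        var handler = new SocketsHttpHandler { EnableMultipleHttp2Connections = true };
        if (useTls)
        {
            handler.SslOptions = new SslClientAuthenticationOptions
            {
                EnabledSslProtocols = SslProtocols.Tls12 | SslProtocols.Tls13,
                TargetHost = serverNameOverride // used for SNI and certificate name matching
            };
        }
        return handler;
    }
}
```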
## Step 3: ILmxProxyClient Interface
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ILmxProxyClient.cs`
Rewrite for v2 protocol. The key changes from v1:
- `WriteAsync` and `WriteBatchAsync` accept `TypedValue` instead of `object`
- `SubscribeAsync` has an `onStreamError` callback parameter
- `CheckApiKeyAsync` is added
- Return types use v2 domain `Vtq` (which wraps `TypedValue` + `QualityCode`)
```csharp
using ZB.MOM.WW.LmxProxy.Client.Domain;
namespace ZB.MOM.WW.LmxProxy.Client;
/// <summary>
/// Interface for LmxProxy client operations.
/// </summary>
public interface ILmxProxyClient : IDisposable, IAsyncDisposable
{
/// <summary>Gets or sets the default timeout for operations (range: 1s to 10min).</summary>
TimeSpan DefaultTimeout { get; set; }
/// <summary>Connects to the LmxProxy service and establishes a session.</summary>
Task ConnectAsync(CancellationToken cancellationToken = default);
/// <summary>Disconnects from the LmxProxy service.</summary>
Task DisconnectAsync();
/// <summary>Returns true if the client has an active session.</summary>
Task<bool> IsConnectedAsync();
/// <summary>Reads a single tag value.</summary>
Task<Vtq> ReadAsync(string address, CancellationToken cancellationToken = default);
/// <summary>Reads multiple tag values in a single batch.</summary>
Task<IDictionary<string, Vtq>> ReadBatchAsync(IEnumerable<string> addresses, CancellationToken cancellationToken = default);
/// <summary>Writes a single tag value (native TypedValue — no string heuristics).</summary>
Task WriteAsync(string address, TypedValue value, CancellationToken cancellationToken = default);
/// <summary>Writes multiple tag values in a single batch.</summary>
Task WriteBatchAsync(IDictionary<string, TypedValue> values, CancellationToken cancellationToken = default);
/// <summary>
/// Writes a batch of values, then polls a flag tag until it matches or timeout expires.
/// Returns (writeResults, flagReached, elapsedMs).
/// </summary>
Task<WriteBatchAndWaitResponse> WriteBatchAndWaitAsync(
IDictionary<string, TypedValue> values,
string flagTag,
TypedValue flagValue,
int timeoutMs = 5000,
int pollIntervalMs = 100,
CancellationToken cancellationToken = default);
/// <summary>Subscribes to tag updates with value and error callbacks.</summary>
Task<LmxProxyClient.ISubscription> SubscribeAsync(
IEnumerable<string> addresses,
Action<string, Vtq> onUpdate,
Action<Exception>? onStreamError = null,
CancellationToken cancellationToken = default);
/// <summary>Validates an API key and returns info.</summary>
Task<LmxProxyClient.ApiKeyInfo> CheckApiKeyAsync(string apiKey, CancellationToken cancellationToken = default);
/// <summary>Returns a snapshot of client-side metrics.</summary>
Dictionary<string, object> GetMetrics();
}
```
**Note**: The `TypedValue` class referenced here is from `Domain/ScadaContracts.cs` — it should already have been updated in Phase 1 to use `[DataContract]` with the v2 oneof-style properties (e.g., `BoolValue`, `Int32Value`, `DoubleValue`, `StringValue`, `DatetimeValue`, etc., with a `ValueCase` enum or similar discriminator).
## Step 4: LmxProxyClient — Main File
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.cs`
This is a partial class. The main file contains the constructor, fields, properties, and the Read/Write/WriteBatch/WriteBatchAndWait/CheckApiKey methods.
### 4.1 Fields and Constructor
```csharp
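using Grpc.Core;                                 // RpcException, StatusCode
using Grpc.Net.Client;                           // GrpcChannel
using Microsoft.Extensions.Logging;              // ILogger
using Microsoft.Extensions.Logging.Abstractions; // NullLogger
using Polly;                                     // ResiliencePipeline, ResiliencePipelineBuilder
using Polly.Retry;                               // RetryStrategyOptions
using ZB.MOM.WW.LmxProxy.Client.Domain;          // v2 contract types (see Step 3)
namespace ZB.MOM.WW.LmxProxy.Client;
public partial class LmxProxyClient : ILmxProxyClient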
{
private readonly ILogger<LmxProxyClient> _logger;
private readonly string _host;
private readonly int _port;
private readonly string? _apiKey;
private readonly ClientTlsConfiguration? _tlsConfiguration;
private readonly ClientMetrics _metrics = new();
private readonly SemaphoreSlim _connectionLock = new(1, 1);
private readonly List<ISubscription> _activeSubscriptions = [];
private readonly Lock _subscriptionLock = new();
private GrpcChannel? _channel;
private IScadaService? _client;
private string _sessionId = string.Empty;
private bool _disposed;
private bool _isConnected;
private TimeSpan _defaultTimeout = TimeSpan.FromSeconds(30);
private ClientConfiguration? _configuration;
private ResiliencePipeline? _resiliencePipeline; // Polly v8
private Timer? _keepAliveTimer;
private readonly TimeSpan _keepAliveInterval = TimeSpan.FromSeconds(30);
// IsConnected computed property
public bool IsConnected => !_disposed && _isConnected && !string.IsNullOrEmpty(_sessionId);
public LmxProxyClient(
string host, int port, string? apiKey,
ClientTlsConfiguration? tlsConfiguration,
ILogger<LmxProxyClient>? logger = null)
{
_host = host ?? throw new ArgumentNullException(nameof(host));
_port = port;
_apiKey = apiKey;
_tlsConfiguration = tlsConfiguration;
_logger = logger ?? NullLogger<LmxProxyClient>.Instance;
}
internal void SetBuilderConfiguration(ClientConfiguration config)
{
_configuration = config;
// Build Polly v8 ResiliencePipeline from config
if (config.MaxRetryAttempts > 0)
{
_resiliencePipeline = new ResiliencePipelineBuilder()
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = config.MaxRetryAttempts,
Delay = config.RetryDelay,
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<RpcException>(ex =>
ex.StatusCode == StatusCode.Unavailable ||
ex.StatusCode == StatusCode.DeadlineExceeded ||
ex.StatusCode == StatusCode.ResourceExhausted ||
ex.StatusCode == StatusCode.Aborted),
OnRetry = args =>
{
_logger.LogWarning("Retry {Attempt} after {Delay} for {Exception}",
args.AttemptNumber, args.RetryDelay, args.Outcome.Exception?.Message);
return ValueTask.CompletedTask;
}
})
.Build();
}
}
}
```
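The interface requires a `DefaultTimeout` property with a documented range of 1 second to 10 minutes, but the property body is not shown in 4.1. A minimal sketch of a validating setter follows; `TimeoutHolder` is a hypothetical container used only to keep the example self-contained, and the exception type is an assumption.

```csharp
using System;

var holder = new TimeoutHolder { DefaultTimeout = TimeSpan.FromSeconds(5) };
Console.WriteLine(holder.DefaultTimeout.TotalSeconds); // 5

public class TimeoutHolder
{
    private TimeSpan _defaultTimeout = TimeSpan.FromSeconds(30);

    /// <summary>Default timeout for operations (range: 1s to 10min).</summary>
    public TimeSpan DefaultTimeout
    {
        get => _defaultTimeout;
        set
        {
            // Reject values outside the documented range instead of silently clamping.
            if (value < TimeSpan.FromSeconds(1) || value > TimeSpan.FromMinutes(10))
                throw new ArgumentOutOfRangeException(nameof(value), value,
                    "Timeout must be between 1 second and 10 minutes.");
            _defaultTimeout = value;
        }
    }
}
```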
### 4.2 ReadAsync
```csharp
public async Task<Vtq> ReadAsync(string address, CancellationToken cancellationToken = default)
{
EnsureConnected();
_metrics.IncrementOperationCount("Read");
var sw = Stopwatch.StartNew();
try
{
var request = new ReadRequest { SessionId = _sessionId, Tag = address };
ReadResponse response = await ExecuteWithRetry(
() => _client!.ReadAsync(request).AsTask(), cancellationToken);
if (!response.Success)
throw new InvalidOperationException($"Read failed: {response.Message}");
return ConvertVtqMessage(response.Vtq);
}
catch
{
_metrics.IncrementErrorCount("Read");
throw;
}
finally
{
sw.Stop();
_metrics.RecordLatency("Read", sw.ElapsedMilliseconds);
}
}
```
### 4.3 ReadBatchAsync
```csharp
public async Task<IDictionary<string, Vtq>> ReadBatchAsync(
IEnumerable<string> addresses, CancellationToken cancellationToken = default)
{
EnsureConnected();
_metrics.IncrementOperationCount("ReadBatch");
var sw = Stopwatch.StartNew();
try
{
var request = new ReadBatchRequest { SessionId = _sessionId, Tags = addresses.ToList() };
ReadBatchResponse response = await ExecuteWithRetry(
() => _client!.ReadBatchAsync(request).AsTask(), cancellationToken);
var result = new Dictionary<string, Vtq>();
foreach (var vtqMsg in response.Vtqs)
{
result[vtqMsg.Tag] = ConvertVtqMessage(vtqMsg);
}
return result;
}
catch
{
_metrics.IncrementErrorCount("ReadBatch");
throw;
}
finally
{
sw.Stop();
_metrics.RecordLatency("ReadBatch", sw.ElapsedMilliseconds);
}
}
```
### 4.4 WriteAsync
```csharp
public async Task WriteAsync(string address, TypedValue value, CancellationToken cancellationToken = default)
{
EnsureConnected();
_metrics.IncrementOperationCount("Write");
var sw = Stopwatch.StartNew();
try
{
var request = new WriteRequest { SessionId = _sessionId, Tag = address, Value = value };
WriteResponse response = await ExecuteWithRetry(
() => _client!.WriteAsync(request).AsTask(), cancellationToken);
if (!response.Success)
throw new InvalidOperationException($"Write failed: {response.Message}");
}
catch
{
_metrics.IncrementErrorCount("Write");
throw;
}
finally
{
sw.Stop();
_metrics.RecordLatency("Write", sw.ElapsedMilliseconds);
}
}
```
### 4.5 WriteBatchAsync
```csharp
public async Task WriteBatchAsync(IDictionary<string, TypedValue> values, CancellationToken cancellationToken = default)
{
EnsureConnected();
_metrics.IncrementOperationCount("WriteBatch");
var sw = Stopwatch.StartNew();
try
{
var request = new WriteBatchRequest
{
SessionId = _sessionId,
Items = values.Select(kv => new WriteItem { Tag = kv.Key, Value = kv.Value }).ToList()
};
WriteBatchResponse response = await ExecuteWithRetry(
() => _client!.WriteBatchAsync(request).AsTask(), cancellationToken);
if (!response.Success)
throw new InvalidOperationException($"WriteBatch failed: {response.Message}");
}
catch
{
_metrics.IncrementErrorCount("WriteBatch");
throw;
}
finally
{
sw.Stop();
_metrics.RecordLatency("WriteBatch", sw.ElapsedMilliseconds);
}
}
```
### 4.6 WriteBatchAndWaitAsync
```csharp
public async Task<WriteBatchAndWaitResponse> WriteBatchAndWaitAsync(
IDictionary<string, TypedValue> values, string flagTag, TypedValue flagValue,
int timeoutMs = 5000, int pollIntervalMs = 100, CancellationToken cancellationToken = default)
{
EnsureConnected();
var request = new WriteBatchAndWaitRequest
{
SessionId = _sessionId,
Items = values.Select(kv => new WriteItem { Tag = kv.Key, Value = kv.Value }).ToList(),
FlagTag = flagTag,
FlagValue = flagValue,
TimeoutMs = timeoutMs,
PollIntervalMs = pollIntervalMs
};
return await ExecuteWithRetry(
() => _client!.WriteBatchAndWaitAsync(request).AsTask(), cancellationToken);
}
```
### 4.7 CheckApiKeyAsync
```csharp
public async Task<ApiKeyInfo> CheckApiKeyAsync(string apiKey, CancellationToken cancellationToken = default)
{
EnsureConnected();
var request = new CheckApiKeyRequest { ApiKey = apiKey };
CheckApiKeyResponse response = await _client!.CheckApiKeyAsync(request);
return new ApiKeyInfo { IsValid = response.IsValid, Description = response.Message };
}
```
### 4.8 ConvertVtqMessage helper
This converts the wire `VtqMessage` (v2 with `TypedValue` + `QualityCode`) to the domain `Vtq`:
```csharp
private static Vtq ConvertVtqMessage(VtqMessage? msg)
{
if (msg is null)
return new Vtq(null, DateTime.UtcNow, Quality.Bad);
object? value = ExtractTypedValue(msg.Value);
DateTime timestamp = msg.TimestampUtcTicks > 0
? new DateTime(msg.TimestampUtcTicks, DateTimeKind.Utc)
: DateTime.UtcNow;
Quality quality = QualityExtensions.FromStatusCode(msg.Quality?.StatusCode ?? 0x80000000u);
return new Vtq(value, timestamp, quality);
}
private static object? ExtractTypedValue(TypedValue? tv)
{
if (tv is null) return null;
// Switch on whichever oneof-style property is set
// The exact property names depend on the Phase 1 code-first contract design
// e.g., tv.BoolValue, tv.Int32Value, tv.DoubleValue, tv.StringValue, etc.
// Return the native .NET value directly — no string conversions
...
}
```
**Important**: The exact shape of `TypedValue` in code-first contracts depends on Phase 1's implementation. Phase 1 should have defined a discriminator pattern (e.g., `ValueCase` enum or nullable properties with a convention). Adapt `ExtractTypedValue` to whatever pattern was chosen. The key rule: **no string heuristics**.
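One possible shape, assuming Phase 1 chose nullable properties with exactly one set per value, is sketched below. `TypedValueSketch`, `QualitySketch`, and `TypedValueReader` are hypothetical names. The treatment of `DatetimeValue` as UTC ticks and the OPC UA style severity bits are also assumptions, to be replaced by whatever the Phase 1 contract and `QualityExtensions` actually define.

```csharp
#nullable enable
using System;

Console.WriteLine(TypedValueReader.Extract(new TypedValueSketch { Int32Value = 7 }));
Console.WriteLine(TypedValueReader.QualityFrom(0x80000000u)); // Bad

// Hypothetical contract shape: exactly one nullable property is set per value.
public sealed class TypedValueSketch
{
    public bool? BoolValue { get; init; }
    public int? Int32Value { get; init; }
    public long? Int64Value { get; init; }
    public float? FloatValue { get; init; }
    public double? DoubleValue { get; init; }
    public string? StringValue { get; init; }
    public long? DatetimeValue { get; init; } // assumed to carry UTC ticks
}

public enum QualitySketch { Good, Uncertain, Bad }

public static class TypedValueReader
{
    // Returns the native .NET value of whichever property is set. No string parsing.
    public static object? Extract(TypedValueSketch? tv) => tv switch
    {
        null => null,
        { BoolValue: { } b } => b,
        { Int32Value: { } i } => i,
        { Int64Value: { } l } => l,
        { FloatValue: { } f } => f,
        { DoubleValue: { } d } => d,
        { StringValue: { } s } => s,
        { DatetimeValue: { } t } => new DateTime(t, DateTimeKind.Utc),
        _ => null
    };

    // Assumed severity encoding: top two bits 00 = Good, 01 = Uncertain, anything else = Bad.
    public static QualitySketch QualityFrom(uint statusCode) => (statusCode >> 30) switch
    {
        0 => QualitySketch.Good,
        1 => QualitySketch.Uncertain,
        _ => QualitySketch.Bad
    };
}
```

Because the switch arms have no common type, each value is boxed as its own type, so an `Int32Value` comes back as `int` and is never widened to `double`.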
### 4.9 ExecuteWithRetry helper
```csharp
private async Task<T> ExecuteWithRetry<T>(Func<Task<T>> operation, CancellationToken ct)
{
if (_resiliencePipeline is not null)
{
return await _resiliencePipeline.ExecuteAsync(
async token => await operation(), ct);
}
return await operation();
}
```
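For readers unfamiliar with what the pipeline does on their behalf, the following is a hand-rolled stand-in, not Polly itself. It retries transient failures with exponential backoff and, unlike the helper above, forwards the cancellation token into the operation. `RetrySketch` and its parameters are hypothetical, and Polly's real behaviour (for example jitter options) differs in detail.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int calls = 0;
int result = await RetrySketch.ExecuteAsync(
    operation: async ct =>
    {
        await Task.Delay(1, ct);
        if (++calls < 3) throw new TimeoutException("transient");
        return 42;
    },
    maxRetryAttempts: 3,
    baseDelay: TimeSpan.FromMilliseconds(10),
    isTransient: ex => ex is TimeoutException,
    ct: CancellationToken.None);
Console.WriteLine($"{result} after {calls} calls"); // 42 after 3 calls

public static class RetrySketch
{
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxRetryAttempts,
        TimeSpan baseDelay,
        Func<Exception, bool> isTransient,
        CancellationToken ct)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            catch (Exception ex) when (attempt < maxRetryAttempts && isTransient(ex))
            {
                // Exponential backoff: baseDelay, then 2x, 4x, and so on.
                await Task.Delay(baseDelay * Math.Pow(2, attempt), ct);
            }
        }
    }
}
```

Non-transient exceptions, and the last transient one once the retry budget is spent, propagate to the caller unchanged.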
### 4.10 EnsureConnected, Dispose, DisposeAsync
```csharp
private void EnsureConnected()
{
ObjectDisposedException.ThrowIf(_disposed, this);
if (!IsConnected)
throw new InvalidOperationException("Client is not connected. Call ConnectAsync first.");
}
public void Dispose()
{
if (_disposed) return;
_disposed = true;
_keepAliveTimer?.Dispose();
_channel?.Dispose();
_connectionLock.Dispose();
}
public async ValueTask DisposeAsync()
{
if (_disposed) return;
try { await DisconnectAsync(); } catch { /* swallow */ }
Dispose();
}
```
### 4.11 IsConnectedAsync
```csharp
public Task<bool> IsConnectedAsync() => Task.FromResult(IsConnected);
```
### 4.12 GetMetrics
```csharp
public Dictionary<string, object> GetMetrics() => _metrics.GetSnapshot();
```
### 4.13 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 5: LmxProxyClient.Connection
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.Connection.cs`
Partial class containing `ConnectAsync`, `DisconnectAsync`, keep-alive, `MarkDisconnectedAsync`, `BuildEndpointUri`.
### 5.1 ConnectAsync
1. Acquire `_connectionLock`.
2. Throw `ObjectDisposedException` if disposed.
3. Return early if already connected.
4. Build endpoint URI via `BuildEndpointUri()`.
5. Create channel: `GrpcChannelFactory.CreateChannel(endpoint, _tlsConfiguration, _logger)`.
6. Create code-first client: `channel.CreateGrpcService<IScadaService>()` (from `ProtoBuf.Grpc.Client`).
7. Send `ConnectRequest` with `ClientId = $"ScadaBridge-{Guid.NewGuid():N}"` and `ApiKey = _apiKey ?? string.Empty`.
8. If `!response.Success`, dispose channel and throw.
9. Store channel, client, sessionId. Set `_isConnected = true`.
10. Call `StartKeepAlive()`.
11. On failure, reset all state and rethrow.
12. Release lock in `finally`.
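The locking and early-return parts of these steps can be sketched as follows. `ConnectionSketch` is hypothetical, and its `handshake` delegate stands in for channel creation plus the `ConnectRequest` round trip, so the example runs without gRPC.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var conn = new ConnectionSketch(ct => Task.FromResult("session-1"));
await conn.ConnectAsync();
await conn.ConnectAsync(); // second call returns early
Console.WriteLine($"{conn.IsConnected}, handshakes: {conn.HandshakeCount}"); // True, handshakes: 1

public sealed class ConnectionSketch : IDisposable
{
    private readonly SemaphoreSlim _connectionLock = new(1, 1);
    private readonly Func<CancellationToken, Task<string>> _handshake;
    private string _sessionId = string.Empty;
    private bool _disposed;

    public ConnectionSketch(Func<CancellationToken, Task<string>> handshake) => _handshake = handshake;

    public bool IsConnected => !_disposed && !string.IsNullOrEmpty(_sessionId);
    public int HandshakeCount { get; private set; }

    public async Task ConnectAsync(CancellationToken ct = default)
    {
        await _connectionLock.WaitAsync(ct);           // step 1
        try
        {
            ObjectDisposedException.ThrowIf(_disposed, this); // step 2
            if (IsConnected) return;                   // step 3
            try
            {
                HandshakeCount++;
                _sessionId = await _handshake(ct);     // steps 4 to 9, simplified
            }
            catch
            {
                _sessionId = string.Empty;             // step 11: reset state, then rethrow
                throw;
            }
        }
        finally
        {
            _connectionLock.Release();                 // step 12
        }
    }

    public void Dispose() => _disposed = true;
}
```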
### 5.2 DisconnectAsync
1. Acquire `_connectionLock`.
2. Stop keep-alive.
3. If client and session exist, send `DisconnectRequest`. Swallow exceptions.
4. Clear client, sessionId, isConnected. Dispose channel.
5. Release lock.
### 5.3 Keep-alive timer
- `StartKeepAlive()`: creates `Timer` with `_keepAliveInterval` (30s) interval.
- Timer callback: sends `GetConnectionStateRequest`. On failure: stops timer, calls `MarkDisconnectedAsync(ex)`.
- `StopKeepAlive()`: disposes timer, nulls it.
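A minimal sketch of this timer pattern is shown below. `KeepAliveSketch` is hypothetical: `ping` stands in for sending `GetConnectionStateRequest` and `onFailure` for `MarkDisconnectedAsync`. The interval is a constructor parameter here so a test can use milliseconds instead of 30 seconds.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

var failed = new TaskCompletionSource<Exception>();
var keepAlive = new KeepAliveSketch(
    interval: TimeSpan.FromMilliseconds(20),
    ping: () => Task.FromException(new InvalidOperationException("server gone")),
    onFailure: ex => failed.TrySetResult(ex));
keepAlive.Start();
Console.WriteLine((await failed.Task).Message); // server gone
keepAlive.Stop();

public sealed class KeepAliveSketch
{
    private readonly TimeSpan _interval;
    private readonly Func<Task> _ping;
    private readonly Action<Exception> _onFailure;
    private Timer? _timer;

    public KeepAliveSketch(TimeSpan interval, Func<Task> ping, Action<Exception> onFailure)
    {
        _interval = interval;
        _ping = ping;
        _onFailure = onFailure;
    }

    // The first tick fires after one interval, then repeats at the same interval.
    public void Start() => _timer = new Timer(state => { _ = TickAsync(); }, null, _interval, _interval);

    public void Stop()
    {
        _timer?.Dispose();
        _timer = null;
    }

    private async Task TickAsync()
    {
        try
        {
            await _ping();
        }
        catch (Exception ex)
        {
            Stop();          // stop first so no further ticks are scheduled
            _onFailure(ex);  // then report the failure
        }
    }
}
```

Note that timer ticks can overlap if a ping takes longer than the interval, so the failure handler should tolerate being called more than once.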
### 5.4 MarkDisconnectedAsync
1. If disposed, return.
2. Acquire `_connectionLock`, set `_isConnected = false`, clear client/sessionId, dispose channel. Release lock.
3. Copy and clear `_activeSubscriptions` under `_subscriptionLock`.
4. Dispose each subscription (swallow errors).
5. Log warning with the exception.
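Steps 3 and 4 follow a copy-then-clear pattern: take a snapshot under the lock, then dispose outside it. A sketch, with a hypothetical `SubscriptionRegistry` standing in for the `_activeSubscriptions` list and its lock:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var registry = new SubscriptionRegistry();
registry.Add(new ThrowingSubscription());
registry.Add(new ThrowingSubscription());
Console.WriteLine(registry.DisposeAll()); // 2

// Simulates a subscription whose disposal fails.
public sealed class ThrowingSubscription : IDisposable
{
    public void Dispose() => throw new InvalidOperationException("dispose failed");
}

public sealed class SubscriptionRegistry
{
    private readonly List<IDisposable> _active = [];
    private readonly Lock _lock = new();

    public void Add(IDisposable subscription)
    {
        lock (_lock) _active.Add(subscription);
    }

    public int DisposeAll()
    {
        IDisposable[] snapshot;
        lock (_lock)
        {
            snapshot = _active.ToArray();
            _active.Clear();
        }

        // Dispose outside the lock so slow or re-entrant callbacks never run while it is held.
        foreach (IDisposable subscription in snapshot)
        {
            try { subscription.Dispose(); }
            catch { /* swallowed, as in step 4 */ }
        }
        return snapshot.Length;
    }
}
```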
### 5.5 BuildEndpointUri
```csharp
private Uri BuildEndpointUri()
{
string scheme = _tlsConfiguration?.UseTls == true ? Uri.UriSchemeHttps : Uri.UriSchemeHttp;
return new UriBuilder { Scheme = scheme, Host = _host, Port = _port }.Uri;
}
```
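A quick usage check of the same logic, extracted into a hypothetical static `EndpointSketch.Build` so it can run standalone (the host names are placeholders):

```csharp
using System;

Console.WriteLine(EndpointSketch.Build("localhost", 50051, useTls: false));     // http://localhost:50051/
Console.WriteLine(EndpointSketch.Build("scada.example", 50051, useTls: true));  // https://scada.example:50051/

public static class EndpointSketch
{
    public static Uri Build(string host, int port, bool useTls)
    {
        // The scheme is the only thing TLS changes in the endpoint URI.
        string scheme = useTls ? Uri.UriSchemeHttps : Uri.UriSchemeHttp;
        return new UriBuilder { Scheme = scheme, Host = host, Port = port }.Uri;
    }
}
```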
### 5.6 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 6: LmxProxyClient.CodeFirstSubscription
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.CodeFirstSubscription.cs`
Nested class inside `LmxProxyClient` implementing `ISubscription`.
### 6.1 CodeFirstSubscription class
```csharp
private class CodeFirstSubscription : ISubscription
{
private readonly IScadaService _client;
private readonly string _sessionId;
private readonly List<string> _tags;
private readonly Action<string, Vtq> _onUpdate;
private readonly Action<Exception>? _onStreamError;
private readonly ILogger<LmxProxyClient> _logger;
private readonly Action<ISubscription>? _onDispose;
private readonly CancellationTokenSource _cts = new();
private Task? _processingTask;
private bool _disposed;
private bool _streamErrorFired;
```
Constructor takes all of these. `StartAsync` stores `_processingTask = ProcessUpdatesAsync(cancellationToken)`.
### 6.2 ProcessUpdatesAsync
```csharp
private async Task ProcessUpdatesAsync(CancellationToken cancellationToken)
{
try
{
var request = new SubscribeRequest
{
SessionId = _sessionId,
Tags = _tags,
SamplingMs = 1000
};
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, _cts.Token);
await foreach (VtqMessage vtqMsg in _client.SubscribeAsync(request, linkedCts.Token))
{
try
{
Vtq vtq = ConvertVtqMessage(vtqMsg); // static method from outer class
_onUpdate(vtqMsg.Tag, vtq);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error processing subscription update for {Tag}", vtqMsg.Tag);
}
}
}
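// Grpc.Net.Client reports a cancelled call as RpcException(StatusCode.Cancelled) unless
// ThrowOperationCanceledOnCancellation is enabled, so both shapes count as cancellation.
catch (Exception ex) when (
ex is OperationCanceledException or RpcException { StatusCode: StatusCode.Cancelled }
&& (_cts.IsCancellationRequested || cancellationToken.IsCancellationRequested))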
{
_logger.LogDebug("Subscription cancelled");
}
catch (Exception ex)
{
_logger.LogError(ex, "Error in subscription processing");
FireStreamError(ex);
}
finally
{
if (!_disposed)
{
_disposed = true;
_onDispose?.Invoke(this);
}
}
}
private void FireStreamError(Exception ex)
{
if (_streamErrorFired) return;
_streamErrorFired = true;
try { _onStreamError?.Invoke(ex); }
catch (Exception cbEx) { _logger.LogWarning(cbEx, "onStreamError callback threw"); }
}
```
**Key difference from v1**: `ConvertVtqMessage` now handles `TypedValue` + `QualityCode` natively instead of parsing strings. In addition, the `_onStreamError` callback is invoked at most once, and only when the stream terminates with an error other than cancellation (see Component-Client.md section 5.1).
### 6.3 DisposeAsync and Dispose
`DisposeAsync()`: Cancel CTS, await `_processingTask` (swallow errors), dispose CTS. 5-second timeout guard.
`Dispose()`: Calls `DisposeAsync()` synchronously with `Task.Wait(TimeSpan.FromSeconds(5))`.
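The disposal sequence described above can be sketched with BCL types only. `SubscriptionSketch` is hypothetical, and its processing task is a cancellable infinite delay standing in for the gRPC stream loop.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

var sub = new SubscriptionSketch();
sub.Start();
await sub.DisposeAsync();
Console.WriteLine(sub.Completed); // True

public sealed class SubscriptionSketch : IDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private Task? _processingTask;
    private bool _disposed;

    public bool Completed => _processingTask?.IsCompleted ?? true;

    // Stand-in for ProcessUpdatesAsync: runs until the token is cancelled.
    public void Start() => _processingTask = Task.Delay(Timeout.Infinite, _cts.Token);

    public async Task DisposeAsync()
    {
        if (_disposed) return;
        _disposed = true;
        _cts.Cancel();
        if (_processingTask is not null)
        {
            // The 5-second guard keeps a stuck stream from blocking disposal forever.
            try { await _processingTask.WaitAsync(TimeSpan.FromSeconds(5)); }
            catch { /* cancellation, stream errors, and timeout are all swallowed */ }
        }
        _cts.Dispose();
    }

    public void Dispose() => DisposeAsync().Wait(TimeSpan.FromSeconds(5));
}
```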
### 6.4 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 7: LmxProxyClient.ClientMetrics
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.ClientMetrics.cs`
Internal class. Already exists in v1 reference. Rewrite for v2 with p99 support.
```csharp
internal class ClientMetrics
{
private readonly ConcurrentDictionary<string, long> _operationCounts = new();
private readonly ConcurrentDictionary<string, long> _errorCounts = new();
private readonly ConcurrentDictionary<string, List<long>> _latencies = new();
private readonly Lock _latencyLock = new();
public void IncrementOperationCount(string operation) { ... }
public void IncrementErrorCount(string operation) { ... }
public void RecordLatency(string operation, long milliseconds) { ... }
public Dictionary<string, object> GetSnapshot() { ... }
}
```
`RecordLatency`: Under `_latencyLock`, add to list. If count > 1000, `RemoveAt(0)`.
`GetSnapshot`: Returns dictionary with keys `{op}_count`, `{op}_errors`, `{op}_avg_latency_ms`, `{op}_p95_latency_ms`, `{op}_p99_latency_ms`.
`GetPercentile(List<long> values, int percentile)`: Sort, compute index as `(int)Math.Ceiling(percentile / 100.0 * sorted.Count) - 1`, clamp with `Math.Max(0, ...)`.
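The rolling buffer and the nearest-rank percentile formula can be checked in isolation. The sketch below uses a hypothetical single-operation `LatencyBuffer`; the real `ClientMetrics` keeps one such list per operation name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var buffer = new LatencyBuffer();
for (long ms = 1; ms <= 1200; ms++) buffer.Record(ms);
Console.WriteLine(buffer.Count);             // 1000 (the oldest 200 samples were dropped)
Console.WriteLine(buffer.GetPercentile(95)); // 1150
Console.WriteLine(buffer.GetPercentile(99)); // 1190

public sealed class LatencyBuffer
{
    private const int MaxSamples = 1000;
    private readonly List<long> _samples = new();
    private readonly object _gate = new();

    public int Count { get { lock (_gate) return _samples.Count; } }

    public void Record(long milliseconds)
    {
        lock (_gate)
        {
            _samples.Add(milliseconds);
            if (_samples.Count > MaxSamples) _samples.RemoveAt(0); // drop the oldest sample
        }
    }

    public long GetPercentile(int percentile)
    {
        List<long> sorted;
        lock (_gate) sorted = _samples.OrderBy(v => v).ToList();
        if (sorted.Count == 0) return 0;

        // Nearest-rank: ceil(p / 100 * n) - 1, clamped so tiny samples never index below zero.
        int index = (int)Math.Ceiling(percentile / 100.0 * sorted.Count) - 1;
        return sorted[Math.Max(0, index)];
    }
}
```

After 1200 recordings the buffer holds 201 through 1200, so the p95 index is 949 and the p99 index is 989.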
## Step 8: LmxProxyClient.ApiKeyInfo
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.ApiKeyInfo.cs`
Simple DTO returned by `CheckApiKeyAsync`:
```csharp
namespace ZB.MOM.WW.LmxProxy.Client;
public partial class LmxProxyClient
{
/// <summary>
/// Result of an API key validation check.
/// </summary>
public class ApiKeyInfo
{
public bool IsValid { get; init; }
public string? Role { get; init; }
public string? Description { get; init; }
}
}
```
## Step 9: LmxProxyClient.ISubscription
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.ISubscription.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Client;
public partial class LmxProxyClient
{
/// <summary>
/// Represents an active tag subscription. Dispose to unsubscribe.
/// </summary>
public interface ISubscription : IDisposable
{
/// <summary>Cancels the subscription stream and waits for processing to finish.</summary>
Task DisposeAsync();
}
}
```
## Step 10: Unit Tests
**Project**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/`
Create the test project if it does not already exist:
```bash
ssh windev "cd C:\src\lmxproxy && dotnet new xunit -n ZB.MOM.WW.LmxProxy.Client.Tests -o tests/ZB.MOM.WW.LmxProxy.Client.Tests --framework net10.0"
```
**Csproj** for `tests/ZB.MOM.WW.LmxProxy.Client.Tests/ZB.MOM.WW.LmxProxy.Client.Tests.csproj`:
- `<TargetFramework>net10.0</TargetFramework>`
- `<ProjectReference Include="..\..\src\ZB.MOM.WW.LmxProxy.Client\ZB.MOM.WW.LmxProxy.Client.csproj" />`
- `<PackageReference Include="xunit" Version="2.9.3" />`
- `<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />`
- `<PackageReference Include="NSubstitute" Version="5.3.0" />`
- `<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />`
**Add to solution** `ZB.MOM.WW.LmxProxy.slnx`:
```xml
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.LmxProxy.Client.Tests/ZB.MOM.WW.LmxProxy.Client.Tests.csproj" />
</Folder>
```
### 10.1 Connection Lifecycle Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientConnectionTests.cs`
Mock `IScadaService` using NSubstitute.
```csharp
public class LmxProxyClientConnectionTests
{
[Fact]
public async Task ConnectAsync_EstablishesSessionAndStartsKeepAlive()
[Fact]
public async Task ConnectAsync_ThrowsWhenServerReturnsFailure()
[Fact]
public async Task DisconnectAsync_SendsDisconnectAndClearsState()
[Fact]
public async Task IsConnectedAsync_ReturnsFalseBeforeConnect()
[Fact]
public async Task IsConnectedAsync_ReturnsTrueAfterConnect()
[Fact]
public async Task KeepAliveFailure_MarksDisconnected()
}
```
Note: Testing the keep-alive requires either waiting 30s (too slow) or making the interval configurable for tests. Consider passing the interval as an internal constructor parameter or using a test-only subclass. Alternatively, test `MarkDisconnectedAsync` directly.
### 10.2 Read/Write Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientReadWriteTests.cs`
```csharp
public class LmxProxyClientReadWriteTests
{
[Fact]
public async Task ReadAsync_ReturnsVtqFromResponse()
// Mock ReadAsync to return a VtqMessage with TypedValue.DoubleValue = 42.5
// Verify returned Vtq.Value is 42.5 (double)
[Fact]
public async Task ReadAsync_ThrowsOnFailureResponse()
[Fact]
public async Task ReadBatchAsync_ReturnsDictionaryOfVtqs()
[Fact]
public async Task WriteAsync_SendsTypedValueDirectly()
// Verify the WriteRequest.Value is the TypedValue passed in, not a string
[Fact]
public async Task WriteBatchAsync_SendsAllItems()
[Fact]
public async Task WriteBatchAndWaitAsync_ReturnsResponse()
}
```
### 10.3 Subscription Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientSubscriptionTests.cs`
```csharp
public class LmxProxyClientSubscriptionTests
{
[Fact]
public async Task SubscribeAsync_InvokesCallbackForEachUpdate()
[Fact]
public async Task SubscribeAsync_InvokesStreamErrorOnFailure()
[Fact]
public async Task SubscribeAsync_DisposeStopsProcessing()
}
```
### 10.4 TypedValue Conversion Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/TypedValueConversionTests.cs`
```csharp
public class TypedValueConversionTests
{
[Fact] public void ConvertVtqMessage_ExtractsBoolValue()
[Fact] public void ConvertVtqMessage_ExtractsInt32Value()
[Fact] public void ConvertVtqMessage_ExtractsInt64Value()
[Fact] public void ConvertVtqMessage_ExtractsFloatValue()
[Fact] public void ConvertVtqMessage_ExtractsDoubleValue()
[Fact] public void ConvertVtqMessage_ExtractsStringValue()
[Fact] public void ConvertVtqMessage_ExtractsDateTimeValue()
[Fact] public void ConvertVtqMessage_HandlesNullTypedValue()
[Fact] public void ConvertVtqMessage_HandlesNullMessage()
[Fact] public void ConvertVtqMessage_MapsQualityCodeCorrectly()
[Fact] public void ConvertVtqMessage_GoodQualityCode()
[Fact] public void ConvertVtqMessage_BadQualityCode()
[Fact] public void ConvertVtqMessage_UncertainQualityCode()
}
```
### 10.5 Metrics Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/ClientMetricsTests.cs`
```csharp
public class ClientMetricsTests
{
[Fact] public void IncrementOperationCount_Increments()
[Fact] public void IncrementErrorCount_Increments()
[Fact] public void RecordLatency_StoresValues()
[Fact] public void RollingBuffer_CapsAt1000()
[Fact] public void GetSnapshot_IncludesP95AndP99()
}
```
### 10.6 Run tests
```bash
ssh windev "cd C:\src\lmxproxy && dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests --verbosity normal"
```
## Step 11: Build Verification
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx && dotnet test --verbosity normal"
```
## Completion Criteria
- [ ] `ILmxProxyClient` interface updated for v2 (TypedValue parameters, onStreamError callback, CheckApiKeyAsync)
- [ ] `LmxProxyClient.cs` — main file with Read/Write/WriteBatch/WriteBatchAndWait/CheckApiKey using v2 TypedValue
- [ ] `LmxProxyClient.Connection.cs` — ConnectAsync, DisconnectAsync, keep-alive (30s), MarkDisconnectedAsync
- [ ] `LmxProxyClient.CodeFirstSubscription.cs` — IAsyncEnumerable processing, onStreamError callback, 5s dispose timeout
- [ ] `LmxProxyClient.ClientMetrics.cs` — per-op counts/errors/latency, 1000-sample buffer, p95/p99
- [ ] `LmxProxyClient.ApiKeyInfo.cs` — simple DTO
- [ ] `LmxProxyClient.ISubscription.cs` — IDisposable + DisposeAsync
- [ ] `ClientTlsConfiguration.cs` — all properties present
- [ ] `Security/GrpcChannelFactory.cs` — TLS 1.2/1.3, cert validation, custom CA, self-signed support
- [ ] No string serialization heuristics anywhere in Client code
- [ ] ConvertVtqMessage extracts native TypedValue without parsing
- [ ] Polly v8 ResiliencePipeline for retry (not v7 IAsyncPolicy)
- [ ] All unit tests pass
- [ ] Solution builds cleanly
@@ -0,0 +1,815 @@
# Phase 6: Client Extras — Implementation Plan
**Date**: 2026-03-21
**Prerequisites**: Phase 5 complete and passing (Client Core — `ILmxProxyClient`, `LmxProxyClient` partial classes, `ClientMetrics`, `ISubscription`, `ApiKeyInfo` all functional with unit tests passing)
**Working Directory**: The lmxproxy repo is on windev at `C:\src\lmxproxy`
## Guardrails
1. **Client targets .NET 10, AnyCPU** — latest C# features permitted.
2. **Polly v8 API**: `ResiliencePipeline`, `ResiliencePipelineBuilder`, `RetryStrategyOptions`. Do NOT use the Polly v7 `IAsyncPolicy` or `Policy.Handle<>().WaitAndRetryAsync(...)` APIs.
3. **Builder default port is 50051** (per design doc section 11 — resolved conflict).
4. **No new NuGet packages**: `Polly 8.5.2`, `Microsoft.Extensions.DependencyInjection.Abstractions 10.0.0`, `Microsoft.Extensions.Configuration.Abstractions 10.0.0`, `Microsoft.Extensions.Configuration.Binder 10.0.0`, and `Microsoft.Extensions.Logging.Abstractions 10.0.0` are already in the csproj.
5. **Build command**: `dotnet build src/ZB.MOM.WW.LmxProxy.Client`
6. **Test command**: `dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests`
## Step 1: LmxProxyClientBuilder
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClientBuilder.cs`
Rewrite the builder for v2. Key changes from v1:
- Default port changes from `5050` to `50051`
- Retry uses Polly v8 `ResiliencePipeline` (built in `SetBuilderConfiguration`)
- `WithCorrelationIdHeader` support
### 1.1 Builder fields
```csharp
public class LmxProxyClientBuilder
{
private string? _host;
private int _port = 50051; // CHANGED from 5050
private string? _apiKey;
private ILogger<LmxProxyClient>? _logger;
private TimeSpan _defaultTimeout = TimeSpan.FromSeconds(30);
private int _maxRetryAttempts = 3;
private TimeSpan _retryDelay = TimeSpan.FromSeconds(1);
private bool _enableMetrics;
private string? _correlationIdHeader;
private ClientTlsConfiguration? _tlsConfiguration;
```
### 1.2 Fluent methods
Each method returns `this` for chaining. Validation at call site:
| Method | Default | Validation |
|---|---|---|
| `WithHost(string host)` | Required | `!string.IsNullOrWhiteSpace(host)` |
| `WithPort(int port)` | 50051 | 1-65535 |
| `WithApiKey(string? apiKey)` | null | none |
| `WithLogger(ILogger<LmxProxyClient> logger)` | NullLogger | `!= null` |
| `WithTimeout(TimeSpan timeout)` | 30s | `> TimeSpan.Zero && <= TimeSpan.FromMinutes(10)` |
| `WithSslCredentials(string? certificatePath)` | disabled | creates/updates `_tlsConfiguration` with `UseTls=true` |
| `WithTlsConfiguration(ClientTlsConfiguration config)` | null | `!= null` |
| `WithRetryPolicy(int maxAttempts, TimeSpan retryDelay)` | 3, 1s | `maxAttempts > 0`, `retryDelay > TimeSpan.Zero` |
| `WithMetrics()` | disabled | sets `_enableMetrics = true` |
| `WithCorrelationIdHeader(string headerName)` | null | `!string.IsNullOrEmpty` |
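The validate-then-return-`this` pattern from the table can be sketched as follows. This is a minimal illustration, not the real builder: `SketchBuilder` is a hypothetical stand-in, and the exception types (`ArgumentException`, `ArgumentOutOfRangeException`) are assumptions chosen to line up with the unit tests in Step 7.

```csharp
using System;

// Hypothetical stand-in for LmxProxyClientBuilder. It shows only how a fluent
// method validates its argument at the call site and then returns `this`.
public sealed class SketchBuilder
{
    public string? Host { get; private set; }
    public int Port { get; private set; } = 50051;
    public TimeSpan Timeout { get; private set; } = TimeSpan.FromSeconds(30);

    public SketchBuilder WithHost(string host)
    {
        if (string.IsNullOrWhiteSpace(host))
            throw new ArgumentException("Host must not be null or whitespace.", nameof(host));
        Host = host;
        return this;
    }

    public SketchBuilder WithPort(int port)
    {
        if (port < 1 || port > 65535)
            throw new ArgumentOutOfRangeException(nameof(port), port, "Port must be in 1-65535.");
        Port = port;
        return this;
    }

    public SketchBuilder WithTimeout(TimeSpan timeout)
    {
        if (timeout <= TimeSpan.Zero || timeout > TimeSpan.FromMinutes(10))
            throw new ArgumentOutOfRangeException(nameof(timeout), timeout,
                "Timeout must be greater than zero and at most 10 minutes.");
        Timeout = timeout;
        return this;
    }
}
```

Validating inside each `With*` method means a bad value fails at the line that supplied it, rather than later inside `Build()`.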
### 1.3 Build()
```csharp
public LmxProxyClient Build()
{
if (string.IsNullOrWhiteSpace(_host))
throw new InvalidOperationException("Host must be specified. Call WithHost() before Build().");
ValidateTlsConfiguration();
var client = new LmxProxyClient(_host, _port, _apiKey, _tlsConfiguration, _logger)
{
DefaultTimeout = _defaultTimeout
};
client.SetBuilderConfiguration(new ClientConfiguration
{
MaxRetryAttempts = _maxRetryAttempts,
RetryDelay = _retryDelay,
EnableMetrics = _enableMetrics,
CorrelationIdHeader = _correlationIdHeader
});
return client;
}
```
### 1.4 ValidateTlsConfiguration
If `_tlsConfiguration?.UseTls == true`:
- If `ServerCaCertificatePath` is set and file doesn't exist → throw `FileNotFoundException`.
- If `ClientCertificatePath` is set and file doesn't exist → throw `FileNotFoundException`.
- If `ClientKeyPath` is set and file doesn't exist → throw `FileNotFoundException`.
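One possible shape for these checks is sketched below. `TlsSettingsSketch` is a hypothetical stand-in that carries only the properties named in the rules above; the real `ClientTlsConfiguration` may have more members, and the exception messages are illustrative.

```csharp
using System.IO;

// Hypothetical stand-in for ClientTlsConfiguration (assumed property names
// are taken from the rules above).
public sealed class TlsSettingsSketch
{
    public bool UseTls { get; set; }
    public string? ServerCaCertificatePath { get; set; }
    public string? ClientCertificatePath { get; set; }
    public string? ClientKeyPath { get; set; }
}

public static class TlsValidationSketch
{
    public static void Validate(TlsSettingsSketch? tls)
    {
        if (tls?.UseTls != true)
            return; // TLS disabled or not configured: nothing to check

        RequireFileIfSet(tls.ServerCaCertificatePath, "Server CA certificate");
        RequireFileIfSet(tls.ClientCertificatePath, "Client certificate");
        RequireFileIfSet(tls.ClientKeyPath, "Client key");
    }

    // A path that is not set is allowed; a path that is set must exist.
    private static void RequireFileIfSet(string? path, string description)
    {
        if (!string.IsNullOrWhiteSpace(path) && !File.Exists(path))
            throw new FileNotFoundException($"{description} not found: {path}", path);
    }
}
```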
### 1.5 Polly v8 ResiliencePipeline setup (in LmxProxyClient.SetBuilderConfiguration)
This was defined in Step 4 of Phase 5. Verify it uses:
```csharp
using Polly;
using Polly.Retry;
using Grpc.Core;
_resiliencePipeline = new ResiliencePipelineBuilder()
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = config.MaxRetryAttempts,
Delay = config.RetryDelay,
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<RpcException>(ex =>
ex.StatusCode == StatusCode.Unavailable ||
ex.StatusCode == StatusCode.DeadlineExceeded ||
ex.StatusCode == StatusCode.ResourceExhausted ||
ex.StatusCode == StatusCode.Aborted),
OnRetry = args =>
{
_logger.LogWarning(
"Retry {Attempt}/{Max} after {Delay}ms — {Error}",
args.AttemptNumber + 1, config.MaxRetryAttempts, // AttemptNumber is zero-based in Polly v8
args.RetryDelay.TotalMilliseconds,
args.Outcome.Exception?.Message ?? "unknown");
return ValueTask.CompletedTask;
}
})
.Build();
```
Backoff sequence: `retryDelay * 2^(attempt-1)` → 1s, 2s, 4s for defaults.
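As a worked check of that sequence, the helper below computes the delays from the formula directly. It is a sketch that assumes no jitter is applied (the pipeline above does not set `UseJitter`); `BackoffSketch` is a hypothetical name and is not part of the client.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSketch
{
    // Delay before retry number `attempt` (1-based), without jitter:
    // retryDelay * 2^(attempt - 1).
    public static IReadOnlyList<TimeSpan> Delays(TimeSpan retryDelay, int maxRetryAttempts)
    {
        var delays = new List<TimeSpan>(maxRetryAttempts);
        for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
            delays.Add(retryDelay * Math.Pow(2, attempt - 1));
        return delays;
    }
}
```

With the defaults, `BackoffSketch.Delays(TimeSpan.FromSeconds(1), 3)` gives 1s, 2s, 4s, so a call that fails on every attempt spends about 7 seconds waiting between attempts, on top of the time taken by the four calls themselves (the initial call plus three retries).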
### 1.6 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 2: ClientConfiguration
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClientBuilder.cs`. `ClientConfiguration` is already defined at the bottom of this file as an `internal class`. Verify it contains:
```csharp
internal class ClientConfiguration
{
public int MaxRetryAttempts { get; set; }
public TimeSpan RetryDelay { get; set; }
public bool EnableMetrics { get; set; }
public string? CorrelationIdHeader { get; set; }
}
```
No changes needed if it matches.
## Step 3: ILmxProxyClientFactory + LmxProxyClientFactory
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ILmxProxyClientFactory.cs`
### 3.1 Interface
```csharp
namespace ZB.MOM.WW.LmxProxy.Client;
public interface ILmxProxyClientFactory
{
LmxProxyClient CreateClient();
LmxProxyClient CreateClient(string configName);
LmxProxyClient CreateClient(Action<LmxProxyClientBuilder> builderAction);
}
```
### 3.2 Implementation
```csharp
public class LmxProxyClientFactory : ILmxProxyClientFactory
{
private readonly IConfiguration _configuration;
public LmxProxyClientFactory(IConfiguration configuration)
{
_configuration = configuration ?? throw new ArgumentNullException(nameof(configuration));
}
public LmxProxyClient CreateClient() => CreateClient("LmxProxy");
public LmxProxyClient CreateClient(string configName)
{
IConfigurationSection section = _configuration.GetSection(configName);
var options = new LmxProxyClientOptions();
section.Bind(options);
return BuildFromOptions(options);
}
public LmxProxyClient CreateClient(Action<LmxProxyClientBuilder> builderAction)
{
var builder = new LmxProxyClientBuilder();
builderAction(builder);
return builder.Build();
}
private static LmxProxyClient BuildFromOptions(LmxProxyClientOptions options)
{
var builder = new LmxProxyClientBuilder()
.WithHost(options.Host)
.WithPort(options.Port)
.WithTimeout(options.Timeout)
.WithRetryPolicy(options.Retry.MaxAttempts, options.Retry.Delay);
if (!string.IsNullOrEmpty(options.ApiKey))
builder.WithApiKey(options.ApiKey);
if (options.EnableMetrics)
builder.WithMetrics();
if (!string.IsNullOrEmpty(options.CorrelationIdHeader))
builder.WithCorrelationIdHeader(options.CorrelationIdHeader);
if (options.UseSsl)
{
builder.WithTlsConfiguration(new ClientTlsConfiguration
{
UseTls = true,
ServerCaCertificatePath = options.CertificatePath
});
}
return builder.Build();
}
}
```
### 3.3 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 4: ServiceCollectionExtensions
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ServiceCollectionExtensions.cs`
### 4.1 Options classes
Define at the bottom of the file or in a separate `LmxProxyClientOptions.cs`:
```csharp
public class LmxProxyClientOptions
{
public string Host { get; set; } = "localhost";
public int Port { get; set; } = 50051; // CHANGED from 5050
public string? ApiKey { get; set; }
public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(30);
public bool UseSsl { get; set; }
public string? CertificatePath { get; set; }
public bool EnableMetrics { get; set; }
public string? CorrelationIdHeader { get; set; }
public RetryOptions Retry { get; set; } = new();
}
public class RetryOptions
{
public int MaxAttempts { get; set; } = 3;
public TimeSpan Delay { get; set; } = TimeSpan.FromSeconds(1);
}
```
### 4.2 Extension methods
```csharp
public static class ServiceCollectionExtensions
{
/// <summary>Registers a singleton ILmxProxyClient from the "LmxProxy" config section.</summary>
public static IServiceCollection AddLmxProxyClient(
this IServiceCollection services, IConfiguration configuration)
{
return services.AddLmxProxyClient(configuration, "LmxProxy");
}
/// <summary>Registers a singleton ILmxProxyClient from a named config section.</summary>
public static IServiceCollection AddLmxProxyClient(
this IServiceCollection services, IConfiguration configuration, string sectionName)
{
services.AddSingleton<ILmxProxyClientFactory>(
sp => new LmxProxyClientFactory(configuration));
services.AddSingleton<ILmxProxyClient>(sp =>
{
var factory = sp.GetRequiredService<ILmxProxyClientFactory>();
return factory.CreateClient(sectionName);
});
return services;
}
/// <summary>Registers a singleton ILmxProxyClient via builder action.</summary>
public static IServiceCollection AddLmxProxyClient(
this IServiceCollection services, Action<LmxProxyClientBuilder> configure)
{
services.AddSingleton<ILmxProxyClient>(sp =>
{
var builder = new LmxProxyClientBuilder();
configure(builder);
return builder.Build();
});
return services;
}
/// <summary>Registers a scoped ILmxProxyClient from the "LmxProxy" config section.</summary>
public static IServiceCollection AddScopedLmxProxyClient(
this IServiceCollection services, IConfiguration configuration)
{
services.AddSingleton<ILmxProxyClientFactory>(
sp => new LmxProxyClientFactory(configuration));
services.AddScoped<ILmxProxyClient>(sp =>
{
var factory = sp.GetRequiredService<ILmxProxyClientFactory>();
return factory.CreateClient();
});
return services;
}
/// <summary>Registers a keyed singleton ILmxProxyClient.</summary>
public static IServiceCollection AddNamedLmxProxyClient(
this IServiceCollection services, string name, Action<LmxProxyClientBuilder> configure)
{
services.AddKeyedSingleton<ILmxProxyClient>(name, (sp, key) =>
{
var builder = new LmxProxyClientBuilder();
configure(builder);
return builder.Build();
});
return services;
}
}
```
### 4.3 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 5: StreamingExtensions
**File**: `src/ZB.MOM.WW.LmxProxy.Client/StreamingExtensions.cs`
### 5.1 ReadStreamAsync
```csharp
using System.Runtime.CompilerServices; // [EnumeratorCancellation]
using System.Threading.Channels;       // Channel<T>, used by SubscribeStreamAsync

public static class StreamingExtensions
{
    /// <summary>
    /// Reads multiple tags as an async stream in batches.
    /// Retries up to 2 times per batch. A batch that still fails is skipped;
    /// the stream aborts after 3 consecutive failed batches.
    /// </summary>
    public static async IAsyncEnumerable<KeyValuePair<string, Vtq>> ReadStreamAsync(
        this ILmxProxyClient client,
        IEnumerable<string> addresses,
        int batchSize = 100,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        ArgumentNullException.ThrowIfNull(client);
        ArgumentNullException.ThrowIfNull(addresses);
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        const int maxConsecutiveErrors = 3;
        const int maxRetries = 2;
        int consecutiveErrors = 0;

        // Chunk yields full batches followed by the final partial batch.
        foreach (string[] batch in addresses.Chunk(batchSize))
        {
            cancellationToken.ThrowIfCancellationRequested();
            var (results, error) = await ReadBatchWithRetry(
                client, batch, maxRetries, cancellationToken);

            if (results is null)
            {
                // The batch failed on every attempt: skip it unless failures keep piling up.
                if (++consecutiveErrors >= maxConsecutiveErrors)
                    throw new InvalidOperationException(
                        $"ReadStreamAsync aborted after {maxConsecutiveErrors} consecutive failed batches.",
                        error);
                continue;
            }

            consecutiveErrors = 0;
            foreach (var kvp in results)
                yield return kvp;
        }
    }

    // Returns (null, lastError) when the batch still fails after maxRetries retries.
    // Returning a Task instead of yielding keeps the try/catch out of the iterator,
    // where C# does not allow `yield return` inside a try block that has a catch.
    private static async Task<(IDictionary<string, Vtq>? Results, Exception? Error)> ReadBatchWithRetry(
        ILmxProxyClient client,
        string[] batch,
        int maxRetries,
        CancellationToken ct)
    {
        Exception? lastError = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++)
        {
            ct.ThrowIfCancellationRequested();
            try
            {
                return (await client.ReadBatchAsync(batch, ct), null);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                lastError = ex; // retry; the caller counts consecutive failures
            }
        }
        return (null, lastError);
    }
```
### 5.2 WriteStreamAsync
```csharp
/// <summary>
/// Writes values from an async enumerable in batches. Returns total count written.
/// </summary>
public static async Task<int> WriteStreamAsync(
this ILmxProxyClient client,
IAsyncEnumerable<KeyValuePair<string, TypedValue>> values,
int batchSize = 100,
CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(client);
ArgumentNullException.ThrowIfNull(values);
if (batchSize <= 0)
throw new ArgumentOutOfRangeException(nameof(batchSize));
var batch = new Dictionary<string, TypedValue>(batchSize);
int totalWritten = 0;
await foreach (var kvp in values.WithCancellation(cancellationToken))
{
batch[kvp.Key] = kvp.Value;
if (batch.Count >= batchSize)
{
await client.WriteBatchAsync(batch, cancellationToken);
totalWritten += batch.Count;
batch.Clear();
}
}
if (batch.Count > 0)
{
await client.WriteBatchAsync(batch, cancellationToken);
totalWritten += batch.Count;
}
return totalWritten;
}
```
### 5.3 ProcessInParallelAsync
```csharp
/// <summary>
/// Processes items in parallel with a configurable max concurrency (default 4).
/// </summary>
public static async Task ProcessInParallelAsync<T>(
this IAsyncEnumerable<T> source,
Func<T, CancellationToken, Task> processor,
int maxConcurrency = 4,
CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(source);
ArgumentNullException.ThrowIfNull(processor);
if (maxConcurrency <= 0)
throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
using var semaphore = new SemaphoreSlim(maxConcurrency);
var tasks = new List<Task>();
await foreach (T item in source.WithCancellation(cancellationToken))
{
await semaphore.WaitAsync(cancellationToken);
tasks.Add(Task.Run(async () =>
{
try
{
await processor(item, cancellationToken);
}
finally
{
semaphore.Release();
}
}, cancellationToken));
}
await Task.WhenAll(tasks);
}
```
### 5.4 SubscribeStreamAsync
```csharp
/// <summary>
/// Wraps a callback-based subscription into an IAsyncEnumerable via System.Threading.Channels.
/// </summary>
public static async IAsyncEnumerable<(string Tag, Vtq Vtq)> SubscribeStreamAsync(
this ILmxProxyClient client,
IEnumerable<string> addresses,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(client);
ArgumentNullException.ThrowIfNull(addresses);
var channel = Channel.CreateBounded<(string, Vtq)>(
new BoundedChannelOptions(1000)
{
FullMode = BoundedChannelFullMode.DropOldest,
SingleReader = true,
SingleWriter = false
});
ISubscription? subscription = null;
try
{
subscription = await client.SubscribeAsync(
addresses,
(tag, vtq) =>
{
channel.Writer.TryWrite((tag, vtq));
},
ex =>
{
channel.Writer.TryComplete(ex);
},
cancellationToken);
await foreach (var item in channel.Reader.ReadAllAsync(cancellationToken))
{
yield return item;
}
}
finally
{
subscription?.Dispose();
channel.Writer.TryComplete();
}
}
}
```
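The channel in `SubscribeStreamAsync` is bounded with `BoundedChannelFullMode.DropOldest`, so a slow consumer silently loses the oldest queued updates rather than blocking the subscription callback. The sketch below shows that behavior on a tiny channel; `DropOldestSketch` is a hypothetical helper written for illustration only.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

public static class DropOldestSketch
{
    // Writes 1..count into a bounded DropOldest channel without reading,
    // then drains the channel and returns the items that survived.
    public static List<int> WriteThenDrain(int capacity, int count)
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true
        });

        for (int i = 1; i <= count; i++)
            channel.Writer.TryWrite(i); // when full, the oldest queued item is evicted

        channel.Writer.TryComplete();

        var survivors = new List<int>();
        while (channel.Reader.TryRead(out int item))
            survivors.Add(item);
        return survivors;
    }
}
```

`DropOldestSketch.WriteThenDrain(3, 5)` returns 3, 4, 5: items 1 and 2 were evicted to make room. With the capacity of 1000 used above, the same thing happens once a consumer falls more than 1000 updates behind, which suits tag values where only recent data matters.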
### 5.5 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 6: Properties/AssemblyInfo.cs
**File**: `src/ZB.MOM.WW.LmxProxy.Client/Properties/AssemblyInfo.cs`
Create this file if it doesn't already exist:
```csharp
using System.Runtime.CompilerServices;
[assembly: InternalsVisibleTo("ZB.MOM.WW.LmxProxy.Client.Tests")]
```
This allows the test project to access `internal` types like `ClientMetrics` and `ClientConfiguration`.
### 6.1 Verify build
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
```
## Step 7: Unit Tests
Add tests to the existing `tests/ZB.MOM.WW.LmxProxy.Client.Tests/` project (created in Phase 5).
### 7.1 Builder Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientBuilderTests.cs`
```csharp
public class LmxProxyClientBuilderTests
{
[Fact]
public void Build_ThrowsWhenHostNotSet()
{
var builder = new LmxProxyClientBuilder();
Assert.Throws<InvalidOperationException>(() => builder.Build());
}
[Fact]
public void Build_DefaultPort_Is50051()
{
var client = new LmxProxyClientBuilder()
.WithHost("localhost")
.Build();
// Verify via reflection or by checking connection attempt URI
Assert.NotNull(client);
}
[Fact]
public void WithPort_ThrowsOnZero()
{
Assert.Throws<ArgumentOutOfRangeException>(() =>
new LmxProxyClientBuilder().WithPort(0));
}
[Fact]
public void WithPort_ThrowsOn65536()
{
Assert.Throws<ArgumentOutOfRangeException>(() =>
new LmxProxyClientBuilder().WithPort(65536));
}
[Fact]
public void WithTimeout_ThrowsOnNegative()
{
Assert.Throws<ArgumentOutOfRangeException>(() =>
new LmxProxyClientBuilder().WithTimeout(TimeSpan.FromSeconds(-1)));
}
[Fact]
public void WithTimeout_ThrowsOver10Minutes()
{
Assert.Throws<ArgumentOutOfRangeException>(() =>
new LmxProxyClientBuilder().WithTimeout(TimeSpan.FromMinutes(11)));
}
[Fact]
public void WithRetryPolicy_ThrowsOnZeroAttempts()
{
Assert.Throws<ArgumentOutOfRangeException>(() =>
new LmxProxyClientBuilder().WithRetryPolicy(0, TimeSpan.FromSeconds(1)));
}
[Fact]
public void WithRetryPolicy_ThrowsOnZeroDelay()
{
Assert.Throws<ArgumentOutOfRangeException>(() =>
new LmxProxyClientBuilder().WithRetryPolicy(3, TimeSpan.Zero));
}
[Fact]
public void Build_WithAllOptions_Succeeds()
{
var client = new LmxProxyClientBuilder()
.WithHost("10.100.0.48")
.WithPort(50051)
.WithApiKey("test-key")
.WithTimeout(TimeSpan.FromSeconds(15))
.WithRetryPolicy(5, TimeSpan.FromSeconds(2))
.WithMetrics()
.WithCorrelationIdHeader("X-Correlation-ID")
.Build();
Assert.NotNull(client);
}
[Fact]
public void Build_WithTls_ValidatesCertificatePaths()
{
var builder = new LmxProxyClientBuilder()
.WithHost("localhost")
.WithTlsConfiguration(new ClientTlsConfiguration
{
UseTls = true,
ServerCaCertificatePath = "/nonexistent/cert.pem"
});
Assert.Throws<FileNotFoundException>(() => builder.Build());
}
[Fact]
public void WithHost_ThrowsOnNull()
{
// ThrowsAny also accepts subclasses such as ArgumentNullException.
Assert.ThrowsAny<ArgumentException>(() =>
new LmxProxyClientBuilder().WithHost(null!));
}
[Fact]
public void WithHost_ThrowsOnEmpty()
{
Assert.ThrowsAny<ArgumentException>(() =>
new LmxProxyClientBuilder().WithHost(""));
}
}
```
### 7.2 Factory Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientFactoryTests.cs`
```csharp
public class LmxProxyClientFactoryTests
{
[Fact]
public void CreateClient_BindsFromConfiguration()
{
var config = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["LmxProxy:Host"] = "10.100.0.48",
["LmxProxy:Port"] = "50052",
["LmxProxy:ApiKey"] = "test-key",
["LmxProxy:Retry:MaxAttempts"] = "5",
["LmxProxy:Retry:Delay"] = "00:00:02",
})
.Build();
var factory = new LmxProxyClientFactory(config);
var client = factory.CreateClient();
Assert.NotNull(client);
}
[Fact]
public void CreateClient_NamedSection()
{
var config = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["MyProxy:Host"] = "10.100.0.48",
["MyProxy:Port"] = "50052",
})
.Build();
var factory = new LmxProxyClientFactory(config);
var client = factory.CreateClient("MyProxy");
Assert.NotNull(client);
}
[Fact]
public void CreateClient_BuilderAction()
{
var config = new ConfigurationBuilder().Build();
var factory = new LmxProxyClientFactory(config);
var client = factory.CreateClient(b => b.WithHost("localhost").WithPort(50051));
Assert.NotNull(client);
}
}
```
### 7.3 StreamingExtensions Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/StreamingExtensionsTests.cs`
```csharp
public class StreamingExtensionsTests
{
[Fact]
public async Task ReadStreamAsync_BatchesCorrectly()
// Create mock client, provide 250 addresses with batchSize=100
// Verify ReadBatchAsync called 3 times (100, 100, 50)
[Fact]
public async Task ReadStreamAsync_RetriesOnError()
// Mock first ReadBatchAsync to throw, second to succeed
// Verify results returned from second attempt
[Fact]
public async Task WriteStreamAsync_BatchesAndReturnsCount()
// Provide async enumerable of 250 items, batchSize=100
// Verify WriteBatchAsync called 3 times, total returned = 250
[Fact]
public async Task ProcessInParallelAsync_RespectsMaxConcurrency()
// Track concurrent count with SemaphoreSlim
// maxConcurrency=2, verify never exceeds 2 concurrent calls
[Fact]
public async Task SubscribeStreamAsync_YieldsFromChannel()
// Mock SubscribeAsync to invoke onUpdate callback with test values
// Verify IAsyncEnumerable yields matching items
}
```
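For the `ProcessInParallelAsync_RespectsMaxConcurrency` outline, one way to observe concurrency is to count in-flight work items with `Interlocked` and record the peak. The sketch below is self-contained and uses its own `SemaphoreSlim` gate in place of the real extension method; `ConcurrencyProbeSketch` and the 20 ms simulated delay are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyProbeSketch
{
    // Runs `itemCount` simulated work items through a SemaphoreSlim gate
    // (the same throttling pattern as section 5.3) and returns the highest
    // number of items observed in flight at once.
    public static async Task<int> MeasurePeakAsync(int itemCount, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        int inFlight = 0;
        int peak = 0;
        var tasks = new List<Task>();

        for (int i = 0; i < itemCount; i++)
        {
            await gate.WaitAsync();
            tasks.Add(Task.Run(async () =>
            {
                int now = Interlocked.Increment(ref inFlight);

                // Lock-free "peak = max(peak, now)" via compare-and-swap.
                int seen;
                do
                {
                    seen = Volatile.Read(ref peak);
                }
                while (now > seen && Interlocked.CompareExchange(ref peak, now, seen) != seen);

                try
                {
                    await Task.Delay(20); // simulated work
                }
                finally
                {
                    Interlocked.Decrement(ref inFlight);
                    gate.Release();
                }
            }));
        }

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

In the real test, the counting code would live inside the `processor` delegate passed to `ProcessInParallelAsync`, and the assertion would be that the recorded peak never exceeds `maxConcurrency`.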
### 7.4 Run all tests
```bash
ssh windev "cd C:\src\lmxproxy && dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests --verbosity normal"
```
## Step 8: Build Verification
Run full solution build and all tests:
```bash
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx && dotnet test --verbosity normal"
```
## Completion Criteria
- [ ] `LmxProxyClientBuilder` with default port 50051, Polly v8 wiring, all fluent methods, TLS validation
- [ ] `ClientConfiguration` internal class with retry, metrics, correlation header fields
- [ ] `ILmxProxyClientFactory` + `LmxProxyClientFactory` with 3 `CreateClient` overloads
- [ ] `ServiceCollectionExtensions` with `AddLmxProxyClient` (3 overloads), `AddScopedLmxProxyClient`, `AddNamedLmxProxyClient`
- [ ] `LmxProxyClientOptions` + `RetryOptions` configuration classes
- [ ] `StreamingExtensions` with `ReadStreamAsync` (batched, 2 retries, 3 consecutive error abort), `WriteStreamAsync` (batched), `ProcessInParallelAsync` (SemaphoreSlim, default max concurrency 4), `SubscribeStreamAsync` (Channel-based IAsyncEnumerable)
- [ ] `Properties/AssemblyInfo.cs` with `InternalsVisibleTo` for test project
- [ ] Builder tests: validation, defaults, Polly pipeline wiring, TLS cert validation
- [ ] Factory tests: config binding from IConfiguration, named sections, builder action
- [ ] StreamingExtensions tests: batching, error recovery, parallel throttling, subscription streaming
- [ ] Solution builds cleanly
- [ ] All tests pass
@@ -0,0 +1,837 @@
# Phase 7: Integration Tests & Deployment — Implementation Plan
**Date**: 2026-03-21
**Prerequisites**: Phase 4 (Host complete) and Phase 6 (Client complete) both passing. All unit tests green.
**Working Directory (Mac)**: `/Users/dohertj2/Desktop/scadalink-design/lmxproxy`
**Working Directory (windev)**: `C:\src\lmxproxy`
**windev SSH**: `ssh windev` (alias configured in `~/.ssh/config`, passwordless ed25519, user `dohertj2`)
## Guardrails
1. **Never stop the v1 service until v2 is verified** — deploy v2 on alternate ports first.
2. **Take a Veeam backup before cutover** — provides rollback point.
3. **Integration tests run from Mac against windev** — they use `Grpc.Net.Client` which is cross-platform.
4. **All integration tests must pass before cutover**.
5. **API keys**: The existing `apikeys.json` on windev is the source of truth for valid keys. Read it to get test keys.
6. **Real MxAccess tags**: Use the `TestChildObject` tags on windev's AVEVA System Platform instance. Available tags cover all TypedValue cases:
- `TestChildObject.TestBool` (bool)
- `TestChildObject.TestInt` (int)
- `TestChildObject.TestFloat` (float)
- `TestChildObject.TestDouble` (double)
- `TestChildObject.TestString` (string)
- `TestChildObject.TestDateTime` (datetime)
- `TestChildObject.TestBoolArray[]` (bool array)
- `TestChildObject.TestDateTimeArray[]` (datetime array)
- `TestChildObject.TestDoubleArray[]` (double array)
- `TestChildObject.TestFloatArray[]` (float array)
- `TestChildObject.TestIntArray[]` (int array)
- `TestChildObject.TestStringArray[]` (string array)
## Step 1: Build Host on windev
### 1.1 Pull latest code
```bash
ssh windev "cd C:\src\lmxproxy && git pull"
```
If the repo doesn't exist on windev yet:
```bash
ssh windev "git clone https://gitea.dohertylan.com/dohertj2/lmxproxy.git C:\src\lmxproxy"
```
### 1.2 Publish Host binary
```bash
ssh windev "cd C:\src\lmxproxy && dotnet publish src/ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --self-contained false -o C:\publish-v2"
```
**Expected output**: `C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe` plus dependencies.
### 1.3 Create v2 appsettings.json
Create `C:\publish-v2\appsettings.json` configured for testing on alternate ports:
```bash
ssh windev "powershell -Command \"@'
{
\"GrpcPort\": 50052,
\"ApiKeyConfigFile\": \"apikeys.json\",
\"Connection\": {
\"MonitorIntervalSeconds\": 5,
\"ConnectionTimeoutSeconds\": 30,
\"ReadTimeoutSeconds\": 5,
\"WriteTimeoutSeconds\": 5,
\"MaxConcurrentOperations\": 10,
\"AutoReconnect\": true
},
\"Subscription\": {
\"ChannelCapacity\": 1000,
\"ChannelFullMode\": \"DropOldest\"
},
\"HealthCheck\": {
\"Enabled\": true,
\"TestTagAddress\": \"TestChildObject.TestBool\",
\"MaxStaleDataMinutes\": 5
},
\"Tls\": {
\"Enabled\": false
},
\"WebServer\": {
\"Enabled\": true,
\"Port\": 8081
},
\"Serilog\": {
\"MinimumLevel\": {
\"Default\": \"Information\",
\"Override\": {
\"Microsoft\": \"Warning\",
\"System\": \"Warning\",
\"Grpc\": \"Information\"
}
},
\"WriteTo\": [
{ \"Name\": \"Console\" },
{
\"Name\": \"File\",
\"Args\": {
\"path\": \"logs/lmxproxy-v2-.txt\",
\"rollingInterval\": \"Day\",
\"retainedFileCountLimit\": 30
}
}
]
}
}
'@ | Set-Content -Path 'C:\publish-v2\appsettings.json' -Encoding UTF8\""
```
**Key differences from production config**: gRPC port is 50052 (not 50051), web port is 8081 (not 8080), log file prefix is `lmxproxy-v2-`.
### 1.4 Copy apikeys.json
If v2 should use the same API keys as v1:
```bash
ssh windev "copy C:\publish\apikeys.json C:\publish-v2\apikeys.json"
```
If `C:\publish\apikeys.json` doesn't exist (the v2 service will auto-generate one on first start):
```bash
ssh windev "if not exist C:\publish\apikeys.json echo No existing apikeys.json - v2 will auto-generate"
```
### 1.5 Verify the publish directory
```bash
ssh windev "dir C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe && dir C:\publish-v2\appsettings.json"
```
## Step 2: Deploy v2 Host Service
### 2.1 Install as a separate Topshelf service
The v2 service runs alongside v1 on different ports. Install with a distinct service name:
```bash
ssh windev "C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe install -servicename \"ZB.MOM.WW.LmxProxy.Host.V2\" -displayname \"SCADA Bridge LMX Proxy V2\" -description \"LmxProxy v2 gRPC service (test deployment)\" --autostart"
```
### 2.2 Start the v2 service
```bash
ssh windev "sc start ZB.MOM.WW.LmxProxy.Host.V2"
```
### 2.3 Wait 10 seconds for startup, then verify
```bash
sleep 10 && ssh windev "sc query ZB.MOM.WW.LmxProxy.Host.V2"
```
Expected: `STATE: 4 RUNNING`.
### 2.4 Verify status page
From Mac, use curl to check the v2 status page:
```bash
curl -s http://10.100.0.48:8081/ | head -20
```
Expected: HTML containing "LmxProxy Status Dashboard".
```bash
curl -s http://10.100.0.48:8081/api/health
```
Expected: `OK` with HTTP 200.
```bash
curl -s http://10.100.0.48:8081/api/status | python3 -m json.tool | head -30
```
Expected: JSON with `serviceName`, `connection.isConnected: true`, version info.
### 2.5 Verify MxAccess connected
The status page should show `MxAccess Connection: Connected`. If it shows `Disconnected`, check the logs:
```bash
ssh windev "type C:\publish-v2\logs\lmxproxy-v2-*.txt | findstr /i \"error\""
```
### 2.6 Read the apikeys.json to get test keys
```bash
ssh windev "type C:\publish-v2\apikeys.json"
```
Record the ReadWrite and ReadOnly API keys for use in integration tests. Example structure:
```json
{
"Keys": [
{ "Key": "abc123...", "Role": "ReadWrite", "Description": "Default ReadWrite key" },
{ "Key": "def456...", "Role": "ReadOnly", "Description": "Default ReadOnly key" }
]
}
```
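If the keys are picked out programmatically rather than copied by hand, a small helper along these lines would do. It is a sketch that assumes the file follows the example structure above (`Keys` array with `Key` and `Role` properties); `ApiKeyFileSketch` is a hypothetical name.

```csharp
using System;
using System.Text.Json;

public static class ApiKeyFileSketch
{
    // Returns the first key with the given role, or null if none is present.
    public static string? FindKey(string json, string role)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement entry in doc.RootElement.GetProperty("Keys").EnumerateArray())
        {
            string? entryRole = entry.GetProperty("Role").GetString();
            if (string.Equals(entryRole, role, StringComparison.OrdinalIgnoreCase))
                return entry.GetProperty("Key").GetString();
        }
        return null;
    }
}
```

Calling `FindKey(json, "ReadWrite")` and `FindKey(json, "ReadOnly")` on the file contents yields the two values needed for the test configuration in Step 3.4.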
## Step 3: Create Integration Test Project
### 3.1 Create project
On the Mac (the test project targets .NET 10 and is cross-platform, so it can also be created on windev with the path adjusted):
```bash
cd /Users/dohertj2/Desktop/scadalink-design/lmxproxy
dotnet new xunit -n ZB.MOM.WW.LmxProxy.Client.IntegrationTests -o tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests --framework net10.0
```
### 3.2 Configure csproj
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests.csproj`
```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <LangVersion>latest</LangVersion>
    <Nullable>enable</Nullable>
    <!-- The test files below rely on implicit usings (System, System.Threading.Tasks, ...) -->
    <ImplicitUsings>enable</ImplicitUsings>
    <IsPackable>false</IsPackable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="xunit" Version="2.9.3" />
    <PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
    <PackageReference Include="Microsoft.Extensions.Configuration" Version="10.0.0" />
    <PackageReference Include="Microsoft.Extensions.Configuration.Json" Version="10.0.0" />
  </ItemGroup>
  <ItemGroup>
    <!-- Makes [Fact], Assert and IAsyncLifetime available without a using in each file -->
    <Using Include="Xunit" />
  </ItemGroup>
  <ItemGroup>
    <ProjectReference Include="..\..\src\ZB.MOM.WW.LmxProxy.Client\ZB.MOM.WW.LmxProxy.Client.csproj" />
  </ItemGroup>
  <ItemGroup>
    <None Update="appsettings.test.json">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
  </ItemGroup>
</Project>
```
### 3.3 Add to solution
Edit `ZB.MOM.WW.LmxProxy.slnx`:
```xml
<Solution>
<Folder Name="/src/">
<Project Path="src/ZB.MOM.WW.LmxProxy.Host/ZB.MOM.WW.LmxProxy.Host.csproj" />
<Project Path="src/ZB.MOM.WW.LmxProxy.Client/ZB.MOM.WW.LmxProxy.Client.csproj" />
</Folder>
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.LmxProxy.Host.Tests/ZB.MOM.WW.LmxProxy.Host.Tests.csproj" />
<Project Path="tests/ZB.MOM.WW.LmxProxy.Client.Tests/ZB.MOM.WW.LmxProxy.Client.Tests.csproj" />
<Project Path="tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests.csproj" />
</Folder>
</Solution>
```
### 3.4 Create test configuration
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/appsettings.test.json`
```json
{
"LmxProxy": {
"Host": "10.100.0.48",
"Port": 50052,
"ReadWriteApiKey": "REPLACE_WITH_ACTUAL_KEY",
"ReadOnlyApiKey": "REPLACE_WITH_ACTUAL_KEY",
"InvalidApiKey": "invalid-key-that-does-not-exist"
}
}
```
**IMPORTANT**: After reading the actual `apikeys.json` from windev in Step 2.6, replace the placeholder values with the real keys.
### 3.5 Create test base class
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/IntegrationTestBase.cs`
```csharp
using Microsoft.Extensions.Configuration;
using ZB.MOM.WW.LmxProxy.Client;
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public abstract class IntegrationTestBase : IAsyncLifetime
{
protected IConfiguration Configuration { get; }
protected string Host { get; }
protected int Port { get; }
protected string ReadWriteApiKey { get; }
protected string ReadOnlyApiKey { get; }
protected string InvalidApiKey { get; }
protected LmxProxyClient? Client { get; set; }
protected IntegrationTestBase()
{
Configuration = new ConfigurationBuilder()
.AddJsonFile("appsettings.test.json")
.Build();
var section = Configuration.GetSection("LmxProxy");
Host = section["Host"] ?? "10.100.0.48";
Port = int.Parse(section["Port"] ?? "50052");
ReadWriteApiKey = section["ReadWriteApiKey"] ?? throw new Exception("ReadWriteApiKey not configured");
ReadOnlyApiKey = section["ReadOnlyApiKey"] ?? throw new Exception("ReadOnlyApiKey not configured");
InvalidApiKey = section["InvalidApiKey"] ?? "invalid-key";
}
protected LmxProxyClient CreateClient(string? apiKey = null)
{
return new LmxProxyClientBuilder()
.WithHost(Host)
.WithPort(Port)
.WithApiKey(apiKey ?? ReadWriteApiKey)
.WithTimeout(TimeSpan.FromSeconds(10))
.WithRetryPolicy(2, TimeSpan.FromSeconds(1))
.WithMetrics()
.Build();
}
public virtual async Task InitializeAsync()
{
Client = CreateClient();
await Client.ConnectAsync();
}
public virtual async Task DisposeAsync()
{
if (Client is not null)
{
await Client.DisconnectAsync();
Client.Dispose();
}
}
}
```
## Step 4: Integration Test Scenarios
### 4.1 Connection Lifecycle
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ConnectionTests.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public class ConnectionTests : IntegrationTestBase
{
[Fact]
public async Task ConnectAndDisconnect_Succeeds()
{
// Client is connected in InitializeAsync
Assert.True(await Client!.IsConnectedAsync());
await Client.DisconnectAsync();
Assert.False(await Client.IsConnectedAsync());
}
[Fact]
public async Task ConnectWithInvalidApiKey_Fails()
{
using var badClient = CreateClient(InvalidApiKey);
// Expect RpcException with StatusCode.Unauthenticated
var ex = await Assert.ThrowsAsync<Grpc.Core.RpcException>(
() => badClient.ConnectAsync());
Assert.Equal(Grpc.Core.StatusCode.Unauthenticated, ex.StatusCode);
}
[Fact]
public async Task DoubleConnect_IsIdempotent()
{
await Client!.ConnectAsync(); // Already connected — should be no-op
Assert.True(await Client.IsConnectedAsync());
}
}
```
### 4.2 Read Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ReadTests.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public class ReadTests : IntegrationTestBase
{
[Fact]
public async Task Read_BoolTag_ReturnsBoolValue()
{
var vtq = await Client!.ReadAsync("TestChildObject.TestBool");
Assert.IsType<bool>(vtq.Value);
Assert.True(vtq.Quality.IsGood());
}
[Fact]
public async Task Read_IntTag_ReturnsIntValue()
{
var vtq = await Client!.ReadAsync("TestChildObject.TestInt");
Assert.True(vtq.Value is int or long);
Assert.True(vtq.Quality.IsGood());
}
[Fact]
public async Task Read_FloatTag_ReturnsFloatValue()
{
var vtq = await Client!.ReadAsync("TestChildObject.TestFloat");
Assert.True(vtq.Value is float or double);
Assert.True(vtq.Quality.IsGood());
}
[Fact]
public async Task Read_DoubleTag_ReturnsDoubleValue()
{
var vtq = await Client!.ReadAsync("TestChildObject.TestDouble");
Assert.IsType<double>(vtq.Value);
Assert.True(vtq.Quality.IsGood());
}
[Fact]
public async Task Read_StringTag_ReturnsStringValue()
{
var vtq = await Client!.ReadAsync("TestChildObject.TestString");
Assert.IsType<string>(vtq.Value);
Assert.True(vtq.Quality.IsGood());
}
[Fact]
public async Task Read_DateTimeTag_ReturnsDateTimeValue()
{
var vtq = await Client!.ReadAsync("TestChildObject.TestDateTime");
Assert.IsType<DateTime>(vtq.Value);
Assert.True(vtq.Quality.IsGood());
Assert.True(DateTime.UtcNow - vtq.Timestamp < TimeSpan.FromHours(1));
}
[Fact]
public async Task ReadBatch_MultipleTags_ReturnsDictionary()
{
// Use two distinct tags: duplicate names collapse to a single dictionary key
var tags = new[] { "TestChildObject.TestString", "TestChildObject.TestInt" };
var results = await Client!.ReadBatchAsync(tags);
Assert.Equal(2, results.Count);
Assert.True(results.ContainsKey("TestChildObject.TestString"));
Assert.True(results.ContainsKey("TestChildObject.TestInt"));
}
[Fact]
public async Task Read_NonexistentTag_ReturnsBadQuality()
{
// Reading a tag that doesn't exist should return Bad quality
// (or throw — depends on Host implementation. Adjust assertion accordingly.)
var vtq = await Client!.ReadAsync("NonExistent.Tag.12345");
// If the Host returns success=false, ReadAsync will throw.
// If it returns success=true with bad quality, check quality.
// Adjust based on actual behavior.
}
}
```
### 4.3 Write Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs`
```csharp
using ZB.MOM.WW.LmxProxy.Client.Domain;
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public class WriteTests : IntegrationTestBase
{
[Fact]
public async Task WriteAndReadBack_StringValue()
{
string testValue = $"IntTest-{DateTime.UtcNow:HHmmss}";
// Write to a writable string tag
await Client!.WriteAsync("TestChildObject.TestString",
new TypedValue { StringValue = testValue });
// Read back and verify
await Task.Delay(500); // Allow time for write to propagate
var vtq = await Client.ReadAsync("TestChildObject.TestString");
Assert.Equal(testValue, vtq.Value);
}
[Fact]
public async Task WriteWithReadOnlyKey_ThrowsPermissionDenied()
{
using var readOnlyClient = CreateClient(ReadOnlyApiKey);
await readOnlyClient.ConnectAsync();
var ex = await Assert.ThrowsAsync<Grpc.Core.RpcException>(
() => readOnlyClient.WriteAsync("TestChildObject.TestString",
new TypedValue { StringValue = "should-fail" }));
Assert.Equal(Grpc.Core.StatusCode.PermissionDenied, ex.StatusCode);
}
}
```
### 4.4 Subscribe Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/SubscribeTests.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public class SubscribeTests : IntegrationTestBase
{
[Fact]
public async Task Subscribe_ReceivesUpdates()
{
var received = new List<(string Tag, Vtq Vtq)>();
var receivedEvent = new TaskCompletionSource<bool>();
var subscription = await Client!.SubscribeAsync(
new[] { "TestChildObject.TestInt" },
(tag, vtq) =>
{
received.Add((tag, vtq));
if (received.Count >= 3)
receivedEvent.TrySetResult(true);
},
ex => receivedEvent.TrySetException(ex));
// Wait up to 30 seconds for 3 updates; the assertion below only requires 1
await Task.WhenAny(receivedEvent.Task, Task.Delay(TimeSpan.FromSeconds(30)));
subscription.Dispose();
Assert.True(received.Count >= 1, $"Expected at least 1 update, got {received.Count}");
// Verify the VTQ has correct structure
var first = received[0];
Assert.Equal("TestChildObject.TestInt", first.Tag);
Assert.NotNull(first.Vtq.Value);
// ScanTime should be a DateTime value
Assert.True(first.Vtq.Timestamp > DateTime.MinValue);
}
}
```
### 4.5 WriteBatchAndWait Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs`
```csharp
using ZB.MOM.WW.LmxProxy.Client.Domain;
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public class WriteBatchAndWaitTests : IntegrationTestBase
{
[Fact]
public async Task WriteBatchAndWait_TypeAwareComparison()
{
// This test requires a writable tag and a flag tag.
// Adjust tag names based on available tags in TestChildObject.
// Example: write values and poll a flag.
var values = new Dictionary<string, TypedValue>
{
["TestChildObject.TestString"] = new TypedValue { StringValue = "BatchTest" }
};
// Poll the same tag we wrote to (simple self-check)
var response = await Client!.WriteBatchAndWaitAsync(
values,
flagTag: "TestChildObject.TestString",
flagValue: new TypedValue { StringValue = "BatchTest" },
timeoutMs: 5000,
pollIntervalMs: 200);
Assert.True(response.Success);
Assert.True(response.FlagReached);
Assert.True(response.ElapsedMs < 5000);
}
}
```
### 4.6 CheckApiKey Tests
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/CheckApiKeyTests.cs`
```csharp
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
public class CheckApiKeyTests : IntegrationTestBase
{
[Fact]
public async Task CheckApiKey_ValidReadWrite_ReturnsValid()
{
var info = await Client!.CheckApiKeyAsync(ReadWriteApiKey);
Assert.True(info.IsValid);
}
[Fact]
public async Task CheckApiKey_ValidReadOnly_ReturnsValid()
{
var info = await Client!.CheckApiKeyAsync(ReadOnlyApiKey);
Assert.True(info.IsValid);
}
[Fact]
public async Task CheckApiKey_Invalid_ReturnsInvalid()
{
var info = await Client!.CheckApiKeyAsync("totally-invalid-key-12345");
Assert.False(info.IsValid);
}
}
```
## Step 5: Run Integration Tests
### 5.1 Build the test project (from Mac)
```bash
cd /Users/dohertj2/Desktop/scadalink-design/lmxproxy
dotnet build tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
```
### 5.2 Run integration tests against v2 on alternate port
```bash
dotnet test tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests --verbosity normal
```
All tests should pass against `10.100.0.48:50052`.
### 5.3 Debug failures
If tests fail, check:
1. v2 service is running: `ssh windev "sc query ZB.MOM.WW.LmxProxy.Host.V2"`
2. v2 service logs: `ssh windev "type C:\publish-v2\logs\lmxproxy-v2-*.txt | findstr /i error"`
3. Network connectivity: `curl -s http://10.100.0.48:8081/api/health`
4. API keys match: `ssh windev "type C:\publish-v2\apikeys.json"`
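
The checklist above can be wrapped in a small script so every check runs even when an earlier one fails. This is a minimal sketch: `check` and `run_v2_diagnostics` are hypothetical helper names, and a `PASS` only means the underlying command exited with status 0 (for example, `sc query` succeeds for a stopped service too), so treat it as a first pass rather than a verdict.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a labelled command and report PASS/FAIL.
# The command's own output is discarded; rerun it by hand to see details.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Run the checks from the list above without stopping at the first failure.
# curl -f makes HTTP error statuses count as failures.
run_v2_diagnostics() {
  local failed=0
  check "v2 service registered" ssh windev "sc query ZB.MOM.WW.LmxProxy.Host.V2" || failed=1
  check "health endpoint reachable" curl -sf http://10.100.0.48:8081/api/health || failed=1
  check "apikeys.json readable" ssh windev "type C:\publish-v2\apikeys.json" || failed=1
  return "$failed"
}
```

Call `run_v2_diagnostics` after sourcing the script. The log search from item 2 is left out on purpose, because `findstr` exits 0 when it finds matches, which would invert the meaning of `PASS`.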
### 5.4 Verify metrics after test run
```bash
curl -s http://10.100.0.48:8081/api/status | python3 -m json.tool
```
Should show non-zero operation counts for Read, ReadBatch, Write, etc.
## Step 6: Cutover
**Only proceed if ALL integration tests pass.**
### 6.1 Stop v1 service
```bash
ssh windev "sc stop ZB.MOM.WW.LmxProxy.Host"
```
Verify stopped:
```bash
ssh windev "sc query ZB.MOM.WW.LmxProxy.Host"
```
Expected: `STATE: 1 STOPPED`.
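
Because `sc stop` returns before the service has fully stopped, a single `sc query` may still report `STOP_PENDING`. The sketch below polls until the expected state appears. `sc_state` and `wait_for_state` are hypothetical helper names, and the parsing assumes the usual `STATE : <number> <NAME>` line in `sc query` output.

```bash
#!/usr/bin/env bash
# Read `sc query` output on stdin and print the state name (e.g. STOPPED).
# Carriage returns are stripped because the text comes from a Windows host.
sc_state() {
  tr -d '\r' | awk '$1 == "STATE" { print $4; exit }'
}

# Poll a service on windev until it reports the wanted state.
# Usage: wait_for_state <service> <state> [attempts]
wait_for_state() {
  local service="$1" want="$2" tries="${3:-15}" i=0 state=""
  while [ "$i" -lt "$tries" ]; do
    state="$(ssh windev "sc query $service" | sc_state)"
    if [ "$state" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out waiting for $service to reach $want (last: $state)" >&2
  return 1
}
```

For example, `wait_for_state ZB.MOM.WW.LmxProxy.Host STOPPED` blocks for up to roughly 30 seconds and fails loudly if the service never stops.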
### 6.2 Stop v2 service
```bash
ssh windev "sc stop ZB.MOM.WW.LmxProxy.Host.V2"
```
### 6.3 Reconfigure v2 to production ports
Update `C:\publish-v2\appsettings.json`:
- Change `GrpcPort` from `50052` to `50051`
- Change `WebServer.Port` from `8081` to `8080`
- Change log file prefix from `lmxproxy-v2-` to `lmxproxy-`
```bash
ssh windev "powershell -Command \"(Get-Content 'C:\publish-v2\appsettings.json') -replace '50052','50051' -replace '8081','8080' -replace 'lmxproxy-v2-','lmxproxy-' | Set-Content 'C:\publish-v2\appsettings.json'\""
```
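
The `-replace` chain rewrites every occurrence of those strings in the file, so it is worth confirming the result before reinstalling the service. A minimal sketch, assuming the file is first copied to the local machine; `find_stale_settings` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print (with line numbers) any line that still mentions a test-only value.
# Returns 0 when the file is clean and 1 when stale values remain.
find_stale_settings() {
  local file="$1"
  if grep -nE '50052|8081|lmxproxy-v2-' "$file"; then
    return 1
  fi
  return 0
}
```

One way to use it: `ssh windev "type C:\publish-v2\appsettings.json" > /tmp/appsettings.json && find_stale_settings /tmp/appsettings.json`. It is also worth reading the printed file once to confirm that no unrelated value happened to contain `8081` or `50052` before the replacement.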
### 6.4 Uninstall v1 service
```bash
ssh windev "C:\publish\ZB.MOM.WW.LmxProxy.Host.exe uninstall -servicename \"ZB.MOM.WW.LmxProxy.Host\""
```
### 6.5 Uninstall v2 test service and reinstall as production service
```bash
ssh windev "C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe uninstall -servicename \"ZB.MOM.WW.LmxProxy.Host.V2\""
```
```bash
ssh windev "C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe install -servicename \"ZB.MOM.WW.LmxProxy.Host\" -displayname \"SCADA Bridge LMX Proxy\" -description \"LmxProxy v2 gRPC service\" --autostart"
```
### 6.6 Start the production service
```bash
ssh windev "sc start ZB.MOM.WW.LmxProxy.Host"
```
### 6.7 Verify on production ports
```bash
ssh windev "timeout /t 10 /nobreak >nul && sc query ZB.MOM.WW.LmxProxy.Host"
```
Expected: `STATE: 4 RUNNING`.
```bash
curl -s http://10.100.0.48:8080/api/health
```
Expected: `OK`.
```bash
curl -s http://10.100.0.48:8080/api/status | python3 -m json.tool | head -15
```
Expected: Connected, version shows v2.
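
Instead of a fixed 10-second wait, the health check can be retried until the service answers. This is a sketch, not part of the deployed tooling: `retry` and `health_ok` are hypothetical helpers, and `health_ok` assumes the endpoint returns a 2xx status with the body `OK`, as stated above.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: retry <attempts> <delay_seconds> <command...>
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Succeeds only when the production health endpoint answers with "OK".
health_ok() {
  [ "$(curl -sf --max-time 5 http://10.100.0.48:8080/api/health | tr -d '\r')" = "OK" ]
}
```

Running `retry 10 3 health_ok` gives the service about 30 seconds to come up and exits non-zero if it never does, which makes the step safe to chain with `&&` before re-running the integration tests.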
### 6.8 Update test configuration and re-run integration tests
Update `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/appsettings.test.json`:
- Change `Port` from `50052` to `50051`
```bash
dotnet test tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests --verbosity normal
```
All tests should pass on the production port.
### 6.9 Configure service recovery
```bash
ssh windev "sc failure ZB.MOM.WW.LmxProxy.Host reset= 86400 actions= restart/60000/restart/300000/restart/600000"
```
This restarts the service 1 minute after the first failure, 5 minutes after the second, and 10 minutes after each subsequent failure. The failure counter resets after 1 day (86400 seconds).
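
The `actions=` value is easy to mistype because `sc failure` takes delays in milliseconds while the plan is stated in minutes. The sketch below derives the string from minute values; `recovery_actions` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Build the `actions=` value for `sc failure` from restart delays in minutes.
# Each delay becomes a restart/<milliseconds> pair, joined with "/".
recovery_actions() {
  local out="" minutes
  for minutes in "$@"; do
    out="${out:+$out/}restart/$((minutes * 60000))"
  done
  echo "$out"
}

# The reset period is given in seconds: one day.
reset_seconds=$((24 * 60 * 60))
```

`recovery_actions 1 5 10` yields the same string used in the command above, and `reset_seconds` evaluates to 86400, so the command could equally be written as `ssh windev "sc failure ZB.MOM.WW.LmxProxy.Host reset= $reset_seconds actions= $(recovery_actions 1 5 10)"`.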
## Step 7: Documentation Updates
### 7.1 Update windev.md
Add a section about the LmxProxy v2 service to `/Users/dohertj2/Desktop/scadalink-design/windev.md`:
```markdown
## LmxProxy v2
| Field | Value |
|---|---|
| Service Name | ZB.MOM.WW.LmxProxy.Host |
| Display Name | SCADA Bridge LMX Proxy |
| gRPC Port | 50051 |
| Status Page | http://10.100.0.48:8080/ |
| Health Endpoint | http://10.100.0.48:8080/api/health |
| Publish Directory | C:\publish-v2\ |
| API Keys | C:\publish-v2\apikeys.json |
| Logs | C:\publish-v2\logs\ |
| Protocol | v2 (TypedValue + QualityCode) |
```
### 7.2 Update lmxproxy CLAUDE.md
If `lmxproxy/CLAUDE.md` references v1 behavior, update:
- Change "currently v1 protocol" references to "v2 protocol"
- Update publish directory references from `C:\publish\` to `C:\publish-v2\`
- Update any value conversion notes (no more string heuristics)
### 7.3 Clean up v1 publish directory (optional)
```bash
ssh windev "if exist C:\publish\ ren C:\publish publish-v1-backup"
```
## Step 8: Veeam Backup
### 8.1 Take incremental backup
```bash
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Start-VBRJob -Job (Get-VBRJob -Name 'Backup WW_DEV_VM')\""
```
### 8.2 Wait for backup to complete (check status)
```bash
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; (Get-VBRJob -Name 'Backup WW_DEV_VM').FindLastSession() | Select-Object State, Result, CreationTime, EndTime\""
```
Expected: `State: Stopped, Result: Success`.
### 8.3 Get the restore point ID
```bash
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Get-VBRRestorePoint -Backup (Get-VBRBackup -Name 'Backup WW_DEV_VM') | Select-Object Id, CreationTime, Type, @{N='SizeGB';E={[math]::Round(\$_.ApproxSize/1GB,2)}} | Format-Table -AutoSize\""
```
### 8.4 Record in windev.md
Add a new row to the Restore Points table in `windev.md`:
```markdown
| `XXXXXXXX` | 2026-XX-XX XX:XX | Increment | **Post-v2 deployment** — LmxProxy v2 live on port 50051 |
```
Replace placeholders with actual restore point ID and timestamp.
## Completion Criteria
- [ ] v2 Host binary published to `C:\publish-v2\` on windev
- [ ] v2 service installed and running on alternate ports (50052/8081) — verified via status page
- [ ] Integration test project created at `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/`
- [ ] All integration tests pass against v2 on alternate ports:
- [ ] Connect/disconnect lifecycle
- [ ] Read string tag `TestChildObject.TestString` — value "JoeDev", Good quality
- [ ] Read writable tag `TestChildObject.TestString`
- [ ] Write string then read-back verification
- [ ] ReadBatch multiple tags
- [ ] Subscribe to `TestChildObject.TestInt` — verify updates received with TypedValue + QualityCode
- [ ] WriteBatchAndWait with type-aware flag comparison
- [ ] CheckApiKey — valid ReadWrite, valid ReadOnly, invalid
- [ ] Write with ReadOnly key — PermissionDenied
- [ ] Connect with invalid API key — Unauthenticated
- [ ] v1 service stopped and uninstalled
- [ ] v2 service reconfigured to production ports (50051/8080) and reinstalled
- [ ] All integration tests pass on production ports
- [ ] Service recovery configured (restart on failure)
- [ ] `windev.md` updated with v2 service details
- [ ] `lmxproxy/CLAUDE.md` updated for v2
- [ ] Veeam backup taken and restore point ID recorded in `windev.md`
- [ ] v1 publish directory backed up or removed