deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL adapter files, and related docs to deprecated/. Removed LmxProxy registration from DataConnectionFactory, project reference from DCL, protocol option from UI, and cleaned up all requirement docs.
107
deprecated/lmxproxy/docs/deviations.md
Normal file
@@ -0,0 +1,107 @@
|
||||
# LmxProxy v2 Rebuild — Deviations & Key Technical Decisions
|
||||
|
||||
Decisions made during implementation that differ from or extend the original plan.
|
||||
|
||||
## 1. Grpc.Tools downgraded to 2.68.1
|
||||
|
||||
**Plan specified**: Grpc.Tools 2.71.0
|
||||
**Actual**: 2.68.1
|
||||
**Why**: `protoc.exe` from 2.71.0 crashes with an access violation (exit code `0xC0000005`) on windev (Windows 10, x64). Version 2.68.1 works reliably.
|
||||
**How to apply**: If upgrading Grpc.Tools in the future, test protoc on windev first.
|
||||
|
||||
## 2. STA threading — three iterations
|
||||
|
||||
**Plan specified**: Dedicated STA thread with `BlockingCollection<Action>` dispatch queue and `Application.DoEvents()` message pump.
|
||||
**Iteration 1 (failed)**: `StaDispatchThread` with `BlockingCollection.Take()` + `Application.DoEvents()`. Failed because `Take()` blocked the STA thread, preventing the message pump from running. COM callbacks never fired.
|
||||
**Iteration 2 (partial)**: Replaced with `Task.Run` on thread pool (MTA). `OnDataChange` worked (MxAccess fires it on its own threads), but `OnWriteComplete` never fired (needs message-pump-based marshaling). Writes used fire-and-forget as a workaround.
|
||||
**Iteration 3 (current)**: `StaComThread` with Win32 `GetMessage`/`DispatchMessage` loop. Work dispatched via `PostThreadMessage(WM_APP)` which wakes the message pump. COM callbacks (`OnDataChange`, `OnWriteComplete`) are delivered between work items via `DispatchMessage`. All COM objects created and called on this single STA thread.
|
||||
**How to apply**: All MxAccess COM calls must go through `_staThread.RunAsync()`. Never call COM objects directly from thread pool threads. See `docs/sta_gap.md` for the full design analysis.
|
||||
|
||||
## 3. TypedValue property-level `_setCase` tracking
|
||||
|
||||
**Plan specified**: `GetValueCase()` heuristic checking non-default values (e.g., `if (BoolValue) return BoolValue`).
|
||||
**Actual**: Each property setter records `_setCase = TypedValueCase.XxxValue`, and `GetValueCase()` returns `_setCase` directly.
|
||||
**Why**: protobuf-net code-first has no native `oneof` support. The heuristic approach can't distinguish "field not set" from "field set to default value" (e.g., `BoolValue = false`, `DoubleValue = 0.0`, `Int32Value = 0`). Since protobuf-net calls property setters during deserialization, tracking in the setter correctly identifies which field was deserialized.
|
||||
**How to apply**: Always use `GetValueCase()` to determine which TypedValue field is set, never check for non-default values directly.
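
The difference between the heuristic and setter tracking can be sketched in a few lines. This is a language-neutral Python illustration, not the project's C# model: the class layout, the enum members, and the `heuristic_case` helper are hypothetical, and only three of the fields are modelled.

```python
from enum import Enum


class TypedValueCase(Enum):
    NONE = 0
    BOOL_VALUE = 1
    INT32_VALUE = 2
    DOUBLE_VALUE = 5


class TypedValue:
    """Toy model: every setter records which field was assigned last."""

    def __init__(self):
        self._set_case = TypedValueCase.NONE
        self._bool_value = False
        self._int32_value = 0
        self._double_value = 0.0

    @property
    def bool_value(self):
        return self._bool_value

    @bool_value.setter
    def bool_value(self, value):
        self._bool_value = value
        self._set_case = TypedValueCase.BOOL_VALUE

    @property
    def int32_value(self):
        return self._int32_value

    @int32_value.setter
    def int32_value(self, value):
        self._int32_value = value
        self._set_case = TypedValueCase.INT32_VALUE

    @property
    def double_value(self):
        return self._double_value

    @double_value.setter
    def double_value(self, value):
        self._double_value = value
        self._set_case = TypedValueCase.DOUBLE_VALUE

    def get_value_case(self):
        # Returns the tracked case directly, no inspection of values.
        return self._set_case


def heuristic_case(tv):
    """The originally planned heuristic: infer the case from non-default values."""
    if tv.bool_value:
        return TypedValueCase.BOOL_VALUE
    if tv.int32_value != 0:
        return TypedValueCase.INT32_VALUE
    if tv.double_value != 0.0:
        return TypedValueCase.DOUBLE_VALUE
    return TypedValueCase.NONE
```

Assigning `bool_value = False` leaves `heuristic_case` reporting `NONE`, while `get_value_case()` reports `BOOL_VALUE`, which is the distinction the deviation relies on.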
|
||||
|
||||
## 4. API key sent via HTTP header (DelegatingHandler)
|
||||
|
||||
**Plan specified**: API key sent in `ConnectRequest.ApiKey` field (request body).
|
||||
**Actual**: API key sent as `x-api-key` HTTP header on every gRPC request via `ApiKeyDelegatingHandler`, in addition to the request body.
|
||||
**Why**: The Host's `ApiKeyInterceptor` validates the `x-api-key` gRPC metadata header before any RPC handler executes. protobuf-net.Grpc's `CreateGrpcService<T>()` doesn't expose per-call metadata, so the header must be added at the HTTP transport level. A `DelegatingHandler` wrapping the `SocketsHttpHandler` adds it to all outgoing requests.
|
||||
**How to apply**: `GrpcChannelFactory.CreateChannel()` accepts an optional `apiKey` parameter. `LmxProxyClient` passes it during channel creation in `ConnectAsync`.
|
||||
|
||||
## 5. v2 test deployment on port 50100
|
||||
|
||||
**Plan specified**: Port 50052 for v2 test deployment.
|
||||
**Actual**: Port 50100.
|
||||
**Why**: Ports 50049–50060 are used by MxAccess internal COM connections (established TCP pairs between the COM client and server). Port 50052 was occupied by an ephemeral MxAccess connection from the v1 service.
|
||||
**How to apply**: When deploying alongside v1, use port 50100 or higher to stay clear of the MxAccess ephemeral port range.
|
||||
|
||||
## 6. CheckApiKey validates request body key
|
||||
|
||||
**Plan specified**: Not explicitly defined — the interceptor validates the header key.
|
||||
**Actual**: `CheckApiKey` RPC validates the key from the *request body* (`request.ApiKey`) against `ApiKeyService`, not the header key.
|
||||
**Why**: The `x-api-key` header always carries the caller's valid key (for interceptor auth). The `CheckApiKey` RPC is designed for clients to test whether a *different* key is valid, so it must check the body key independently.
|
||||
**How to apply**: `ScadaGrpcService` receives `ApiKeyService` as an optional constructor parameter.
|
||||
|
||||
## 7. OnWriteComplete callback — resolved via STA message pump
|
||||
|
||||
**Plan specified**: Wait for `OnWriteComplete` COM callback to confirm write success.
|
||||
**History**: Initially implemented as fire-and-forget because `OnWriteComplete` never fired — the Host had no Windows message pump to deliver the COM callback. See `docs/sta_gap.md` for the full analysis.
|
||||
**Resolution**: `StaComThread` (a dedicated STA thread with a Win32 `GetMessage`/`DispatchMessage` loop) was introduced, providing a proper message pump. All COM operations are now dispatched to this thread via `PostThreadMessage(WM_APP)`. The message pump delivers `OnWriteComplete` callbacks between work items.
|
||||
**Current behavior**: Write dispatches `_lmxProxy.Write()` on the STA thread, registers a `TaskCompletionSource` in `_pendingWrites`, then awaits the callback with a timeout. `OnWriteComplete` resolves or rejects the TCS with `MxStatusMapper` error details. If the callback doesn't arrive within the write timeout, the write falls back to reporting success (fire-and-forget safety net). Cleanup (`UnAdvise` + `RemoveItem`) happens on the STA thread after the callback or timeout.
|
||||
**How to apply**: Writes now get real confirmation from MxAccess. Secured write (1012) and verified write (1013) rejections are surfaced as exceptions via `OnWriteComplete`. The timeout fallback ensures writes don't hang if the callback is delayed.
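
The pending-write bookkeeping with its timeout fallback can be sketched as follows. This is a Python analogue under stated assumptions, not the C# implementation: `PendingWrites`, its method names, and the use of `concurrent.futures.Future` in place of `TaskCompletionSource` are all hypothetical, and the STA dispatch itself is not modelled.

```python
import threading
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeout


class PendingWrites:
    """Tracks one future per in-flight write, keyed by item handle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}

    def register(self, item_handle):
        # Called right after the write is issued.
        future = Future()
        with self._lock:
            self._pending[item_handle] = future
        return future

    def on_write_complete(self, item_handle, error=None):
        # Called from the completion callback.
        with self._lock:
            future = self._pending.pop(item_handle, None)
        if future is None:
            return  # late callback after a timeout: nothing left to resolve
        if error is None:
            future.set_result(True)
        else:
            future.set_exception(RuntimeError(error))

    def wait(self, item_handle, future, timeout_s):
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            # Safety net: no callback within the timeout is reported as success.
            with self._lock:
                self._pending.pop(item_handle, None)
            return True
```

A rejected write surfaces as an exception from `wait`, a confirmed write returns `True`, and a missing callback also returns `True` once the timeout elapses.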
|
||||
|
||||
## 8. SubscriptionManager must create MxAccess COM subscriptions
|
||||
|
||||
**Plan specified**: SubscriptionManager manages per-client channels and routes updates from MxAccess.
|
||||
**Actual**: SubscriptionManager must also call `IScadaClient.SubscribeAsync()` to create the underlying COM subscriptions when a tag is first subscribed, and dispose them when the last client unsubscribes.
|
||||
**Why**: The Phase 2 implementation tracked client-to-tag routing in internal dictionaries but never called `MxAccessClient.SubscribeAsync()` to create the actual MxAccess COM subscriptions (`AddItem` + `AdviseSupervisory`). Without the COM subscription, `OnDataChange` never fired and no updates were delivered to clients. This caused the `Subscribe_ReceivesUpdates` integration test to receive 0 updates over 30 seconds.
|
||||
**How to apply**: `SubscriptionManager.SubscribeAsync()` collects newly-seen tags (those without an existing `TagSubscription`) and **awaits** `_scadaClient.SubscribeAsync()` for them, passing `OnTagValueChanged` as the callback. The await ensures the COM subscription is fully established before the channel reader is returned — this prevents a race where the initial `OnDataChange` (first value delivery after `AdviseSupervisory`) fires before the gRPC stream handler starts reading. Previously this was fire-and-forget (`_ = CreateMxAccessSubscriptionsAsync()`), causing intermittent `Subscribe_ReceivesUpdates` test failures (0 updates in 30s).
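
The ref-counting and the awaited backend subscription can be sketched as below. This is a minimal Python/asyncio illustration, not the C# class: the constructor arguments and method names are hypothetical, and per-client channels and the `OnTagValueChanged` callback are left out.

```python
import asyncio


class SubscriptionManager:
    """Routes tags to clients and creates backend subscriptions on first use."""

    def __init__(self, scada_subscribe, scada_unsubscribe):
        self._scada_subscribe = scada_subscribe      # async callable(list of tags)
        self._scada_unsubscribe = scada_unsubscribe  # async callable(list of tags)
        self._clients_by_tag = {}                    # tag -> set of client ids

    async def subscribe(self, client_id, tags):
        new_tags = []
        for tag in tags:
            clients = self._clients_by_tag.get(tag)
            if clients is None:
                clients = self._clients_by_tag[tag] = set()
                new_tags.append(tag)
            clients.add(client_id)
        if new_tags:
            # Awaited rather than fire-and-forget, so the backend subscription
            # exists before the caller starts reading updates.
            await self._scada_subscribe(new_tags)

    async def unsubscribe(self, client_id, tags):
        orphaned = []
        for tag in tags:
            clients = self._clients_by_tag.get(tag)
            if clients is None:
                continue
            clients.discard(client_id)
            if not clients:
                del self._clients_by_tag[tag]
                orphaned.append(tag)
        if orphaned:
            # Last client gone: release the backend subscription by address.
            await self._scada_unsubscribe(orphaned)
```

Only tags seen for the first time reach the backend, and a tag is released only when its last client unsubscribes.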
|
||||
|
||||
---
|
||||
|
||||
# Known Gaps
|
||||
|
||||
## Gap 1: No active connection health probing
|
||||
|
||||
**Status**: Resolved (2026-03-22, commit `a6c01d7`).
|
||||
|
||||
**Problem**: `MxAccessClient.IsConnected` checks `_connectionState == Connected && _connectionHandle > 0`. When the AVEVA platform (aaBootstrap) is killed or restarted, the MxAccess COM object and handle remain valid in memory — `IsConnected` stays `true`. The auto-reconnect monitor loop (`MonitorConnectionAsync`) only triggers when `IsConnected` is `false`, so it never attempts reconnection.
|
||||
|
||||
**Observed behavior** (tested 2026-03-22): After killing the aaBootstrap process, all reads returned null values with Bad quality indefinitely. The monitor loop kept seeing `IsConnected == true` and never reconnected.
|
||||
|
||||
**Fix implemented**: The monitor loop now actively probes the connection using `ProbeConnectionAsync`, which reads a configurable test tag and classifies the result as `Healthy`, `TransportFailure`, or `DataDegraded`.
|
||||
- `TransportFailure` for N consecutive probes (default 3) → forced disconnect + full reconnect (new COM object, `Register`, `RecreateStoredSubscriptionsAsync`)
|
||||
- `DataDegraded` → stay connected, back off probe interval to 30s, report degraded status (platform objects may be stopped)
|
||||
- `Healthy` → reset counters, resume normal interval
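
The three-way classification above amounts to a small state machine. The Python sketch below is illustrative only: `ProbeMonitor`, the action strings, and the 5000 ms normal interval are assumptions (the source states only the failure threshold and the degraded interval), and resetting the failure counter on a `DataDegraded` result is inferred from the word "consecutive".

```python
from enum import Enum


class ProbeResult(Enum):
    HEALTHY = "Healthy"
    TRANSPORT_FAILURE = "TransportFailure"
    DATA_DEGRADED = "DataDegraded"


class ProbeMonitor:
    def __init__(self, max_consecutive_transport_failures=3,
                 normal_interval_ms=5000, degraded_interval_ms=30000):
        self.max_failures = max_consecutive_transport_failures
        self.normal_interval_ms = normal_interval_ms  # assumed value
        self.degraded_interval_ms = degraded_interval_ms
        self.consecutive_failures = 0
        self.degraded = False

    def on_probe(self, result):
        """Return (action, next_interval_ms) for one probe result."""
        if result is ProbeResult.TRANSPORT_FAILURE:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # Threshold reached: force a disconnect and full reconnect.
                self.consecutive_failures = 0
                self.degraded = False
                return "reconnect", self.normal_interval_ms
            return "retry", self.normal_interval_ms
        # Any non-transport result breaks the failure streak (assumption).
        self.consecutive_failures = 0
        if result is ProbeResult.DATA_DEGRADED:
            self.degraded = True
            return "report_degraded", self.degraded_interval_ms
        self.degraded = False
        return "ok", self.normal_interval_ms
```

With the default threshold, the third consecutive `TransportFailure` yields `"reconnect"`; a `Healthy` probe in between restarts the count.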
|
||||
|
||||
**Verified** (tested 2026-03-22): Graceful platform stop via SMC → 4 failed probes → automatic reconnect → reads restored within ~60 seconds. All 17 integration tests pass after recovery. Subscribed clients receive `Bad_NotConnected` quality during outage, then Good quality resumes automatically.
|
||||
|
||||
**Configuration** (`appsettings.json` → `HealthCheck` section):
|
||||
- `TestTagAddress`: Tag to probe (default `TestChildObject.TestBool`)
|
||||
- `ProbeTimeoutMs`: Probe read timeout (default 5000ms)
|
||||
- `MaxConsecutiveTransportFailures`: Failures before forced reconnect (default 3)
|
||||
- `DegradedProbeIntervalMs`: Probe interval in degraded mode (default 30000ms)
|
||||
|
||||
## Gap 2: Stale SubscriptionManager handles after reconnect
|
||||
|
||||
**Status**: Resolved (2026-03-22, commit `a6c01d7`).
|
||||
|
||||
**Problem**: `SubscriptionManager` stored `IAsyncDisposable` handles from `_scadaClient.SubscribeAsync()` in `_mxAccessHandles`. After a reconnect, `MxAccessClient.RecreateStoredSubscriptionsAsync()` recreated COM subscriptions internally but `SubscriptionManager._mxAccessHandles` still held stale handles. Additionally, a batch subscription stored the same handle for every address — disposing one address would dispose the entire batch.
|
||||
|
||||
**Fix implemented**: Removed `_mxAccessHandles` entirely. `SubscriptionManager` no longer tracks COM subscription handles. Ownership is cleanly split:
|
||||
- `SubscriptionManager` owns client routing and ref-counting only
|
||||
- `MxAccessClient` owns COM subscription lifecycle via `_storedSubscriptions` and `_addressToHandle`
|
||||
- Unsubscribe uses `_scadaClient.UnsubscribeByAddressAsync(addresses)` — address-based, resolves to current handles regardless of reconnect history
|
||||
|
||||
## Gap 3: AVEVA objects don't auto-start after platform crash
|
||||
|
||||
**Status**: Documented. Platform behavior, not an LmxProxy issue.
|
||||
|
||||
**Observed behavior** (tested 2026-03-22): After killing aaBootstrap, the service auto-restarted (via Windows SCM recovery or Watchdog) within seconds. However, the ArchestrA objects (TestChildObject) did not automatically start. MxAccess connected successfully (`Register()` returned a valid handle) but all tag reads returned null values with Bad quality for 40+ minutes. Objects only recovered after manual restart via the System Management Console (SMC).
|
||||
|
||||
**Implication for LmxProxy**: Even with Gap 1 fixed (active probing + reconnect), reads will still return Bad quality until the platform objects are running. LmxProxy cannot fix this — it's a platform-level recovery issue. The health check should report this clearly: "MxAccess connected but tag quality is Bad — platform objects may need manual restart."
|
||||
|
||||
**Timeline**: After a graceful aaBootstrap restart from SMC, objects take ~5 minutes to come back. After an aaBootstrap kill (crash), objects do not auto-recover and require a manual restart via SMC.
|
||||
360
deprecated/lmxproxy/docs/lmxproxy_protocol.md
Normal file
@@ -0,0 +1,360 @@
|
||||
# LmxProxy Protocol Specification
|
||||
|
||||
The LmxProxy protocol is a gRPC-based SCADA read/write interface for bridging ScadaLink's Data Connection Layer to devices via an intermediary proxy server (LmxProxy). The proxy translates LmxProxy protocol operations into backend device calls (e.g., OPC UA). All communication uses HTTP/2 gRPC with Protocol Buffers.
|
||||
|
||||
## Service Definition
|
||||
|
||||
```protobuf
|
||||
syntax = "proto3";
|
||||
package scada;
|
||||
|
||||
service ScadaService {
|
||||
rpc Connect(ConnectRequest) returns (ConnectResponse);
|
||||
rpc Disconnect(DisconnectRequest) returns (DisconnectResponse);
|
||||
rpc GetConnectionState(GetConnectionStateRequest) returns (GetConnectionStateResponse);
|
||||
rpc Read(ReadRequest) returns (ReadResponse);
|
||||
rpc ReadBatch(ReadBatchRequest) returns (ReadBatchResponse);
|
||||
rpc Write(WriteRequest) returns (WriteResponse);
|
||||
rpc WriteBatch(WriteBatchRequest) returns (WriteBatchResponse);
|
||||
rpc WriteBatchAndWait(WriteBatchAndWaitRequest) returns (WriteBatchAndWaitResponse);
|
||||
rpc Subscribe(SubscribeRequest) returns (stream VtqMessage);
|
||||
rpc CheckApiKey(CheckApiKeyRequest) returns (CheckApiKeyResponse);
|
||||
}
|
||||
```
|
||||
|
||||
Proto file location: `src/ScadaLink.DataConnectionLayer/Adapters/Protos/scada.proto`
|
||||
|
||||
## Connection Lifecycle
|
||||
|
||||
### Session Model
|
||||
|
||||
Every client must call `Connect` before performing any read, write, or subscribe operation. The server returns a session ID (32-character hex GUID) that must be included in all subsequent requests. Sessions persist until `Disconnect` is called or the server restarts — there is no idle timeout.
|
||||
|
||||
### Authentication
|
||||
|
||||
API key authentication is optional, controlled by server configuration:
|
||||
|
||||
- **If required**: The `Connect` RPC fails with `success=false` if the API key doesn't match.
|
||||
- **If not required**: All API keys are accepted (including empty).
|
||||
- The API key is sent both in the `ConnectRequest.api_key` field and as an `x-api-key` gRPC metadata header on the `Connect` call.
|
||||
|
||||
### Connect
|
||||
|
||||
```
|
||||
ConnectRequest {
|
||||
client_id: string // Client identifier (e.g., "ScadaLink-{guid}")
|
||||
api_key: string // API key for authentication (empty if none)
|
||||
}
|
||||
|
||||
ConnectResponse {
|
||||
success: bool // Whether connection succeeded
|
||||
message: string // Status message
|
||||
session_id: string // 32-char hex GUID (only valid if success=true)
|
||||
}
|
||||
```
|
||||
|
||||
The client generates `client_id` as `"ScadaLink-{Guid:N}"` for uniqueness.
|
||||
|
||||
### Disconnect
|
||||
|
||||
```
|
||||
DisconnectRequest {
|
||||
session_id: string
|
||||
}
|
||||
|
||||
DisconnectResponse {
|
||||
success: bool
|
||||
message: string
|
||||
}
|
||||
```
|
||||
|
||||
Best-effort — the client calls disconnect but does not retry on failure.
|
||||
|
||||
### GetConnectionState
|
||||
|
||||
```
|
||||
GetConnectionStateRequest {
|
||||
session_id: string
|
||||
}
|
||||
|
||||
GetConnectionStateResponse {
|
||||
is_connected: bool
|
||||
client_id: string
|
||||
connected_since_utc_ticks: int64 // DateTime.UtcNow.Ticks at connect time
|
||||
}
|
||||
```
|
||||
|
||||
### CheckApiKey
|
||||
|
||||
```
|
||||
CheckApiKeyRequest {
|
||||
api_key: string
|
||||
}
|
||||
|
||||
CheckApiKeyResponse {
|
||||
is_valid: bool
|
||||
message: string
|
||||
}
|
||||
```
|
||||
|
||||
Standalone API key validation without creating a session.
|
||||
|
||||
## Value-Timestamp-Quality (VTQ)
|
||||
|
||||
The core data structure for all read and subscription results:
|
||||
|
||||
```
|
||||
VtqMessage {
|
||||
tag: string // Tag address
|
||||
value: string // Value encoded as string (see Value Encoding)
|
||||
timestamp_utc_ticks: int64 // UTC DateTime.Ticks (100ns intervals since 0001-01-01)
|
||||
quality: string // "Good", "Uncertain", or "Bad"
|
||||
}
|
||||
```
|
||||
|
||||
### Value Encoding
|
||||
|
||||
All values are transmitted as strings on the wire. Both client and server use the same parsing order:
|
||||
|
||||
| Wire String | Parsed Type | Example |
|
||||
|-------------|------------|---------|
|
||||
| Numeric (double-parseable) | `double` | `"42.5"` → `42.5` |
|
||||
| `"true"` / `"false"` (case-insensitive) | `bool` | `"True"` → `true` |
|
||||
| Everything else | `string` | `"Running"` → `"Running"` |
|
||||
| Empty string | `null` | `""` → `null` |
|
||||
|
||||
For write operations, values are converted to strings via `.ToString()` before transmission.
|
||||
|
||||
Arrays and lists are JSON-serialized (e.g., `[1,2,3]`).
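
The parsing order in the table can be sketched as a single function. This is a Python illustration, not the shared C# parser: `parse_wire_value` is a hypothetical name, the empty-string check is assumed to run first (otherwise it would fall into the "everything else" row), and Python's `float()` accepts a few spellings that the .NET parser may not.

```python
def parse_wire_value(wire):
    """Decode a v1 wire string into None, float, bool, or str."""
    if wire is None or wire == "":
        return None
    try:
        # Numeric strings come back as floating point, so "42" becomes 42.0.
        return float(wire)
    except ValueError:
        pass
    lowered = wire.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return wire
```

One consequence worth noting: the original type is not recoverable from the wire string, so an integer written as `"42"` is read back as a double.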
|
||||
|
||||
### Quality Codes
|
||||
|
||||
Quality is transmitted as a case-insensitive string:
|
||||
|
||||
| Wire Value | Meaning | OPC UA Status Code |
|
||||
|-----------|---------|-------------------|
|
||||
| `"Good"` | Value is reliable | `0x00000000` (StatusCode == 0) |
|
||||
| `"Uncertain"` | Value may not be current | Non-zero, high bit clear |
|
||||
| `"Bad"` | Value is unreliable or unavailable | High bit set (`0x80000000`) |
|
||||
|
||||
A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp.
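
The mapping in the table reduces to two checks on the numeric status code. The Python sketch below follows the table as written; the function names are hypothetical.

```python
def quality_from_status_code(status_code):
    """Map a 32-bit status code to the v1 wire quality string."""
    if status_code == 0:
        return "Good"
    if status_code & 0x80000000:
        return "Bad"  # high bit set
    return "Uncertain"  # non-zero, high bit clear


def normalize_quality(wire):
    """Accept the wire string case-insensitively and return canonical casing."""
    canonical = {"good": "Good", "uncertain": "Uncertain", "bad": "Bad"}
    try:
        return canonical[wire.lower()]
    except KeyError:
        raise ValueError(f"unknown quality: {wire!r}")
```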
|
||||
|
||||
### Timestamps
|
||||
|
||||
- All timestamps are UTC.
|
||||
- Encoded as `int64` representing `DateTime.Ticks` (100-nanosecond intervals since 0001-01-01 00:00:00 UTC).
|
||||
- Client reconstructs via `new DateTime(ticks, DateTimeKind.Utc)`.
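
For clients that are not written in .NET, the tick encoding can be converted as sketched below. The helper names are hypothetical; note that Python's `datetime` has microsecond resolution, so the last decimal digit of a tick count (100 ns) is dropped on conversion.

```python
from datetime import datetime, timedelta, timezone

# Tick zero: 0001-01-01 00:00:00 UTC.
_TICK_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
_TICKS_PER_SECOND = 10_000_000


def ticks_to_datetime(ticks):
    """Convert 100 ns ticks since 0001-01-01 UTC to an aware datetime."""
    return _TICK_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_ticks(dt):
    """Convert an aware datetime to 100 ns ticks since 0001-01-01 UTC."""
    delta = dt.astimezone(timezone.utc) - _TICK_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * _TICKS_PER_SECOND + delta.microseconds * 10
```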
|
||||
|
||||
## Read Operations
|
||||
|
||||
### Read (Single Tag)
|
||||
|
||||
```
|
||||
ReadRequest {
|
||||
session_id: string // Valid session ID
|
||||
tag: string // Tag address
|
||||
}
|
||||
|
||||
ReadResponse {
|
||||
success: bool // Whether read succeeded
|
||||
message: string // Error message if failed
|
||||
vtq: VtqMessage // Value-timestamp-quality result
|
||||
}
|
||||
```
|
||||
|
||||
### ReadBatch (Multiple Tags)
|
||||
|
||||
```
|
||||
ReadBatchRequest {
|
||||
session_id: string
|
||||
tags: repeated string // Tag addresses
|
||||
}
|
||||
|
||||
ReadBatchResponse {
|
||||
success: bool // false if any tag failed
|
||||
message: string // Error message
|
||||
vtqs: repeated VtqMessage // Results in same order as request
|
||||
}
|
||||
```
|
||||
|
||||
Batch reads are **partially successful** — individual tags may have Bad quality while the overall response succeeds. If a tag read throws an exception, its VTQ is returned with Bad quality and current UTC timestamp.
|
||||
|
||||
## Write Operations
|
||||
|
||||
### Write (Single Tag)
|
||||
|
||||
```
|
||||
WriteRequest {
|
||||
session_id: string
|
||||
tag: string
|
||||
value: string // Value as string (parsed server-side)
|
||||
}
|
||||
|
||||
WriteResponse {
|
||||
success: bool
|
||||
message: string
|
||||
}
|
||||
```
|
||||
|
||||
### WriteBatch (Multiple Tags)
|
||||
|
||||
```
|
||||
WriteItem {
|
||||
tag: string
|
||||
value: string
|
||||
}
|
||||
|
||||
WriteResult {
|
||||
tag: string
|
||||
success: bool
|
||||
message: string
|
||||
}
|
||||
|
||||
WriteBatchRequest {
|
||||
session_id: string
|
||||
items: repeated WriteItem
|
||||
}
|
||||
|
||||
WriteBatchResponse {
|
||||
success: bool // Overall success (all items must succeed)
|
||||
message: string
|
||||
results: repeated WriteResult // Per-item results
|
||||
}
|
||||
```
|
||||
|
||||
Batch writes are **all-or-nothing** at the reporting level — if any item fails, overall `success` is `false`.
|
||||
|
||||
### WriteBatchAndWait (Atomic Write + Flag Polling)
|
||||
|
||||
A compound operation: write values, then poll a flag tag until it matches an expected value or times out.
|
||||
|
||||
```
|
||||
WriteBatchAndWaitRequest {
|
||||
session_id: string
|
||||
items: repeated WriteItem // Values to write
|
||||
flag_tag: string // Tag to poll after writes
|
||||
flag_value: string // Expected value (string comparison)
|
||||
timeout_ms: int32 // Timeout in ms (default 5000 if ≤ 0)
|
||||
poll_interval_ms: int32 // Poll interval in ms (default 100 if ≤ 0)
|
||||
}
|
||||
|
||||
WriteBatchAndWaitResponse {
|
||||
success: bool // Overall operation success
|
||||
message: string
|
||||
write_results: repeated WriteResult // Per-item write results
|
||||
flag_reached: bool // Whether flag matched before timeout
|
||||
elapsed_ms: int32 // Total elapsed time
|
||||
}
|
||||
```
|
||||
|
||||
**Behavior:**
|
||||
1. All writes execute first. If any write fails, the operation returns immediately with `success=false`.
|
||||
2. If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals.
|
||||
3. Compares `readResult.Value?.ToString() == flag_value` (case-sensitive string comparison).
|
||||
4. If flag matches before timeout: `success=true`, `flag_reached=true`.
|
||||
5. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error).
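
The five steps above can be sketched as one function. This is a Python illustration of the described behavior, not the server code: `write`, `read`, `clock`, and `sleep` are injected hypothetical callables, and the result is a plain dictionary rather than the protobuf response.

```python
import time


def write_batch_and_wait(write, read, items, flag_tag, flag_value,
                         timeout_ms=0, poll_interval_ms=0,
                         clock=time.monotonic, sleep=time.sleep):
    """write(tag, value) -> bool; read(tag) -> value or None."""
    # Non-positive settings fall back to the documented defaults.
    timeout_ms = timeout_ms if timeout_ms > 0 else 5000
    poll_interval_ms = poll_interval_ms if poll_interval_ms > 0 else 100
    start = clock()

    def elapsed_ms():
        return int((clock() - start) * 1000)

    def result(success, flag_reached):
        return {"success": success, "flag_reached": flag_reached,
                "write_results": write_results, "elapsed_ms": elapsed_ms()}

    # Step 1: all writes execute; any failure ends the operation before polling.
    write_results = [(tag, bool(write(tag, value))) for tag, value in items]
    if not all(ok for _, ok in write_results):
        return result(False, False)

    # Steps 2-5: poll the flag until it matches or the timeout expires.
    while True:
        value = read(flag_tag)
        if value is not None and str(value) == flag_value:
            return result(True, True)
        if elapsed_ms() >= timeout_ms:
            return result(True, False)  # timeout is not an error
        sleep(poll_interval_ms / 1000)
```

Callers therefore need to check `flag_reached` as well as `success`, since a timeout still reports `success=true`.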
|
||||
|
||||
## Subscription (Server Streaming)
|
||||
|
||||
### Subscribe
|
||||
|
||||
```
|
||||
SubscribeRequest {
|
||||
session_id: string
|
||||
tags: repeated string // Tag addresses to monitor
|
||||
sampling_ms: int32 // Backend sampling interval in milliseconds
|
||||
}
|
||||
|
||||
// Returns: stream of VtqMessage
|
||||
```
|
||||
|
||||
**Behavior:**
|
||||
|
||||
1. Server validates the session. Invalid session → `RpcException` with `StatusCode.Unauthenticated`.
|
||||
2. Server registers monitored items on the backend (e.g., OPC UA subscriptions) for all requested tags.
|
||||
3. On each value change, the server pushes a `VtqMessage` to the response stream.
|
||||
4. The stream remains open indefinitely until:
   - The client cancels (disposes the subscription).
   - The server encounters an error (backend disconnect, etc.).
   - The gRPC connection drops.
|
||||
5. On stream termination, the client's `onStreamError` callback fires exactly once.
|
||||
|
||||
**Client-side subscription lifecycle:**
|
||||
|
||||
```csharp
|
||||
ILmxSubscription subscription = await client.SubscribeAsync(
|
||||
addresses: ["Motor.Speed", "Motor.Temperature"],
|
||||
onUpdate: (tag, vtq) => { /* handle value change */ },
|
||||
onStreamError: () => { /* handle disconnect */ });
|
||||
|
||||
// Later:
|
||||
await subscription.DisposeAsync(); // Cancels the stream
|
||||
```
|
||||
|
||||
Disposing the subscription cancels the underlying `CancellationTokenSource`, which terminates the background stream-reading task and triggers server-side cleanup of monitored items.
|
||||
|
||||
## Tag Addressing
|
||||
|
||||
Tags are string addresses that identify data points. The proxy maps tag addresses to backend-specific identifiers.
|
||||
|
||||
**LmxFakeProxy example** (OPC UA backend):
|
||||
|
||||
Tag addresses are concatenated with a configurable prefix to form OPC UA node IDs:
|
||||
|
||||
```
|
||||
Prefix: "ns=3;s="
|
||||
Tag: "Motor.Speed"
|
||||
NodeId: "ns=3;s=Motor.Speed"
|
||||
```
|
||||
|
||||
The prefix is configured at server startup via the `OPC_UA_PREFIX` environment variable.
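
The mapping is plain string concatenation, as the short Python sketch below shows. The function names are hypothetical, and the `"ns=3;s="` fallback is an assumption taken from the example above, not a documented default.

```python
import os


def load_prefix(environ=os.environ, default="ns=3;s="):
    """Read the node ID prefix from OPC_UA_PREFIX (fallback is an assumption)."""
    return environ.get("OPC_UA_PREFIX", default)


def to_node_id(tag, prefix):
    """Concatenate the configured prefix and the tag address."""
    return prefix + tag
```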
|
||||
|
||||
## Transport Details
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Protocol | gRPC over HTTP/2 |
|
||||
| Default port | 50051 |
|
||||
| TLS | Optional (controlled by `UseTls` connection parameter) |
|
||||
| Metadata headers | `x-api-key` (sent on Connect call if API key configured) |
|
||||
|
||||
### Connection Parameters
|
||||
|
||||
The ScadaLink DCL configures LmxProxy connections via a string dictionary:
|
||||
|
||||
| Key | Type | Default | Description |
|
||||
|-----|------|---------|-------------|
|
||||
| `Host` | string | `"localhost"` | gRPC server hostname |
|
||||
| `Port` | string (parsed as int) | `"50051"` | gRPC server port |
|
||||
| `ApiKey` | string | (none) | API key for authentication |
|
||||
| `SamplingIntervalMs` | string (parsed as int) | `"0"` | Backend sampling interval for subscriptions |
|
||||
| `UseTls` | string (parsed as bool) | `"false"` | Use HTTPS instead of HTTP |
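
Applying the defaults from the table to a string dictionary might look like the sketch below. It is a Python illustration only: the dataclass, the helper names, and the `address` property are hypothetical, and the boolean parsing assumes case-insensitive `"true"`/`"false"`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LmxProxyConnectionOptions:
    host: str = "localhost"
    port: int = 50051
    api_key: Optional[str] = None
    sampling_interval_ms: int = 0
    use_tls: bool = False

    @property
    def address(self) -> str:
        # UseTls switches the scheme between http and https.
        scheme = "https" if self.use_tls else "http"
        return f"{scheme}://{self.host}:{self.port}"


def _parse_bool(text: str) -> bool:
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")


def parse_connection_options(params: Mapping[str, str]) -> LmxProxyConnectionOptions:
    """Build typed options from the string dictionary, applying table defaults."""
    return LmxProxyConnectionOptions(
        host=params.get("Host", "localhost"),
        port=int(params.get("Port", "50051")),
        api_key=params.get("ApiKey"),
        sampling_interval_ms=int(params.get("SamplingIntervalMs", "0")),
        use_tls=_parse_bool(params.get("UseTls", "false")),
    )
```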
|
||||
|
||||
## Error Handling
|
||||
|
||||
| Operation | Error Mechanism | Client Behavior |
|
||||
|-----------|----------------|-----------------|
|
||||
| Connect | `success=false` in response | Throws `InvalidOperationException` |
|
||||
| Read/ReadBatch | `success=false` in response | Throws `InvalidOperationException` |
|
||||
| Write/WriteBatch | `success=false` in response | Throws `InvalidOperationException` |
|
||||
| WriteBatchAndWait | `success=false` or `flag_reached=false` | Returns result (timeout is not an exception) |
|
||||
| Subscribe (auth) | `RpcException` with `Unauthenticated` | Propagated to caller |
|
||||
| Subscribe (stream) | Stream ends or gRPC error | `onStreamError` callback invoked; `sessionId` nullified |
|
||||
| Any (disconnected) | Client checks `IsConnected` | Throws `InvalidOperationException("not connected")` |
|
||||
|
||||
When a subscription stream ends unexpectedly, the client immediately nullifies its session ID, causing `IsConnected` to return `false`. The DCL adapter fires its `Disconnected` event, which triggers the reconnection cycle in the `DataConnectionActor`.
|
||||
|
||||
## Implementation Files
|
||||
|
||||
| Component | File |
|
||||
|-----------|------|
|
||||
| Proto definition | `src/ScadaLink.DataConnectionLayer/Adapters/Protos/scada.proto` |
|
||||
| Client interface | `src/ScadaLink.DataConnectionLayer/Adapters/ILmxProxyClient.cs` |
|
||||
| Client implementation | `src/ScadaLink.DataConnectionLayer/Adapters/RealLmxProxyClient.cs` |
|
||||
| DCL adapter | `src/ScadaLink.DataConnectionLayer/Adapters/LmxProxyDataConnection.cs` |
|
||||
| Client factory | `src/ScadaLink.DataConnectionLayer/Adapters/LmxProxyClientFactory.cs` |
|
||||
| Server implementation | `infra/lmxfakeproxy/Services/ScadaServiceImpl.cs` |
|
||||
| Session manager | `infra/lmxfakeproxy/Sessions/SessionManager.cs` |
|
||||
| Tag mapper | `infra/lmxfakeproxy/TagMapper.cs` |
|
||||
| OPC UA bridge interface | `infra/lmxfakeproxy/Bridge/IOpcUaBridge.cs` |
|
||||
| OPC UA bridge impl | `infra/lmxfakeproxy/Bridge/OpcUaBridge.cs` |
|
||||
646
deprecated/lmxproxy/docs/lmxproxy_updates.md
Normal file
@@ -0,0 +1,646 @@
|
||||
# LmxProxy Protocol v2 — OPC UA Alignment
|
||||
|
||||
This document specifies all changes to the LmxProxy gRPC protocol to align it with OPC UA semantics. The changes replace string-serialized values with typed values and simple quality strings with OPC UA-style status codes.
|
||||
|
||||
**Baseline:** `lmxproxy_protocol.md` (v1 protocol spec)
|
||||
**Strategy:** Clean break — all clients and servers updated simultaneously. No backward compatibility layer.
|
||||
|
||||
---
|
||||
|
||||
## 1. Change Summary
|
||||
|
||||
| Message / Field | v1 Type | v2 Type | Breaking? |
|
||||
|-----------------|---------|---------|-----------|
|
||||
| `VtqMessage.value` | `string` | `TypedValue` | Yes |
|
||||
| `VtqMessage.quality` | `string` | `QualityCode` | Yes |
|
||||
| `WriteRequest.value` | `string` | `TypedValue` | Yes |
|
||||
| `WriteItem.value` | `string` | `TypedValue` | Yes |
|
||||
| `WriteBatchAndWaitRequest.flag_value` | `string` | `TypedValue` | Yes |
|
||||
|
||||
**Unchanged messages:** `ConnectRequest`, `ConnectResponse`, `DisconnectRequest`, `DisconnectResponse`, `GetConnectionStateRequest`, `GetConnectionStateResponse`, `CheckApiKeyRequest`, `CheckApiKeyResponse`, `ReadRequest`, `ReadBatchRequest`, `SubscribeRequest`, `WriteResponse`, `WriteBatchResponse`, `WriteBatchAndWaitResponse`, `WriteResult`.
|
||||
|
||||
**Unchanged RPCs:** The `ScadaService` definition is identical — same RPC names, same request/response pairing. Only the internal message shapes change.
|
||||
|
||||
---
|
||||
|
||||
## 2. Complete Updated Proto File
|
||||
|
||||
```protobuf
|
||||
syntax = "proto3";
|
||||
package scada;
|
||||
|
||||
// ============================================================
|
||||
// Service Definition (unchanged)
|
||||
// ============================================================
|
||||
|
||||
service ScadaService {
|
||||
rpc Connect(ConnectRequest) returns (ConnectResponse);
|
||||
rpc Disconnect(DisconnectRequest) returns (DisconnectResponse);
|
||||
rpc GetConnectionState(GetConnectionStateRequest) returns (GetConnectionStateResponse);
|
||||
rpc Read(ReadRequest) returns (ReadResponse);
|
||||
rpc ReadBatch(ReadBatchRequest) returns (ReadBatchResponse);
|
||||
rpc Write(WriteRequest) returns (WriteResponse);
|
||||
rpc WriteBatch(WriteBatchRequest) returns (WriteBatchResponse);
|
||||
rpc WriteBatchAndWait(WriteBatchAndWaitRequest) returns (WriteBatchAndWaitResponse);
|
||||
rpc Subscribe(SubscribeRequest) returns (stream VtqMessage);
|
||||
rpc CheckApiKey(CheckApiKeyRequest) returns (CheckApiKeyResponse);
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// NEW: Typed Value System
|
||||
// ============================================================
|
||||
|
||||
// Replaces the v1 string-encoded value field.
|
||||
// At most one field is set; an unset oneof represents null.
|
||||
message TypedValue {
|
||||
oneof value {
|
||||
bool bool_value = 1;
|
||||
int32 int32_value = 2;
|
||||
int64 int64_value = 3;
|
||||
float float_value = 4;
|
||||
double double_value = 5;
|
||||
string string_value = 6;
|
||||
bytes bytes_value = 7; // byte[]
|
||||
int64 datetime_value = 8; // UTC DateTime.Ticks (100ns intervals since 0001-01-01)
|
||||
ArrayValue array_value = 9; // arrays of primitives
|
||||
}
|
||||
}
|
||||
|
||||
// Container for typed arrays. Exactly one field will be set.
|
||||
message ArrayValue {
|
||||
oneof values {
|
||||
BoolArray bool_values = 1;
|
||||
Int32Array int32_values = 2;
|
||||
Int64Array int64_values = 3;
|
||||
FloatArray float_values = 4;
|
||||
DoubleArray double_values = 5;
|
||||
StringArray string_values = 6;
|
||||
}
|
||||
}
|
||||
|
||||
message BoolArray { repeated bool values = 1; }
|
||||
message Int32Array { repeated int32 values = 1; }
|
||||
message Int64Array { repeated int64 values = 1; }
|
||||
message FloatArray { repeated float values = 1; }
|
||||
message DoubleArray { repeated double values = 1; }
|
||||
message StringArray { repeated string values = 1; }
|
||||
|
||||
// ============================================================
|
||||
// NEW: OPC UA-Style Quality Codes
|
||||
// ============================================================
|
||||
|
||||
// Replaces the v1 string quality field ("Good", "Bad", "Uncertain").
|
||||
message QualityCode {
|
||||
uint32 status_code = 1; // OPC UA-compatible numeric status code
|
||||
string symbolic_name = 2; // Human-readable name (e.g., "Good", "BadSensorFailure")
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// Connection Lifecycle (unchanged)
|
||||
// ============================================================
|
||||
|
||||
message ConnectRequest {
|
||||
string client_id = 1;
|
||||
string api_key = 2;
|
||||
}
|
||||
|
||||
message ConnectResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
string session_id = 3;
|
||||
}
|
||||
|
||||
message DisconnectRequest {
|
||||
string session_id = 1;
|
||||
}
|
||||
|
||||
message DisconnectResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
}
|
||||
|
||||
message GetConnectionStateRequest {
|
||||
string session_id = 1;
|
||||
}
|
||||
|
||||
message GetConnectionStateResponse {
|
||||
bool is_connected = 1;
|
||||
string client_id = 2;
|
||||
int64 connected_since_utc_ticks = 3;
|
||||
}
|
||||
|
||||
message CheckApiKeyRequest {
|
||||
string api_key = 1;
|
||||
}
|
||||
|
||||
message CheckApiKeyResponse {
|
||||
bool is_valid = 1;
|
||||
string message = 2;
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// Value-Timestamp-Quality (CHANGED)
|
||||
// ============================================================
|
||||
|
||||
message VtqMessage {
|
||||
string tag = 1; // Tag address (unchanged)
|
||||
TypedValue value = 2; // CHANGED: typed value instead of string
|
||||
int64 timestamp_utc_ticks = 3; // UTC DateTime.Ticks (unchanged)
|
||||
QualityCode quality = 4; // CHANGED: structured quality instead of string
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// Read Operations (request unchanged, response uses new VtqMessage)
|
||||
// ============================================================
|
||||
|
||||
message ReadRequest {
|
||||
string session_id = 1;
|
||||
string tag = 2;
|
||||
}
|
||||
|
||||
message ReadResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
VtqMessage vtq = 3; // Uses updated VtqMessage with TypedValue + QualityCode
|
||||
}
|
||||
|
||||
message ReadBatchRequest {
|
||||
string session_id = 1;
|
||||
repeated string tags = 2;
|
||||
}
|
||||
|
||||
message ReadBatchResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
repeated VtqMessage vtqs = 3; // Uses updated VtqMessage
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// Write Operations (CHANGED: TypedValue instead of string)
|
||||
// ============================================================
|
||||
|
||||
message WriteRequest {
|
||||
string session_id = 1;
|
||||
string tag = 2;
|
||||
TypedValue value = 3; // CHANGED from string
|
||||
}
|
||||
|
||||
message WriteResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
}
|
||||
|
||||
message WriteItem {
|
||||
string tag = 1;
|
||||
TypedValue value = 2; // CHANGED from string
|
||||
}
|
||||
|
||||
message WriteResult {
|
||||
string tag = 1;
|
||||
bool success = 2;
|
||||
string message = 3;
|
||||
}
|
||||
|
||||
message WriteBatchRequest {
|
||||
string session_id = 1;
|
||||
repeated WriteItem items = 2;
|
||||
}
|
||||
|
||||
message WriteBatchResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
repeated WriteResult results = 3;
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// WriteBatchAndWait (CHANGED: TypedValue for items and flag)
|
||||
// ============================================================
|
||||
|
||||
message WriteBatchAndWaitRequest {
|
||||
string session_id = 1;
|
||||
repeated WriteItem items = 2; // Uses updated WriteItem with TypedValue
|
||||
string flag_tag = 3;
|
||||
TypedValue flag_value = 4; // CHANGED from string — type-aware comparison
|
||||
int32 timeout_ms = 5;
|
||||
int32 poll_interval_ms = 6;
|
||||
}
|
||||
|
||||
message WriteBatchAndWaitResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
repeated WriteResult write_results = 3;
|
||||
bool flag_reached = 4;
|
||||
int32 elapsed_ms = 5;
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// Subscription (request unchanged, stream uses new VtqMessage)
|
||||
// ============================================================
|
||||
|
||||
message SubscribeRequest {
|
||||
string session_id = 1;
|
||||
repeated string tags = 2;
|
||||
int32 sampling_ms = 3;
|
||||
}
|
||||
|
||||
// Returns: stream of VtqMessage (updated with TypedValue + QualityCode)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Detailed Change Specifications
|
||||
|
||||
### 3.1 Typed Value Representation
|
||||
|
||||
**What changed:** The `string value` field throughout the protocol is replaced by `TypedValue`, a protobuf `oneof` that carries the value in its native type.
|
||||
|
||||
**v1 behavior (removed):**
|
||||
- All values serialized to string via `.ToString()`
|
||||
- Client-side parsing heuristic: numeric → bool → string → null
|
||||
- Arrays JSON-serialized as strings (e.g., `"[1,2,3]"`)
|
||||
- Empty string treated as null
|
||||
|
||||
**v2 behavior:**
|
||||
- Values transmitted in their native protobuf type
|
||||
- No parsing ambiguity — the `oneof` case tells you the type
|
||||
- Arrays use dedicated repeated-field messages (`Int32Array`, `FloatArray`, etc.)
|
||||
- Null represented by an unset `oneof` (no field selected in `TypedValue`)
|
||||
- `datetime_value` uses `int64` UTC Ticks (same wire encoding as v1 timestamps, but now semantically typed as a DateTime value rather than a string)
|
||||
|
||||
**Null handling:**
|
||||
|
||||
| Scenario | v1 | v2 |
|
||||
|----------|----|----|
|
||||
| Null value | `value = ""` (empty string) | `TypedValue` with no `oneof` case set |
|
||||
| Missing VTQ | Treated as Bad quality, null value | Same — Bad quality, unset `TypedValue` |
|
||||
|
||||
**Type mapping from internal tag model:**
|
||||
|
||||
| Tag Data Type | TypedValue Field | Notes |
|
||||
|---------------|-----------------|-------|
|
||||
| `bool` | `bool_value` | |
|
||||
| `int32` | `int32_value` | |
|
||||
| `int64` | `int64_value` | |
|
||||
| `float` | `float_value` | |
|
||||
| `double` | `double_value` | |
|
||||
| `string` | `string_value` | |
|
||||
| `byte[]` | `bytes_value` | |
|
||||
| `DateTime` | `datetime_value` | UTC Ticks as int64 |
|
||||
| `float[]` | `array_value.float_values` | |
|
||||
| `int32[]` | `array_value.int32_values` | |
|
||||
| Other arrays | Corresponding `ArrayValue` field | |
|
||||
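The `datetime_value` encoding above can be illustrated with a minimal sketch. The class and method names are hypothetical (they do not appear in the LmxProxy code); only `DateTime.Ticks`, `ToUniversalTime()`, and the `DateTime(long, DateTimeKind)` constructor are real .NET APIs.

```csharp
using System;

// Hypothetical helper, not part of the LmxProxy sources.
public static class DateTimeTicksSketch
{
    // Encode: normalise to UTC, then take the count of 100 ns ticks since 0001-01-01.
    // Note: a DateTime with Kind == Unspecified is treated as local time by ToUniversalTime().
    public static long ToUtcTicks(DateTime value) => value.ToUniversalTime().Ticks;

    // Decode: rebuild a DateTime whose Kind is explicitly UTC.
    public static DateTime FromUtcTicks(long ticks) => new DateTime(ticks, DateTimeKind.Utc);
}
```

Keeping the `Kind` explicit on decode avoids accidental local-time conversions later in the client.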
|
||||
### 3.2 OPC UA-Style Quality Codes
|
||||
|
||||
**What changed:** The `string quality` field (one of `"Good"`, `"Uncertain"`, `"Bad"`) is replaced by `QualityCode` containing a numeric OPC UA status code and a human-readable symbolic name.
|
||||
|
||||
**v1 behavior (removed):**
|
||||
- Quality as case-insensitive string: `"Good"`, `"Uncertain"`, `"Bad"`
|
||||
- No sub-codes — all failures were just `"Bad"`
|
||||
|
||||
**v2 behavior:**
|
||||
- `status_code` is a `uint32` matching OPC UA `StatusCode` bit layout
|
||||
- `symbolic_name` is the human-readable equivalent (for logging, debugging, display)
|
||||
- Category is derived from the top two bits (mask `0xC0000000`): `00` = Good (`0x0xxxxxxx` to `0x3xxxxxxx`), `01` = Uncertain (`0x4xxxxxxx` to `0x7xxxxxxx`), `10` = Bad (`0x8xxxxxxx` to `0xBxxxxxxx`)
|
||||
|
||||
**Supported quality codes:**
|
||||
|
||||
The quality codes below are filtered to those actively used by AVEVA System Platform, InTouch, and OI Server/DAServer (per AVEVA Tech Note TN1305). AVEVA's ecosystem maps OPC DA quality codes to OPC UA status codes when communicating over OPC UA. This table includes the OPC UA equivalents for the AVEVA-relevant quality states.
|
||||
|
||||
**Good Quality:**
|
||||
|
||||
| Symbolic Name | Status Code | AVEVA OPC DA Hex | AVEVA Description |
|
||||
|---------------|-------------|------------------|-------------------|
|
||||
| `Good` | `0x00000000` | `0x00C0` | Value is reliable, non-specific |
|
||||
| `GoodLocalOverride` | `0x00D80000` | `0x00D8` | Value has been manually overridden; input disconnected |
|
||||
|
||||
**Uncertain Quality:**
|
||||
|
||||
| Symbolic Name | Status Code | AVEVA OPC DA Hex | AVEVA Description |
|
||||
|---------------|-------------|------------------|-------------------|
|
||||
| `UncertainLastUsableValue` | `0x40900000` | `0x0044` | External source stopped writing; value is stale |
|
||||
| `UncertainSensorNotAccurate` | `0x42390000` | `0x0050` | Sensor out of calibration or clamped at limit |
|
||||
| `UncertainEngineeringUnitsExceeded` | `0x40540000` | `0x0054` | Value is outside defined engineering limits |
|
||||
| `UncertainSubNormal` | `0x40580000` | `0x0058` | Derived from multiple sources with insufficient good sources |
|
||||
|
||||
**Bad Quality:**
|
||||
|
||||
| Symbolic Name | Status Code | AVEVA OPC DA Hex | AVEVA Description |
|
||||
|---------------|-------------|------------------|-------------------|
|
||||
| `Bad` | `0x80000000` | `0x0000` | Non-specific bad; value is not useful |
|
||||
| `BadConfigurationError` | `0x80040000` | `0x0004` | Server-specific configuration problem (e.g., item deleted) |
|
||||
| `BadNotConnected` | `0x808A0000` | `0x0008` | Input not logically connected to a source |
|
||||
| `BadDeviceFailure` | `0x806B0000` | `0x000C` | Device failure detected |
|
||||
| `BadSensorFailure` | `0x806D0000` | `0x0010` | Sensor failure detected |
|
||||
| `BadLastKnownValue` | `0x80050000` | `0x0014` | Communication failed; last known value available (check timestamp age) |
|
||||
| `BadCommunicationFailure` | `0x80050000` | `0x0018` | Communication failed; no last known value available |
|
||||
| `BadOutOfService` | `0x808F0000` | `0x001C` | Block is off-scan or locked; item/group is inactive |
|
||||
|
||||
**Notes:**
|
||||
- AVEVA OPC DA quality codes are 16-bit words whose low byte carries the standard fields: 2 bits major (Good/Bad/Uncertain), 4 bits minor (sub-status), 2 bits limit (Not Limited, Low, High, Constant). The OPC UA status codes above are the UA-style equivalents used by this protocol.
|
||||
- The limit bits (Not Limited `0x00`, Low Limited `0x01`, High Limited `0x02`, Constant `0x03`) occupy the two lowest bits and are OR-ed into any quality code. For example, `Good + High Limited` = `0x00C2` in OPC DA. In OPC UA, limits are conveyed via separate status code bits but the base code remains the same.
|
||||
- AVEVA's "Initializing" state (seen when OI Server is still establishing communication) maps to `Bad` with no sub-code in OPC DA (`0x0000`). In OPC UA this is `BadWaitingForInitialData` (`0x80320000`).
|
||||
- This is the minimum set needed to simulate realistic AVEVA System Platform behavior. Additional OPC UA codes can be added if specific simulation scenarios require them.
|
||||
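The OPC DA bit layout described in the notes can be sketched as a small decoder. This is an illustration only; the class and method names are hypothetical, and it assumes the major/sub-status/limit fields sit in the low byte of the 16-bit quality word.

```csharp
using System;

// Hypothetical helper, not part of the LmxProxy sources.
public static class OpcDaQualitySketch
{
    // Splits the low byte (QQSSSSLL) into its three fields.
    // Major: 0 = Bad, 1 = Uncertain, 3 = Good. Limit: 0 = none, 1 = low, 2 = high, 3 = constant.
    public static (int Major, int SubStatus, int Limit) Decode(ushort quality)
    {
        int low = quality & 0xFF;           // the high byte is ignored in this sketch
        int major = (low >> 6) & 0x3;       // top two bits of the low byte
        int subStatus = (low >> 2) & 0xF;   // middle four bits
        int limit = low & 0x3;              // bottom two bits
        return (major, subStatus, limit);
    }
}
```

For the example in the notes, `Decode(0x00C2)` yields major `3` (Good), sub-status `0`, and limit `2` (High Limited).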
|
||||
**Category helper logic (C#):**
|
||||
|
||||
```csharp
|
||||
public static string GetCategory(uint statusCode) => statusCode switch
|
||||
{
|
||||
_ when (statusCode & 0xC0000000) == 0x00000000 => "Good",
|
||||
_ when (statusCode & 0xC0000000) == 0x40000000 => "Uncertain",
|
||||
_ when (statusCode & 0xC0000000) == 0x80000000 => "Bad",
|
||||
_ => "Unknown"
|
||||
};
|
||||
|
||||
public static bool IsGood(uint statusCode) => (statusCode & 0xC0000000) == 0x00000000;
|
||||
public static bool IsBad(uint statusCode) => (statusCode & 0xC0000000) == 0x80000000;
|
||||
```
|
||||
|
||||
### 3.3 WriteBatchAndWait Flag Comparison
|
||||
|
||||
**What changed:** `flag_value` is now `TypedValue` instead of `string`. The server uses type-aware equality comparison instead of string comparison.
|
||||
|
||||
**v1 behavior (removed):**
|
||||
```csharp
|
||||
// v1: string comparison
|
||||
bool matched = readResult.Value?.ToString() == request.FlagValue;
|
||||
```
|
||||
|
||||
**v2 behavior:**
|
||||
```csharp
|
||||
// v2: type-aware comparison
|
||||
bool matched = TypedValueEquals(readResult.TypedValue, request.FlagValue);
|
||||
```
|
||||
|
||||
**Comparison rules:**
|
||||
- Both values must have the same `oneof` case (same type). Mismatched types are never equal.
|
||||
- Numeric comparison uses the native type's equality (no floating-point string round-trip issues).
|
||||
- String comparison is case-sensitive (unchanged from v1).
|
||||
- Bool comparison is direct equality.
|
||||
- Null (unset `oneof`) equals null. Null does not equal any set value.
|
||||
- Array comparison: element-by-element equality, same length required.
|
||||
- `datetime_value` compared as `int64` equality (tick-level precision).
|
||||
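The comparison rules above can be sketched without the generated protobuf classes. The following is a simplified stand-in, not the actual `TypedValueEquals()` implementation: it assumes each value has already been unwrapped into a boxed CLR object (`null` for an unset `oneof`, a one-dimensional array for `array_value`), and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical stand-in for TypedValueEquals(), operating on unwrapped CLR values.
public static class FlagComparisonSketch
{
    public static bool ValuesEqual(object a, object b)
    {
        // Null equals null; null never equals a set value.
        if (a is null || b is null) return a is null && b is null;

        // Mismatched types are never equal (e.g. int 1 vs double 1.0).
        if (a.GetType() != b.GetType()) return false;

        // Arrays: same length and element-by-element equality (one-dimensional only).
        if (a is Array arrA && b is Array arrB)
        {
            if (arrA.Length != arrB.Length) return false;
            for (int i = 0; i < arrA.Length; i++)
            {
                if (!object.Equals(arrA.GetValue(i), arrB.GetValue(i))) return false;
            }
            return true;
        }

        // Scalars: native equality; string.Equals is ordinal and case-sensitive.
        return a.Equals(b);
    }
}
```

One design point to be aware of: `double.Equals` treats `NaN` as equal to `NaN`, unlike the `==` operator, so a real implementation would need to decide which behaviour the flag comparison should have.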
|
||||
---
|
||||
|
||||
## 4. Behavioral Changes
|
||||
|
||||
### 4.1 Read Operations
|
||||
|
||||
No RPC signature changes. The returned `VtqMessage` now uses `TypedValue` and `QualityCode` instead of strings.
|
||||
|
||||
**v1 client code:**
|
||||
```csharp
|
||||
var response = await client.ReadAsync(new ReadRequest { SessionId = sid, Tag = "Motor.Speed" });
|
||||
double value = double.Parse(response.Vtq.Value); // string → double
|
||||
bool isGood = response.Vtq.Quality.Equals("Good", ...); // string comparison
|
||||
```
|
||||
|
||||
**v2 client code:**
|
||||
```csharp
|
||||
var response = await client.ReadAsync(new ReadRequest { SessionId = sid, Tag = "Motor.Speed" });
|
||||
double value = response.Vtq.Value.DoubleValue; // direct typed access
|
||||
bool isGood = (response.Vtq.Quality.StatusCode & 0xC0000000) == 0;  // category check (also true for GoodLocalOverride)
|
||||
// or: bool isGood = IsGood(response.Vtq.Quality.StatusCode); // helper method
|
||||
```
|
||||
|
||||
### 4.2 Write Operations
|
||||
|
||||
Clients must construct a `TypedValue` instead of converting the value to a string.
|
||||
|
||||
**v1 client code:**
|
||||
```csharp
|
||||
await client.WriteAsync(new WriteRequest
|
||||
{
|
||||
SessionId = sid,
|
||||
Tag = "Motor.Speed",
|
||||
Value = 42.5.ToString() // double → string
|
||||
});
|
||||
```
|
||||
|
||||
**v2 client code:**
|
||||
```csharp
|
||||
await client.WriteAsync(new WriteRequest
|
||||
{
|
||||
SessionId = sid,
|
||||
Tag = "Motor.Speed",
|
||||
Value = new TypedValue { DoubleValue = 42.5 } // native type
|
||||
});
|
||||
```
|
||||
|
||||
### 4.3 Subscription Stream
|
||||
|
||||
No RPC signature changes. The streamed `VtqMessage` items now use the updated format. Client `onUpdate` callbacks receive typed values and structured quality.
|
||||
|
||||
### 4.4 Error Conditions with New Quality Codes
|
||||
|
||||
The server now returns specific quality codes instead of generic `"Bad"`:
|
||||
|
||||
| Scenario | v1 Quality | v2 Quality |
|
||||
|----------|-----------|-----------|
|
||||
| Tag not found | `"Bad"` | `BadConfigurationError` (`0x80040000`) |
|
||||
| Tag read exception / comms loss | `"Bad"` | `BadCommunicationFailure` (`0x80050000`) |
|
||||
| Write to read-only tag | `success=false` | WriteResult.success=false, message indicates read-only |
|
||||
| Type mismatch on write | `success=false` | WriteResult.success=false, message indicates type mismatch |
|
||||
| Simulated sensor failure | `"Bad"` | `BadSensorFailure` (`0x806D0000`) |
|
||||
| Simulated device failure | `"Bad"` | `BadDeviceFailure` (`0x806B0000`) |
|
||||
| Stale value (fault injection) | `"Uncertain"` | `UncertainLastUsableValue` (`0x40900000`) |
|
||||
| Block off-scan / disabled | `"Bad"` | `BadOutOfService` (`0x808F0000`) |
|
||||
| Local override active | `"Good"` | `GoodLocalOverride` (`0x00D80000`) |
|
||||
| Initializing / waiting for first value | `"Bad"` | `BadWaitingForInitialData` (`0x80320000`) |
|
||||
|
||||
---
|
||||
|
||||
## 5. Migration Guide
|
||||
|
||||
### 5.1 Strategy
|
||||
|
||||
**Clean break** — all clients and servers are updated simultaneously in a single coordinated release. No backward compatibility layer, no version negotiation, no dual-format support.
|
||||
|
||||
This is appropriate because:
|
||||
- LmxProxy is an internal protocol between ScadaLink components, not a public API
|
||||
- The number of clients is small and controlled
|
||||
- Maintaining dual formats adds complexity with no long-term benefit
|
||||
|
||||
### 5.2 Server-Side Changes
|
||||
|
||||
**Files to update:**
|
||||
|
||||
| File | Changes |
|
||||
|------|---------|
|
||||
| `scada.proto` | Replace with v2 proto (Section 2 of this document) |
|
||||
| `ScadaServiceImpl.cs` | Update all RPC handlers to construct `TypedValue` and `QualityCode` instead of strings |
|
||||
| `SessionManager.cs` | No changes (session model unchanged) |
|
||||
| `TagMapper.cs` | Update to return `TypedValue` from tag reads instead of string conversion |
|
||||
|
||||
**Server implementation notes:**
|
||||
- When reading a tag, construct `TypedValue` by setting the appropriate `oneof` field based on the tag's data type. Do not call `.ToString()`.
|
||||
- When a tag read fails, return `QualityCode { StatusCode = 0x80050000, SymbolicName = "BadCommunicationFailure" }` (or a more specific code) instead of the string `"Bad"`.
|
||||
- When handling writes, extract the value from the `TypedValue` oneof and apply it to the tag actor. If the `oneof` case doesn't match the tag's expected data type, return `WriteResult` with `success=false` and message indicating type mismatch.
|
||||
- For `WriteBatchAndWait` flag comparison, implement `TypedValueEquals()` per the comparison rules in Section 3.3.
|
||||
|
||||
### 5.3 Client-Side Changes
|
||||
|
||||
**Files to update:**
|
||||
|
||||
| File | Changes |
|
||||
|------|---------|
|
||||
| `ILmxProxyClient.cs` | Interface unchanged (same method signatures, updated message types come from proto regeneration) |
|
||||
| `RealLmxProxyClient.cs` | Update value construction in write methods; update value extraction in read callbacks |
|
||||
| `LmxProxyDataConnection.cs` | Update DCL adapter to map between DCL's internal value model and `TypedValue`/`QualityCode` |
|
||||
| `LmxProxyClientFactory.cs` | No changes |
|
||||
|
||||
**Client implementation notes:**
|
||||
- Replace all `double.Parse(vtq.Value)` / `bool.Parse(vtq.Value)` calls with direct typed access (e.g., `vtq.Value.DoubleValue`).
|
||||
- Replace all `vtq.Quality.Equals("Good", ...)` string comparisons with numeric status code checks or the `IsGood()`/`IsBad()` helpers.
|
||||
- Replace all `.ToString()` value serialization in write paths with `TypedValue` construction.
|
||||
- The `onUpdate` callback signature in `SubscribeAsync` doesn't change at the interface level, but the `VtqMessage` it receives now contains `TypedValue` and `QualityCode`.
|
||||
|
||||
### 5.4 Migration Checklist
|
||||
|
||||
```
|
||||
[ ] Generate updated C# classes from v2 proto file
|
||||
[ ] Update server: ScadaServiceImpl read handlers → TypedValue + QualityCode
|
||||
[ ] Update server: ScadaServiceImpl write handlers → accept TypedValue
|
||||
[ ] Update server: WriteBatchAndWait flag comparison → TypedValueEquals()
|
||||
[ ] Update server: Error paths → specific QualityCode status codes
|
||||
[ ] Update client: RealLmxProxyClient read paths → typed value extraction
|
||||
[ ] Update client: RealLmxProxyClient write paths → TypedValue construction
|
||||
[ ] Update client: Quality checks → numeric status code comparison
|
||||
[ ] Update client: LmxProxyDataConnection DCL adapter → map TypedValue ↔ DCL values
|
||||
[ ] Update all unit tests for new message shapes
|
||||
[ ] Integration test: client ↔ server round-trip with all data types
|
||||
[ ] Integration test: WriteBatchAndWait with typed flag comparison
|
||||
[ ] Integration test: Subscription stream delivers typed VTQ messages
|
||||
[ ] Integration test: Error paths return correct QualityCode sub-codes
|
||||
[ ] Remove all string-based value parsing/serialization code
|
||||
[ ] Remove all string-based quality comparison code
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Test Scenarios for v2 Validation
|
||||
|
||||
These scenarios validate that the v2 protocol behaves correctly across all data types and quality codes.
|
||||
|
||||
### 6.1 Round-Trip Type Fidelity
|
||||
|
||||
For each supported data type, write a value via `Write`, read it back via `Read`, and verify the `TypedValue` oneof case and value match exactly:
|
||||
|
||||
| Data Type | Test Value | TypedValue Field | Verify |
|
||||
|-----------|-----------|-----------------|--------|
|
||||
| `bool` | `true` | `bool_value` | `== true` |
|
||||
| `int32` | `2147483647` | `int32_value` | `== int.MaxValue` |
|
||||
| `int64` | `9223372036854775807` | `int64_value` | `== long.MaxValue` |
|
||||
| `float` | `3.14159f` | `float_value` | `== 3.14159f` (exact bits) |
|
||||
| `double` | `2.718281828459045` | `double_value` | `== 2.718281828459045` (exact bits) |
|
||||
| `string` | `"Hello World"` | `string_value` | `== "Hello World"` |
|
||||
| `byte[]` | `[0x00, 0xFF, 0x42]` | `bytes_value` | byte-for-byte match |
|
||||
| `DateTime` | `638789000000000000L` | `datetime_value` | `== 638789000000000000L` |
|
||||
| `float[]` | `[1.0f, 2.0f, 3.0f]` | `array_value.float_values` | element-wise match |
|
||||
| `int32[]` | `[10, 20, 30]` | `array_value.int32_values` | element-wise match |
|
||||
| null | (unset) | no oneof case | `ValueCase == None` |
|
||||
|
||||
### 6.2 Quality Code Propagation
|
||||
|
||||
| Scenario | Trigger | Expected QualityCode |
|
||||
|----------|---------|---------------------|
|
||||
| Normal read | Read a healthy tag | `{ 0x00000000, "Good" }` |
|
||||
| Local override | Script sets `GoodLocalOverride` | `{ 0x00D80000, "GoodLocalOverride" }` |
|
||||
| Fault injection: sensor failure | Script sets `BadSensorFailure` | `{ 0x806D0000, "BadSensorFailure" }` |
|
||||
| Fault injection: device failure | Script sets `BadDeviceFailure` | `{ 0x806B0000, "BadDeviceFailure" }` |
|
||||
| Fault injection: stale value | Script sets `UncertainLastUsableValue` | `{ 0x40900000, "UncertainLastUsableValue" }` |
|
||||
| Fault injection: off-scan | Script sets `BadOutOfService` | `{ 0x808F0000, "BadOutOfService" }` |
|
||||
| Fault injection: comms failure | Script sets `BadCommunicationFailure` | `{ 0x80050000, "BadCommunicationFailure" }` |
|
||||
| Unknown tag | Read nonexistent tag | `{ 0x80040000, "BadConfigurationError" }` |
|
||||
| Write to read-only | Write to a read-only tag | WriteResult.success=false, message contains "read-only" |
|
||||
|
||||
### 6.3 WriteBatchAndWait Typed Flag Comparison
|
||||
|
||||
| Flag Type | Written Value | Flag Value | Expected Result |
|
||||
|-----------|--------------|-----------|-----------------|
|
||||
| `bool` | `true` | `TypedValue { bool_value = true }` | `flag_reached = true` |
|
||||
| `bool` | `false` | `TypedValue { bool_value = true }` | `flag_reached = false` (timeout) |
|
||||
| `double` | `42.5` | `TypedValue { double_value = 42.5 }` | `flag_reached = true` |
|
||||
| `double` | `42.500001` | `TypedValue { double_value = 42.5 }` | `flag_reached = false` |
|
||||
| `string` | `"DONE"` | `TypedValue { string_value = "DONE" }` | `flag_reached = true` |
|
||||
| `string` | `"done"` | `TypedValue { string_value = "DONE" }` | `flag_reached = false` (case-sensitive) |
|
||||
| `int32` | `1` | `TypedValue { double_value = 1.0 }` | `flag_reached = false` (type mismatch) |
|
||||
|
||||
### 6.4 Subscription Stream
|
||||
|
||||
- Subscribe to tags of mixed data types
|
||||
- Verify each streamed `VtqMessage` has the correct `oneof` case matching the tag's data type
|
||||
- Inject a fault mid-stream and verify the quality code changes from `Good` to the injected code
|
||||
- Cancel the subscription and verify the stream terminates cleanly
|
||||
|
||||
---
|
||||
|
||||
## 7. Appendix: v1 → v2 Quick Reference
|
||||
|
||||
**Reading a value:**
|
||||
```csharp
|
||||
// v1
|
||||
string raw = vtq.Value;
|
||||
if (double.TryParse(raw, out var d)) { /* use d */ }
|
||||
else if (bool.TryParse(raw, out var b)) { /* use b */ }
|
||||
else { /* it's a string */ }
|
||||
|
||||
// v2
|
||||
switch (vtq.Value.ValueCase)
|
||||
{
|
||||
case TypedValue.ValueOneofCase.DoubleValue:
|
||||
double d = vtq.Value.DoubleValue;
|
||||
break;
|
||||
case TypedValue.ValueOneofCase.BoolValue:
|
||||
bool b = vtq.Value.BoolValue;
|
||||
break;
|
||||
case TypedValue.ValueOneofCase.StringValue:
|
||||
string s = vtq.Value.StringValue;
|
||||
break;
|
||||
case TypedValue.ValueOneofCase.None:
|
||||
// null value
|
||||
break;
|
||||
// ... other cases
|
||||
}
|
||||
```
|
||||
|
||||
**Writing a value:**
|
||||
```csharp
|
||||
// v1
|
||||
new WriteItem { Tag = "Motor.Speed", Value = 42.5.ToString() }
|
||||
|
||||
// v2
|
||||
new WriteItem { Tag = "Motor.Speed", Value = new TypedValue { DoubleValue = 42.5 } }
|
||||
```
|
||||
|
||||
**Checking quality:**
|
||||
```csharp
|
||||
// v1
|
||||
bool isGood = vtq.Quality.Equals("Good", StringComparison.OrdinalIgnoreCase);
|
||||
bool isBad = vtq.Quality.Equals("Bad", StringComparison.OrdinalIgnoreCase);
|
||||
|
||||
// v2
|
||||
bool isGood = (vtq.Quality.StatusCode & 0xC0000000) == 0x00000000;
|
||||
bool isBad = (vtq.Quality.StatusCode & 0xC0000000) == 0x80000000;
|
||||
// or use helper:
|
||||
// bool isGood = QualityHelper.IsGood(vtq.Quality.StatusCode);
|
||||
```
|
||||
|
||||
**Constructing quality (server-side):**
|
||||
```csharp
|
||||
// v1
|
||||
vtq.Quality = "Good";
|
||||
|
||||
// v2
|
||||
vtq.Quality = new QualityCode { StatusCode = 0x00000000, SymbolicName = "Good" };
|
||||
// or for errors:
|
||||
vtq.Quality = new QualityCode { StatusCode = 0x806D0000, SymbolicName = "BadSensorFailure" };
|
||||
vtq.Quality = new QualityCode { StatusCode = 0x80050000, SymbolicName = "BadCommunicationFailure" };
|
||||
vtq.Quality = new QualityCode { StatusCode = 0x00D80000, SymbolicName = "GoodLocalOverride" };
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
*Document version: 1.0 — All decisions resolved. Complete proto, migration guide, and test scenarios.*
|
||||
@@ -0,0 +1,210 @@
|
||||
# LmxProxy v2 Rebuild — Design Document
|
||||
|
||||
**Date**: 2026-03-21
|
||||
**Status**: Approved
|
||||
**Scope**: Complete rebuild of LmxProxy Host and Client with v2 protocol
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Rebuild the LmxProxy gRPC proxy service from scratch, implementing the v2 protocol (TypedValue + QualityCode) as defined in `docs/lmxproxy_updates.md`. The existing code in `src/` is retained as reference only. No backward compatibility with v1.
|
||||
|
||||
## 2. Key Design Decisions
|
||||
|
||||
| Decision | Choice | Rationale |
|
||||
|----------|--------|-----------|
|
||||
| gRPC server for Host | Grpc.Core (C-core) | Only option for .NET Framework 4.8 server-side |
|
||||
| Service hosting | Topshelf | Proven, already deployed, simple install/uninstall |
|
||||
| Protocol version | v2 only, clean break | Small controlled client count, no value in v1 compat |
|
||||
| Shared code between projects | None — fully independent | Different .NET runtimes (.NET Fx 4.8 vs .NET 10), wire compat is the contract |
|
||||
| Client retry library | Polly v8+ | Building fresh on .NET 10, modern API |
|
||||
| Testing strategy | Unit tests during implementation, integration tests after Client functional | Phased approach, real hardware validation on windev |
|
||||
|
||||
## 3. Architecture
|
||||
|
||||
### 3.1 Host (.NET Framework 4.8, x86)
|
||||
|
||||
```
|
||||
Program.cs (Topshelf entry point)
|
||||
└── LmxProxyService (lifecycle manager)
|
||||
├── Configuration (appsettings.json binding + validation)
|
||||
├── MxAccessClient (COM interop, STA dispatch thread)
|
||||
│ ├── Connection state machine
|
||||
│ ├── Read/Write with semaphore concurrency
|
||||
│ ├── Subscription storage for reconnect replay
|
||||
│ └── Auto-reconnect loop (5s interval)
|
||||
├── SessionManager (ConcurrentDictionary, 5-min inactivity scavenging)
|
||||
├── SubscriptionManager (per-client channels, shared MxAccess subscriptions)
|
||||
├── ApiKeyService (JSON file, FileSystemWatcher hot-reload)
|
||||
├── ScadaGrpcService (proto-generated, all 10 RPCs)
|
||||
│ └── ApiKeyInterceptor (x-api-key header enforcement)
|
||||
├── PerformanceMetrics (per-op tracking, p95, 60s log)
|
||||
├── HealthCheckService (basic + detailed with test tag)
|
||||
└── StatusWebServer (HTML dashboard, JSON status, health endpoint)
|
||||
```
|
||||
|
||||
### 3.2 Client (.NET 10, AnyCPU)
|
||||
|
||||
```
|
||||
ILmxProxyClient (public interface)
|
||||
└── LmxProxyClient (partial class)
|
||||
├── Connection (GrpcChannel, protobuf-net.Grpc, 30s keep-alive)
|
||||
├── Read/Write/Subscribe operations
|
||||
├── CodeFirstSubscription (IAsyncEnumerable streaming)
|
||||
├── ClientMetrics (p95/p99, 1000-sample buffer)
|
||||
└── Disposal (session disconnect, channel cleanup)
|
||||
|
||||
LmxProxyClientBuilder (fluent builder, Polly v8 resilience pipeline)
|
||||
ILmxProxyClientFactory + LmxProxyClientFactory (config-based creation)
|
||||
ServiceCollectionExtensions (DI registrations)
|
||||
StreamingExtensions (batched reads/writes, parallel processing)
|
||||
|
||||
Domain/
|
||||
├── ScadaContracts.cs (IScadaService + all DataContract messages)
|
||||
├── Quality.cs, QualityExtensions.cs
|
||||
├── Vtq.cs
|
||||
└── ConnectionState.cs
|
||||
```
|
||||
|
||||
### 3.3 Wire Compatibility
|
||||
|
||||
The `.proto` file is the single source of truth for the wire format. Host generates server stubs from it. Client implements code-first contracts (`[DataContract]`/`[ServiceContract]`) that mirror the proto exactly — same field numbers, names, nesting, and streaming shapes. Cross-stack serialization tests verify compatibility.
|
||||
|
||||
## 4. Protocol (v2)
|
||||
|
||||
### 4.1 TypedValue System
|
||||
|
||||
Protobuf `oneof` carrying native types:
|
||||
|
||||
| Case | Proto Type | .NET Type |
|
||||
|------|-----------|-----------|
|
||||
| bool_value | bool | bool |
|
||||
| int32_value | int32 | int |
|
||||
| int64_value | int64 | long |
|
||||
| float_value | float | float |
|
||||
| double_value | double | double |
|
||||
| string_value | string | string |
|
||||
| bytes_value | bytes | byte[] |
|
||||
| datetime_value | int64 (UTC Ticks) | DateTime |
|
||||
| array_value | ArrayValue | typed arrays |
|
||||
|
||||
Unset `oneof` = null. No string serialization heuristics.
|
||||
|
||||
### 4.2 COM Variant Coercion Table
|
||||
|
||||
| COM Variant Type | TypedValue Case | Notes |
|
||||
|-----------------|-----------------|-------|
|
||||
| VT_BOOL | bool_value | |
|
||||
| VT_I2 (short) | int32_value | Widened |
|
||||
| VT_I4 (int) | int32_value | |
|
||||
| VT_I8 (long) | int64_value | |
|
||||
| VT_UI2 (ushort) | int32_value | Widened |
|
||||
| VT_UI4 (uint) | int64_value | Widened to avoid sign issues |
|
||||
| VT_UI8 (ulong) | int64_value | Overflow risk logged if > long.MaxValue |
|
||||
| VT_R4 (float) | float_value | |
|
||||
| VT_R8 (double) | double_value | |
|
||||
| VT_BSTR (string) | string_value | |
|
||||
| VT_DATE (DateTime) | datetime_value | Converted to UTC Ticks |
|
||||
| VT_DECIMAL | double_value | Precision loss logged |
|
||||
| VT_CY (Currency) | double_value | |
|
||||
| VT_NULL, VT_EMPTY, DBNull | unset oneof | Represents null |
|
||||
| VT_ARRAY | array_value | Element type determines ArrayValue field |
|
||||
| VT_UNKNOWN | string_value | ToString() fallback, logged as warning |
|
||||
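A rough sketch of the coercion table follows. It assumes COM interop has already marshalled the variant into a boxed CLR value, and it returns the `TypedValue` case name plus the widened value rather than a generated message. The class name, method name, and the `"unset"` label are hypothetical; arrays, `ulong`, and logging are left out for brevity.

```csharp
using System;

// Hypothetical sketch of the variant-to-TypedValue coercion rules.
public static class VariantCoercionSketch
{
    public static (string Case, object Value) Coerce(object value)
    {
        switch (value)
        {
            case null:
            case DBNull _:
                return ("unset", null);                       // VT_NULL / VT_EMPTY / DBNull
            case bool b: return ("bool_value", b);
            case short s: return ("int32_value", (int)s);     // VT_I2, widened
            case ushort us: return ("int32_value", (int)us);  // VT_UI2, widened
            case int i: return ("int32_value", i);
            case uint ui: return ("int64_value", (long)ui);   // VT_UI4, widened to avoid sign issues
            case long l: return ("int64_value", l);
            case float f: return ("float_value", f);
            case double d: return ("double_value", d);
            case decimal m: return ("double_value", (double)m);  // precision loss possible
            case string str: return ("string_value", str);
            case DateTime dt: return ("datetime_value", dt.ToUniversalTime().Ticks);
            default: return ("string_value", value.ToString());  // fallback for unknown types
        }
    }
}
```

Returning the case name alongside the value keeps the sketch independent of the generated classes; a real implementation would set the matching `oneof` field directly.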
|
||||
### 4.3 QualityCode System
|
||||
|
||||
`status_code` (uint32, OPC UA-compatible) is canonical. `symbolic_name` is derived from a lookup table, never set independently.
|
||||
|
||||
Category derived from the top two bits (mask `0xC0000000`); typical values:
|
||||
- `0x00xxxxxx` = Good
|
||||
- `0x40xxxxxx` = Uncertain
|
||||
- `0x80xxxxxx` = Bad
|
||||
|
||||
Domain `Quality` enum uses byte values for the low-order byte, with extension methods `IsGood()`, `IsBad()`, `IsUncertain()`.
|
||||
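One way to keep `symbolic_name` derived from `status_code` is a single lookup with a category fallback. The sketch below is illustrative only: the class and method names are hypothetical, and the dictionary holds just a few of the codes listed in `docs/lmxproxy_updates.md`.

```csharp
using System.Collections.Generic;

// Hypothetical lookup; the real table would list every supported code.
public static class QualityNamesSketch
{
    private static readonly Dictionary<uint, string> Names = new Dictionary<uint, string>
    {
        [0x00000000] = "Good",
        [0x00D80000] = "GoodLocalOverride",
        [0x40900000] = "UncertainLastUsableValue",
        [0x80000000] = "Bad",
        [0x80040000] = "BadConfigurationError",
        [0x806D0000] = "BadSensorFailure",
    };

    // Category from the top two bits of the status code.
    public static string Category(uint statusCode)
    {
        switch (statusCode & 0xC0000000)
        {
            case 0x00000000: return "Good";
            case 0x40000000: return "Uncertain";
            case 0x80000000: return "Bad";
            default: return "Unknown";
        }
    }

    // Known codes map to their symbolic name; unknown codes fall back to the category name.
    public static string SymbolicName(uint statusCode) =>
        Names.TryGetValue(statusCode, out var name) ? name : Category(statusCode);
}
```

Because the name is always computed from the code, the two fields cannot drift apart on the wire.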
|
||||
### 4.4 Error Model
|
||||
|
||||
| Error Type | Mechanism | Examples |
|
||||
|-----------|-----------|----------|
|
||||
| Infrastructure | gRPC StatusCode | Unauthenticated (bad API key), PermissionDenied (ReadOnly write), InvalidArgument (bad session), Unavailable (MxAccess down) |
|
||||
| Business outcome | Payload `success`/`message` fields | Tag read failure, write type mismatch, batch partial failure, WriteBatchAndWait flag timeout |
|
||||
| Subscription | gRPC StatusCode on stream | Unauthenticated (invalid session), Internal (unexpected error) |
|
||||
|
||||
## 5. COM Threading Model
|
||||
|
||||
MxAccess is an STA COM component. All COM operations execute on a **dedicated STA thread** with a `BlockingCollection<Action>` dispatch queue:
|
||||
|
||||
- `MxAccessClient` creates a single STA thread at construction
|
||||
- All COM calls (connect, read, write, subscribe, disconnect) are dispatched to this thread via the queue
|
||||
- Callers await a `TaskCompletionSource<T>` that the STA thread completes after the COM call
|
||||
- The STA thread runs a message pump loop (`Application.Run` or manual `MSG` pump)
|
||||
- On disposal, a sentinel is enqueued and the thread joins with a 10-second timeout
|
||||
|
||||
This replaces the fragile `Task.Run` + `SemaphoreSlim` pattern in the reference code.
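
The queue and `TaskCompletionSource<T>` hand-off described above can be sketched as follows. `DispatchThread` is a hypothetical name. The sketch deliberately leaves out the apartment state and the message pump, so it shows only the dispatch pattern; an STA thread that blocks on the queue without pumping messages will not deliver COM callbacks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs queued work items on one dedicated thread.
public sealed class DispatchThread : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public DispatchThread()
    {
        _thread = new Thread(Run) { IsBackground = true, Name = "com-dispatch" };
        // A real MxAccess host would also call
        // _thread.SetApartmentState(ApartmentState.STA) before Start (Windows only).
        _thread.Start();
    }

    public int ThreadId { get { return _thread.ManagedThreadId; } }

    // Queues func and returns a task that the dispatch thread completes.
    public Task<T> InvokeAsync<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    private void Run()
    {
        // Blocks until an item arrives; ends once CompleteAdding has been
        // called and the queue is drained.
        foreach (var work in _queue.GetConsumingEnumerable())
            work();
    }

    public void Dispose()
    {
        _queue.CompleteAdding();                 // plays the role of the sentinel
        _thread.Join(TimeSpan.FromSeconds(10));  // 10-second join timeout
    }
}
```

Callers would write something like `await dispatch.InvokeAsync(() => comObject.Read(tag))`, where `comObject.Read` stands in for whatever COM call is being marshalled.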
|
||||
|
||||
## 6. Session Lifecycle
|
||||
|
||||
- Sessions created on `Connect` with GUID "N" format (32-char hex)
|
||||
- Tracked in `ConcurrentDictionary<string, SessionInfo>`
|
||||
- **Inactivity scavenging**: sessions not accessed for 5 minutes are automatically terminated. Client keep-alive pings (30s) keep legitimate sessions alive.
|
||||
- On termination: subscriptions cleaned up, session removed from dictionary
|
||||
- All sessions lost on service restart (in-memory only)
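
A minimal sketch of the lifecycle above, assuming a periodic caller invokes `Scavenge`. `SessionEntry` and `SessionTable` are hypothetical stand-ins for the real `SessionInfo` and `SessionManager`, and the clock is passed in so the logic can be tested:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the real SessionInfo.
public sealed class SessionEntry
{
    private long _lastActivityTicks;
    public SessionEntry(DateTime nowUtc) { _lastActivityTicks = nowUtc.Ticks; }
    public DateTime LastActivityUtc
    {
        get { return new DateTime(Interlocked.Read(ref _lastActivityTicks), DateTimeKind.Utc); }
    }
    public void Touch(DateTime nowUtc) { Interlocked.Exchange(ref _lastActivityTicks, nowUtc.Ticks); }
}

public sealed class SessionTable
{
    private readonly ConcurrentDictionary<string, SessionEntry> _sessions =
        new ConcurrentDictionary<string, SessionEntry>();
    private readonly TimeSpan _inactivityTimeout;

    public SessionTable(TimeSpan inactivityTimeout) { _inactivityTimeout = inactivityTimeout; }

    public string Create(DateTime nowUtc)
    {
        string id = Guid.NewGuid().ToString("N"); // 32 hex chars, no dashes
        _sessions[id] = new SessionEntry(nowUtc);
        return id;
    }

    // Called on every RPC, including keep-alive pings. False for unknown sessions.
    public bool Touch(string id, DateTime nowUtc)
    {
        SessionEntry entry;
        if (!_sessions.TryGetValue(id, out entry)) return false;
        entry.Touch(nowUtc);
        return true;
    }

    // Removes sessions idle longer than the timeout and returns their ids,
    // so the caller can also clean up their subscriptions.
    public List<string> Scavenge(DateTime nowUtc)
    {
        var removed = new List<string>();
        foreach (var pair in _sessions)
        {
            SessionEntry ignored;
            if (nowUtc - pair.Value.LastActivityUtc > _inactivityTimeout &&
                _sessions.TryRemove(pair.Key, out ignored))
            {
                removed.Add(pair.Key);
            }
        }
        return removed;
    }
}
```

Returning the removed ids, rather than removing silently, gives the caller a hook for the "subscriptions cleaned up" step.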
|
||||
|
||||
## 7. Subscription Semantics
|
||||
|
||||
- **Shared MxAccess subscriptions**: first client to subscribe creates the underlying MxAccess subscription. Last to unsubscribe disposes it. Ref-counted.
|
||||
- **Sampling rate**: when multiple clients subscribe to the same tag with different `sampling_ms`, the fastest (lowest non-zero) rate is used for the MxAccess subscription. All clients receive updates at this rate.
|
||||
- **Per-client channels**: each client gets an independent `BoundedChannel<VtqMessage>` (capacity 1000, DropOldest). One slow consumer's drops do not affect other clients.
|
||||
- **MxAccess disconnect**: all subscribed clients receive a bad-quality notification for all their subscribed tags.
|
||||
- **Session termination**: all subscriptions for that session are cleaned up.
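
Two of the rules above, sketched with BCL types only. `SamplingRate` and `DropOldestBuffer<T>` are hypothetical names; the buffer is a simplified stand-in for a bounded channel in drop-oldest mode, and treating `sampling_ms = 0` as "no preference" is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SamplingRate
{
    // Lowest non-zero sampling_ms among the clients of one tag,
    // or 0 when no client asked for a specific rate (assumed meaning of 0).
    public static int Effective(IEnumerable<int> requestedMs)
    {
        var nonZero = requestedMs.Where(ms => ms > 0).ToList();
        return nonZero.Count == 0 ? 0 : nonZero.Min();
    }
}

// Per-client buffer: when full, the oldest item is discarded so a slow
// consumer only ever loses its own backlog.
public sealed class DropOldestBuffer<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _gate = new object();
    private readonly int _capacity;

    public DropOldestBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
    }

    public void Write(T item)
    {
        lock (_gate)
        {
            if (_items.Count == _capacity) _items.Dequeue(); // drop oldest
            _items.Enqueue(item);
        }
    }

    public bool TryRead(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.Dequeue();
            return true;
        }
    }
}
```

Because each client owns its buffer, a drop in one buffer never touches another client's queue.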
|
||||
|
||||
## 8. Authentication
|
||||
|
||||
- `x-api-key` gRPC metadata header is the authoritative authentication mechanism
|
||||
- `ConnectRequest.api_key` is accepted but the interceptor is the enforcement point
|
||||
- API keys loaded from JSON file with FileSystemWatcher hot-reload (1-second debounce)
|
||||
- Auto-generates default file with two random keys (ReadOnly + ReadWrite) if missing
|
||||
- Write-protected RPCs: Write, WriteBatch, WriteBatchAndWait
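
The write-protection rule can be sketched as a pure check that an interceptor might call before mapping a refusal to `PermissionDenied`. `ApiKeyRole` and `WriteGuard` are hypothetical names, and passing the bare RPC method name is an assumption:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical role type matching the two generated key kinds.
public enum ApiKeyRole { ReadOnly, ReadWrite }

public static class WriteGuard
{
    private static readonly HashSet<string> WriteProtected =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "Write", "WriteBatch", "WriteBatchAndWait"
        };

    // True when the role may call the RPC. ReadOnly keys are refused
    // only for the write-protected RPCs.
    public static bool IsAllowed(ApiKeyRole role, string rpcName)
    {
        return role == ApiKeyRole.ReadWrite || !WriteProtected.Contains(rpcName);
    }
}
```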
|
||||
|
||||
## 9. Phasing
|
||||
|
||||
| Phase | Scope | Depends On |
|-------|-------|------------|
| 1 | Protocol & Domain Types | — |
| 2 | Host Core (MxAccessClient, SessionManager, SubscriptionManager) | Phase 1 |
| 3 | Host gRPC Server, Security, Configuration, Service Hosting | Phase 2 |
| 4 | Host Health, Metrics, Status Server | Phase 3 |
| 5 | Client Core | Phase 1 |
| 6 | Client Extras (Builder, Factory, DI, Streaming) | Phase 5 |
| 7 | Integration Tests & Deployment | Phases 4 + 6 |
|
||||
|
||||
Phases 2-4 (Host) and 5-6 (Client) can proceed in parallel after Phase 1.
|
||||
|
||||
## 10. Guardrails
|
||||
|
||||
1. **Proto is the source of truth** — any wire format question is resolved by reading `scada.proto`, not the code-first contracts.
|
||||
2. **No v1 code in the new build** — reference only. Do not copy-paste and modify; write fresh.
|
||||
3. **Cross-stack tests in Phase 1** — Host proto serialize → Client code-first deserialize (and vice versa) before any business logic.
|
||||
4. **COM calls only on STA thread** — no `Task.Run` for COM operations. All go through the dispatch queue.
|
||||
5. **status_code is canonical for quality** — `symbolic_name` is always derived, never independently set.
|
||||
6. **Unit tests before integration** — every phase includes unit tests. Integration tests are Phase 7 only.
|
||||
7. **Each phase must compile and pass tests** before the next phase begins.
|
||||
8. **No string serialization heuristics** — v2 uses native TypedValue. No `double.TryParse` or `bool.TryParse` on values.
|
||||
|
||||
## 11. Resolved Conflicts
|
||||
|
||||
| Conflict | Resolution |
|----------|-----------|
| WriteBatchAndWait signature (MxAccessClient vs Protocol) | Follow Protocol spec: write items, poll flagTag for flagValue. IScadaClient interface matches protocol semantics. |
| Builder default port 5050 vs Host 50051 | Standardize builder default to 50051 |
| Auth in metadata vs payload | x-api-key header is authoritative; ConnectRequest.api_key accepted but interceptor enforces |
|
||||
|
||||
## 12. Reference Code
|
||||
|
||||
The existing code remains in `src/` as `src-reference/` for consultation:
|
||||
- `src-reference/ZB.MOM.WW.LmxProxy.Host/` — v1 Host implementation
|
||||
- `src-reference/ZB.MOM.WW.LmxProxy.Client/` — v1 Client implementation
|
||||
|
||||
Key reference files for COM interop patterns:
|
||||
- `Implementation/MxAccessClient.Connection.cs` — COM object lifecycle
|
||||
- `Implementation/MxAccessClient.EventHandlers.cs` — MxAccess callbacks
|
||||
- `Implementation/MxAccessClient.Subscription.cs` — Advise/Unadvise patterns
|
||||
@@ -0,0 +1,673 @@
|
||||
# Gap 1 & Gap 2: Active Health Probing + Subscription Handle Cleanup
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Fix two reconnect-related gaps: (1) the monitor loop cannot detect a silently-dead MxAccess connection, and (2) SubscriptionManager holds stale IAsyncDisposable handles after reconnect.
|
||||
|
||||
**Architecture:** Add a domain-level connection probe to `MxAccessClient` that classifies results as Healthy/TransportFailure/DataDegraded. The monitor loop uses this to decide reconnect vs degrade-and-backoff. Separately, remove `SubscriptionManager._mxAccessHandles` entirely and switch to address-based unsubscribe through `IScadaClient`, making `MxAccessClient` the sole owner of COM subscription lifecycle.
|
||||
|
||||
**Tech Stack:** .NET Framework 4.8, C#, MxAccess COM interop, Serilog
|
||||
|
||||
---
|
||||
|
||||
## Task 0: Add `ProbeResult` domain type
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.LmxProxy.Host/Domain/ProbeResult.cs`
|
||||
|
||||
**Step 1: Create the ProbeResult type**
|
||||
|
||||
```csharp
using System;

namespace ZB.MOM.WW.LmxProxy.Host.Domain
{
    public enum ProbeStatus
    {
        Healthy,
        TransportFailure,
        DataDegraded
    }

    public sealed class ProbeResult
    {
        public ProbeStatus Status { get; }
        public Quality? Quality { get; }
        public DateTime? Timestamp { get; }
        public string? Message { get; }
        public Exception? Exception { get; }

        private ProbeResult(ProbeStatus status, Quality? quality, DateTime? timestamp,
            string? message, Exception? exception)
        {
            Status = status;
            Quality = quality;
            Timestamp = timestamp;
            Message = message;
            Exception = exception;
        }

        public static ProbeResult Healthy(Quality quality, DateTime timestamp)
            => new ProbeResult(ProbeStatus.Healthy, quality, timestamp, null, null);

        public static ProbeResult Degraded(Quality quality, DateTime timestamp, string message)
            => new ProbeResult(ProbeStatus.DataDegraded, quality, timestamp, message, null);

        public static ProbeResult TransportFailed(string message, Exception? ex = null)
            => new ProbeResult(ProbeStatus.TransportFailure, null, null, message, ex);
    }
}
```
|
||||
|
||||
**Step 2: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Domain/ProbeResult.cs
git commit -m "feat: add ProbeResult domain type for connection health classification"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Add `ProbeConnectionAsync` to `MxAccessClient`
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs` — add `ProbeConnectionAsync` to interface
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — implement probe method
|
||||
|
||||
**Step 1: Add to IScadaClient interface**
|
||||
|
||||
In `IScadaClient.cs`, add after the `DisconnectAsync` method:
|
||||
|
||||
```csharp
/// <summary>
/// Probes connection health by reading a test tag.
/// Returns a classified result: Healthy, TransportFailure, or DataDegraded.
/// </summary>
Task<ProbeResult> ProbeConnectionAsync(string testTagAddress, int timeoutMs, CancellationToken ct = default);
```
|
||||
|
||||
**Step 2: Implement in MxAccessClient.Connection.cs**
|
||||
|
||||
Add before `MonitorConnectionAsync`:
|
||||
|
||||
```csharp
/// <summary>
/// Probes the connection by reading a test tag with a timeout.
/// Classifies the result as transport failure vs data degraded.
/// </summary>
public async Task<ProbeResult> ProbeConnectionAsync(string testTagAddress, int timeoutMs,
    CancellationToken ct = default)
{
    if (!IsConnected)
        return ProbeResult.TransportFailed("Not connected");

    try
    {
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(ct))
        {
            cts.CancelAfter(timeoutMs);

            Vtq vtq;
            try
            {
                vtq = await ReadAsync(testTagAddress, cts.Token);
            }
            catch (OperationCanceledException) when (!ct.IsCancellationRequested)
            {
                // Our timeout fired, not the caller's — treat as transport failure
                return ProbeResult.TransportFailed("Probe read timed out after " + timeoutMs + "ms");
            }

            if (vtq.Quality == Domain.Quality.Bad_NotConnected ||
                vtq.Quality == Domain.Quality.Bad_CommFailure)
            {
                return ProbeResult.TransportFailed("Probe returned " + vtq.Quality);
            }

            if (!vtq.Quality.IsGood())
            {
                return ProbeResult.Degraded(vtq.Quality, vtq.Timestamp,
                    "Probe quality: " + vtq.Quality);
            }

            if (DateTime.UtcNow - vtq.Timestamp > TimeSpan.FromMinutes(5))
            {
                return ProbeResult.Degraded(vtq.Quality, vtq.Timestamp,
                    "Probe data stale (>" + 5 + "min)");
            }

            return ProbeResult.Healthy(vtq.Quality, vtq.Timestamp);
        }
    }
    catch (System.Runtime.InteropServices.COMException ex)
    {
        return ProbeResult.TransportFailed("COM exception: " + ex.Message, ex);
    }
    catch (InvalidOperationException ex) when (ex.Message.Contains("Not connected"))
    {
        return ProbeResult.TransportFailed(ex.Message, ex);
    }
    catch (Exception ex)
    {
        return ProbeResult.TransportFailed("Probe failed: " + ex.Message, ex);
    }
}
```
|
||||
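
The timeout handling in the probe relies on a linked `CancellationTokenSource` plus an exception filter to tell "our timeout fired" apart from "the caller cancelled". A self-contained sketch of just that pattern follows; `TimeoutProbe` and `RunAsync` are hypothetical names. Unlike `ProbeConnectionAsync`, where a caller cancellation falls through to the general `catch (Exception)` and is reported as a transport failure, this sketch lets it propagate:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutProbe
{
    // Returns "ok" or "timeout". If the caller's token (not the internal
    // timeout) was cancelled, the OperationCanceledException propagates.
    public static async Task<string> RunAsync(
        Func<CancellationToken, Task> operation, int timeoutMs, CancellationToken ct)
    {
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(ct))
        {
            cts.CancelAfter(timeoutMs);
            try
            {
                await operation(cts.Token).ConfigureAwait(false);
                return "ok";
            }
            catch (OperationCanceledException) when (!ct.IsCancellationRequested)
            {
                // Only the linked source was cancelled, so the timeout fired.
                return "timeout";
            }
        }
    }
}
```

Whether a caller cancellation should count as a transport failure or propagate is a design choice; in the monitor loop it only arises during shutdown.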
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs
git commit -m "feat: add ProbeConnectionAsync to MxAccessClient for active health probing"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Add health check configuration
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs` — add HealthCheckConfiguration class and property
|
||||
|
||||
**Step 1: Add HealthCheckConfiguration**
|
||||
|
||||
Add a new class in the Configuration namespace. A separate file would also work, but keeping it in the same file is simplest:
|
||||
|
||||
```csharp
/// <summary>Health check / probe configuration.</summary>
public class HealthCheckConfiguration
{
    /// <summary>Tag address to probe for connection liveness. Default: TestChildObject.TestBool.</summary>
    public string TestTagAddress { get; set; } = "TestChildObject.TestBool";

    /// <summary>Probe timeout in milliseconds. Default: 5000.</summary>
    public int ProbeTimeoutMs { get; set; } = 5000;

    /// <summary>Consecutive transport failures before forced reconnect. Default: 3.</summary>
    public int MaxConsecutiveTransportFailures { get; set; } = 3;

    /// <summary>Probe interval while in degraded state (ms). Default: 30000 (30s).</summary>
    public int DegradedProbeIntervalMs { get; set; } = 30000;
}
```
|
||||
|
||||
Add to `LmxProxyConfiguration`:
|
||||
|
||||
```csharp
/// <summary>Health check / active probe settings.</summary>
public HealthCheckConfiguration HealthCheck { get; set; } = new HealthCheckConfiguration();
```
|
||||
|
||||
**Step 2: Add to appsettings.json**
|
||||
|
||||
In the existing `appsettings.json`, add the `HealthCheck` section:
|
||||
|
||||
```json
"HealthCheck": {
  "TestTagAddress": "TestChildObject.TestBool",
  "ProbeTimeoutMs": 5000,
  "MaxConsecutiveTransportFailures": 3,
  "DegradedProbeIntervalMs": 30000
}
```
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs
git add src/ZB.MOM.WW.LmxProxy.Host/appsettings.json
git commit -m "feat: add HealthCheck configuration section for active connection probing"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Rewrite `MonitorConnectionAsync` with active probing
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs` — add probe state fields
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs` — rewrite monitor loop
|
||||
|
||||
The monitor needs configuration passed in. The simplest approach: add constructor parameters for the probe settings alongside the existing ones.
|
||||
|
||||
**Step 1: Add probe fields to MxAccessClient.cs**
|
||||
|
||||
Add fields after the existing reconnect fields (around line 42):
|
||||
|
||||
```csharp
// Probe configuration
private readonly string? _probeTestTagAddress;
private readonly int _probeTimeoutMs;
private readonly int _maxConsecutiveTransportFailures;
private readonly int _degradedProbeIntervalMs;

// Probe state
private int _consecutiveTransportFailures;
private bool _isDegraded;
```
|
||||
|
||||
Extend the constructor signature with the probe parameters (the matching assignments go in the body, after the existing `_galaxyName = galaxyName;` line):
|
||||
|
||||
```csharp
public MxAccessClient(
    int maxConcurrentOperations = 10,
    int readTimeoutSeconds = 5,
    int writeTimeoutSeconds = 5,
    int monitorIntervalSeconds = 5,
    bool autoReconnect = true,
    string? nodeName = null,
    string? galaxyName = null,
    string? probeTestTagAddress = null,
    int probeTimeoutMs = 5000,
    int maxConsecutiveTransportFailures = 3,
    int degradedProbeIntervalMs = 30000)
```
|
||||
|
||||
And in the body:
|
||||
|
||||
```csharp
_probeTestTagAddress = probeTestTagAddress;
_probeTimeoutMs = probeTimeoutMs;
_maxConsecutiveTransportFailures = maxConsecutiveTransportFailures;
_degradedProbeIntervalMs = degradedProbeIntervalMs;
```
|
||||
|
||||
**Step 2: Rewrite MonitorConnectionAsync in MxAccessClient.Connection.cs**
|
||||
|
||||
Replace the existing `MonitorConnectionAsync` (lines 177-213) with:
|
||||
|
||||
```csharp
/// <summary>
/// Auto-reconnect monitor loop with active health probing.
/// - If IsConnected is false: immediate reconnect (existing behavior).
/// - If IsConnected is true and probe configured: read test tag each interval.
/// - TransportFailure for N consecutive probes → forced disconnect + reconnect.
/// - DataDegraded → stay connected, back off probe interval, report degraded.
/// - Healthy → reset counters and resume normal interval.
/// </summary>
private async Task MonitorConnectionAsync(CancellationToken ct)
{
    Log.Information("Connection monitor loop started (interval={IntervalMs}ms, probe={ProbeEnabled})",
        _monitorIntervalMs, _probeTestTagAddress != null);

    while (!ct.IsCancellationRequested)
    {
        var interval = _isDegraded ? _degradedProbeIntervalMs : _monitorIntervalMs;

        try
        {
            await Task.Delay(interval, ct);
        }
        catch (OperationCanceledException)
        {
            break;
        }

        // ── Case 1: Already disconnected ──
        if (!IsConnected)
        {
            _isDegraded = false;
            _consecutiveTransportFailures = 0;
            await AttemptReconnectAsync(ct);
            continue;
        }

        // ── Case 2: Connected, no probe configured — legacy behavior ──
        if (_probeTestTagAddress == null)
            continue;

        // ── Case 3: Connected, probe configured — active health check ──
        var probe = await ProbeConnectionAsync(_probeTestTagAddress, _probeTimeoutMs, ct);

        switch (probe.Status)
        {
            case ProbeStatus.Healthy:
                if (_isDegraded)
                {
                    Log.Information("Probe healthy — exiting degraded mode");
                    _isDegraded = false;
                }
                _consecutiveTransportFailures = 0;
                break;

            case ProbeStatus.DataDegraded:
                _consecutiveTransportFailures = 0;
                if (!_isDegraded)
                {
                    Log.Warning("Probe degraded: {Message} — entering degraded mode (probe interval {IntervalMs}ms)",
                        probe.Message, _degradedProbeIntervalMs);
                    _isDegraded = true;
                }
                break;

            case ProbeStatus.TransportFailure:
                _isDegraded = false;
                _consecutiveTransportFailures++;
                Log.Warning("Probe transport failure ({Count}/{Max}): {Message}",
                    _consecutiveTransportFailures, _maxConsecutiveTransportFailures, probe.Message);

                if (_consecutiveTransportFailures >= _maxConsecutiveTransportFailures)
                {
                    Log.Warning("Max consecutive transport failures reached — forcing reconnect");
                    _consecutiveTransportFailures = 0;

                    try
                    {
                        await DisconnectAsync(ct);
                    }
                    catch (Exception ex)
                    {
                        Log.Warning(ex, "Error during forced disconnect before reconnect");
                        // DisconnectAsync already calls CleanupComObjectsAsync on error path
                    }

                    await AttemptReconnectAsync(ct);
                }
                break;
        }
    }

    Log.Information("Connection monitor loop exited");
}

private async Task AttemptReconnectAsync(CancellationToken ct)
{
    Log.Information("Attempting reconnect...");
    SetState(ConnectionState.Reconnecting);

    try
    {
        await ConnectAsync(ct);
        Log.Information("Reconnected to MxAccess successfully");
    }
    catch (OperationCanceledException)
    {
        // Let the outer loop handle cancellation
    }
    catch (Exception ex)
    {
        Log.Warning(ex, "Reconnect attempt failed, will retry at next interval");
    }
}
```
|
||||
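
The probe-handling part of the loop is effectively a small state machine. A sketch that isolates the transitions as a pure function, so they can be unit-tested without COM or timers, might look like this. `MonitorState` and `MonitorTransitions` are hypothetical names, and `ProbeStatus` is redeclared only to keep the block self-contained:

```csharp
using System;

// Mirrors the ProbeStatus enum from Task 0.
public enum ProbeStatus { Healthy, TransportFailure, DataDegraded }

// Hypothetical holder for the two probe-state fields.
public sealed class MonitorState
{
    public int ConsecutiveTransportFailures;
    public bool IsDegraded;
}

public static class MonitorTransitions
{
    // Applies one probe outcome to the state. Returns true when the caller
    // should force a disconnect followed by a reconnect.
    public static bool Apply(MonitorState state, ProbeStatus status, int maxFailures)
    {
        switch (status)
        {
            case ProbeStatus.Healthy:
                state.IsDegraded = false;
                state.ConsecutiveTransportFailures = 0;
                return false;

            case ProbeStatus.DataDegraded:
                state.IsDegraded = true;
                state.ConsecutiveTransportFailures = 0;
                return false;

            case ProbeStatus.TransportFailure:
                state.IsDegraded = false;
                state.ConsecutiveTransportFailures++;
                if (state.ConsecutiveTransportFailures < maxFailures)
                    return false;
                state.ConsecutiveTransportFailures = 0; // reset before reconnect
                return true;

            default:
                throw new ArgumentOutOfRangeException("status");
        }
    }
}
```

With `maxFailures = 3`, only the third consecutive `TransportFailure` returns `true`; a `Healthy` or `DataDegraded` result in between resets the count.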
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Connection.cs
git commit -m "feat: rewrite monitor loop with active probing, transport vs degraded classification"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Wire probe config through `LmxProxyService.Start()`
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — pass HealthCheck config to MxAccessClient constructor
|
||||
|
||||
**Step 1: Update MxAccessClient construction**
|
||||
|
||||
In `LmxProxyService.Start()`, update the MxAccessClient creation (around line 62) to pass the new parameters:
|
||||
|
||||
```csharp
_mxAccessClient = new MxAccessClient(
    maxConcurrentOperations: _config.Connection.MaxConcurrentOperations,
    readTimeoutSeconds: _config.Connection.ReadTimeoutSeconds,
    writeTimeoutSeconds: _config.Connection.WriteTimeoutSeconds,
    monitorIntervalSeconds: _config.Connection.MonitorIntervalSeconds,
    autoReconnect: _config.Connection.AutoReconnect,
    nodeName: _config.Connection.NodeName,
    galaxyName: _config.Connection.GalaxyName,
    probeTestTagAddress: _config.HealthCheck.TestTagAddress,
    probeTimeoutMs: _config.HealthCheck.ProbeTimeoutMs,
    maxConsecutiveTransportFailures: _config.HealthCheck.MaxConsecutiveTransportFailures,
    degradedProbeIntervalMs: _config.HealthCheck.DegradedProbeIntervalMs);
```
|
||||
|
||||
**Step 2: Update DetailedHealthCheckService to use shared probe**
|
||||
|
||||
In `LmxProxyService.Start()`, update the DetailedHealthCheckService construction (around line 114) to pass the test tag address from config:
|
||||
|
||||
```csharp
_detailedHealthCheckService = new DetailedHealthCheckService(
    _mxAccessClient, _config.HealthCheck.TestTagAddress);
```
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs
git commit -m "feat: wire HealthCheck config to MxAccessClient and DetailedHealthCheckService"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: Add `UnsubscribeByAddressAsync` to `IScadaClient` and `MxAccessClient`
|
||||
|
||||
This is the foundation for removing handle-based unsubscribe from SubscriptionManager.
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs` — add `UnsubscribeByAddressAsync`
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Subscription.cs` — implement as a public wrapper; the existing `UnsubscribeAsync` stays `internal`
|
||||
|
||||
**Step 1: Add to IScadaClient**
|
||||
|
||||
After `SubscribeAsync`:
|
||||
|
||||
```csharp
/// <summary>
/// Unsubscribes specific tag addresses. Removes from stored subscriptions
/// and COM state. Safe to call after reconnect — uses current handle mappings.
/// </summary>
Task UnsubscribeByAddressAsync(IEnumerable<string> addresses);
```
|
||||
|
||||
**Step 2: Implement in MxAccessClient.Subscription.cs**
|
||||
|
||||
The existing `UnsubscribeAsync` (line 53) already does exactly this, but it is `internal`. Rather than renaming it, add a public wrapper:
|
||||
|
||||
```csharp
/// <summary>
/// Unsubscribes specific addresses by address name.
/// Removes from both COM state and stored subscriptions (no reconnect replay).
/// </summary>
public async Task UnsubscribeByAddressAsync(IEnumerable<string> addresses)
{
    await UnsubscribeAsync(addresses);
}
```
|
||||
|
||||
This keeps the existing `internal UnsubscribeAsync` unchanged (it's still used by `SubscriptionHandle.DisposeAsync`).
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Domain/IScadaClient.cs
git add src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.Subscription.cs
git commit -m "feat: add UnsubscribeByAddressAsync to IScadaClient for address-based unsubscribe"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 6: Remove `_mxAccessHandles` from `SubscriptionManager`
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs`
|
||||
|
||||
**Step 1: Remove `_mxAccessHandles` field**
|
||||
|
||||
Delete lines 34-35:
|
||||
|
||||
```csharp
// REMOVE:
private readonly ConcurrentDictionary<string, IAsyncDisposable> _mxAccessHandles
    = new ConcurrentDictionary<string, IAsyncDisposable>(StringComparer.OrdinalIgnoreCase);
```
|
||||
|
||||
**Step 2: Rewrite `CreateMxAccessSubscriptionsAsync`**
|
||||
|
||||
The method no longer stores handles. It just calls `SubscribeAsync` to create the COM subscriptions. `MxAccessClient` stores them in `_storedSubscriptions` internally.
|
||||
|
||||
```csharp
private async Task CreateMxAccessSubscriptionsAsync(List<string> addresses)
{
    try
    {
        await _scadaClient.SubscribeAsync(
            addresses,
            (address, vtq) => OnTagValueChanged(address, vtq));
    }
    catch (Exception ex)
    {
        Log.Error(ex, "Failed to create MxAccess subscriptions for {Count} tags", addresses.Count);
    }
}
```
|
||||
|
||||
**Step 3: Rewrite unsubscribe logic in `UnsubscribeClient`**
|
||||
|
||||
Replace the handle disposal section (lines 198-212) with address-based unsubscribe:
|
||||
|
||||
```csharp
// Unsubscribe tags with no remaining clients via address-based API
if (tagsToDispose.Count > 0)
{
    try
    {
        _scadaClient.UnsubscribeByAddressAsync(tagsToDispose).GetAwaiter().GetResult();
    }
    catch (Exception ex)
    {
        Log.Warning(ex, "Error unsubscribing {Count} tags from MxAccess", tagsToDispose.Count);
    }
}
```
|
||||
|
||||
**Step 4: Verify build**
|
||||
|
||||
```bash
dotnet build src/ZB.MOM.WW.LmxProxy.Host
```
|
||||
|
||||
Expected: Build succeeds. No references to `_mxAccessHandles` remain.
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs
git commit -m "fix: remove _mxAccessHandles from SubscriptionManager, use address-based unsubscribe"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 7: Wire `ConnectionStateChanged` for reconnect notification in `SubscriptionManager`
|
||||
|
||||
After reconnect, `RecreateStoredSubscriptionsAsync` recreates COM subscriptions, and `SubscriptionManager` continues to receive `OnTagValueChanged` callbacks because the callback references are preserved in `_storedSubscriptions`. Clients therefore need no explicit notification: good-quality updates resume through those callbacks. The reconnect should still be logged for observability.
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs` — add `NotifyReconnection` method
|
||||
- Modify: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — wire Connected state to SubscriptionManager
|
||||
|
||||
**Step 1: Add `NotifyReconnection` to SubscriptionManager**
|
||||
|
||||
```csharp
/// <summary>
/// Logs reconnection for observability. Data flow resumes automatically
/// via MxAccessClient.RecreateStoredSubscriptionsAsync callbacks.
/// </summary>
public void NotifyReconnection()
{
    Log.Information("MxAccess reconnected — subscriptions recreated, " +
        "data flow will resume via OnDataChange callbacks " +
        "({ClientCount} clients, {TagCount} tags)",
        _clientSubscriptions.Count, _tagSubscriptions.Count);
}
```
|
||||
|
||||
**Step 2: Wire in LmxProxyService.Start()**
|
||||
|
||||
Extend the existing `ConnectionStateChanged` handler (around line 97):
|
||||
|
||||
```csharp
_mxAccessClient.ConnectionStateChanged += (sender, e) =>
{
    if (e.CurrentState == Domain.ConnectionState.Disconnected ||
        e.CurrentState == Domain.ConnectionState.Error)
    {
        _subscriptionManager.NotifyDisconnection();
    }
    else if (e.CurrentState == Domain.ConnectionState.Connected &&
             e.PreviousState == Domain.ConnectionState.Reconnecting)
    {
        _subscriptionManager.NotifyReconnection();
    }
};
```
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
git add src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs
git add src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs
git commit -m "feat: wire reconnection notification to SubscriptionManager for observability"
```
|
||||
|
||||
---
|
||||
|
||||
## Task 8: Build, deploy to windev, test
|
||||
|
||||
**Files:**
|
||||
- No code changes — build and deployment task.
|
||||
|
||||
**Step 1: Build the solution**
|
||||
|
||||
```bash
dotnet build ZB.MOM.WW.LmxProxy.slnx
```
|
||||
|
||||
Expected: Clean build, no errors.
|
||||
|
||||
**Step 2: Deploy to windev**
|
||||
|
||||
Follow the existing deployment procedure in `docker/README.md`, or copy the build output to windev manually.
|
||||
|
||||
**Step 3: Manual test — Gap 1 (active probing)**
|
||||
|
||||
1. Start the v2 service on windev. Verify logs show: `Connection monitor loop started (interval=5000ms, probe=True)`.
|
||||
2. Verify probe runs: logs should show no warnings while platform is healthy.
|
||||
3. Kill aaBootstrap on windev. Within 15-20s (3 probe failures at 5s intervals; up to about 30s if each probe runs to its 5s timeout), logs should show:
|
||||
- `Probe transport failure (1/3): Probe returned Bad_CommFailure` (or similar)
|
||||
- `Probe transport failure (2/3): ...`
|
||||
- `Probe transport failure (3/3): ...`
|
||||
- `Max consecutive transport failures reached — forcing reconnect`
|
||||
- `Attempting reconnect...`
|
||||
4. After platform restart (but objects still stopped): Logs should show `Probe degraded` and `entering degraded mode`, then probe backs off to 30s interval. No reconnect churn.
|
||||
5. After objects restart via SMC: Logs should show `Probe healthy — exiting degraded mode`.
|
||||
|
||||
**Step 4: Manual test — Gap 2 (subscription cleanup)**
|
||||
|
||||
1. Connect a gRPC client, subscribe to tags.
|
||||
2. Kill aaBootstrap → client receives `Bad_NotConnected` quality.
|
||||
3. Restart platform + objects. Verify client starts receiving Good quality updates again (via `RecreateStoredSubscriptionsAsync`).
|
||||
4. Disconnect the client. Verify logs show `Unsubscribed from N tags` (address-based) with no handle disposal errors.
|
||||
|
||||
---
|
||||
|
||||
## Design Rationale
|
||||
|
||||
### Why two failure modes in the probe?
|
||||
|
||||
| Failure Mode | Cause | Correct Response |
|---|---|---|
| **Transport failure** | COM object dead, platform process crashed, MxAccess unreachable | Force disconnect + reconnect |
| **Data degraded** | COM session alive, AVEVA objects stopped, all reads return Bad quality | Stay connected, report degraded, back off probes |
|
||||
|
||||
Reconnecting on DataDegraded would churn COM objects with no benefit — the platform objects are stopped regardless of connection state. Observed: 40+ minutes of Bad quality after aaBootstrap crash until manual SMC restart.
|
||||
|
||||
### Why remove `_mxAccessHandles`?
|
||||
|
||||
1. **Batch handle bug**: `CreateMxAccessSubscriptionsAsync` stored the same `IAsyncDisposable` handle for every address in a batch. Disposing any one address disposed the entire batch, silently removing unrelated subscriptions from `_storedSubscriptions`.
|
||||
2. **Stale after reconnect**: `RecreateStoredSubscriptionsAsync` recreates COM subscriptions but doesn't produce new `SubscriptionManager` handles. Old handles point to disposed COM state.
|
||||
3. **Ownership violation**: `MxAccessClient` already owns subscription lifecycle via `_storedSubscriptions` and `_addressToHandle`. Duplicating ownership in `SubscriptionManager._mxAccessHandles` is a leaky abstraction.
|
||||
|
||||
The fix: `SubscriptionManager` owns client routing and ref counts only. `MxAccessClient` owns COM subscription lifecycle. Unsubscribe is by address, not by opaque handle.
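
The split of responsibilities can be sketched as a routing table that tracks only which clients hold which tags and reports which addresses need a COM-level subscribe or unsubscribe. `TagRouting` is a hypothetical name, and the sketch is not thread-safe; the real `SubscriptionManager` would need its own locking:

```csharp
using System;
using System.Collections.Generic;

public sealed class TagRouting
{
    // tag address -> ids of the clients currently subscribed to it
    private readonly Dictionary<string, HashSet<string>> _tagClients =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Returns the tags that had no subscriber before this call, i.e. the
    // addresses that need a new underlying subscription.
    public List<string> Subscribe(string clientId, IEnumerable<string> tags)
    {
        var created = new List<string>();
        foreach (var tag in tags)
        {
            HashSet<string> clients;
            if (!_tagClients.TryGetValue(tag, out clients))
            {
                clients = new HashSet<string>();
                _tagClients[tag] = clients;
                created.Add(tag);
            }
            clients.Add(clientId);
        }
        return created;
    }

    // Returns the tags whose last subscriber just left, i.e. the addresses
    // to pass to an address-based unsubscribe.
    public List<string> UnsubscribeClient(string clientId)
    {
        var tagsToDispose = new List<string>();
        foreach (var pair in _tagClients)
        {
            if (pair.Value.Remove(clientId) && pair.Value.Count == 0)
                tagsToDispose.Add(pair.Key);
        }
        foreach (var tag in tagsToDispose)
            _tagClients.Remove(tag);
        return tagsToDispose;
    }
}
```

Because the table holds addresses rather than handles, nothing in it can go stale across a reconnect, and removing one address cannot affect another.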
|
||||
@@ -0,0 +1,15 @@
|
||||
{
  "planPath": "lmxproxy/docs/plans/2026-03-22-gap1-gap2-reconnect-subscriptions.md",
  "tasks": [
    {"id": 0, "subject": "Task 0: Add ProbeResult domain type", "status": "pending"},
    {"id": 1, "subject": "Task 1: Add ProbeConnectionAsync to MxAccessClient", "status": "pending", "blockedBy": [0]},
    {"id": 2, "subject": "Task 2: Add health check configuration", "status": "pending"},
    {"id": 3, "subject": "Task 3: Rewrite MonitorConnectionAsync with active probing", "status": "pending", "blockedBy": [1, 2]},
    {"id": 4, "subject": "Task 4: Wire probe config through LmxProxyService.Start()", "status": "pending", "blockedBy": [2, 3]},
    {"id": 5, "subject": "Task 5: Add UnsubscribeByAddressAsync to IScadaClient", "status": "pending"},
    {"id": 6, "subject": "Task 6: Remove _mxAccessHandles from SubscriptionManager", "status": "pending", "blockedBy": [5]},
    {"id": 7, "subject": "Task 7: Wire ConnectionStateChanged for reconnect notification", "status": "pending", "blockedBy": [6]},
    {"id": 8, "subject": "Task 8: Build, deploy to windev, test", "status": "pending", "blockedBy": [4, 7]}
  ],
  "lastUpdated": "2026-03-22T00:00:00Z"
}
|
||||
185
deprecated/lmxproxy/docs/plans/lmxproxy-stale-session-fix.md
Normal file
@@ -0,0 +1,185 @@
|
||||
# LmxProxy Stale Session Subscription Leak Fix
|
||||
|
||||
## Problem
|
||||
|
||||
When a gRPC client disconnects abruptly, Grpc.Core (the C-core library used by the .NET Framework 4.8 server) does not reliably fire the `ServerCallContext.CancellationToken`. This means:
|
||||
|
||||
1. The `Subscribe` RPC in `ScadaGrpcService` blocks forever on `reader.WaitToReadAsync(context.CancellationToken)` (line 368)
|
||||
2. The `finally` block with `_subscriptionManager.UnsubscribeClient(request.SessionId)` never runs
|
||||
3. The `ct.Register(() => UnsubscribeClient(clientId))` in `SubscriptionManager.SubscribeAsync` also never fires (same token)
|
||||
4. The old session's subscriptions leak in `SubscriptionManager._clientSubscriptions` and `_tagSubscriptions`
|
||||
|
||||
When the client reconnects with a new session ID, it creates duplicate subscriptions. Tags aren't cleaned up because they still have a ref-count from the leaked old session. Over time, client count grows and tag subscriptions accumulate.
|
||||
|
||||
The `SessionManager` does scavenge inactive sessions after 5 minutes, but it only removes the session from its own dictionary — it doesn't notify `SubscriptionManager` to clean up subscriptions.
|
||||
|
||||
## Fix
|
||||
|
||||
Bridge `SessionManager` scavenging to `SubscriptionManager` cleanup. When a session is scavenged due to inactivity, also call `SubscriptionManager.UnsubscribeClient()`.
|
||||
|
||||
### Step 1: Add cleanup callback to SessionManager
|
||||
|
||||
File: `src/ZB.MOM.WW.LmxProxy.Host/Sessions/SessionManager.cs`
|
||||
|
||||
Add a callback field and expose it:
|
||||
|
||||
```csharp
|
||||
// Add after the _inactivityTimeout field (line 22)
|
||||
private Action<string>? _onSessionScavenged;
|
||||
|
||||
/// <summary>
|
||||
/// Register a callback invoked when a session is scavenged due to inactivity.
|
||||
/// The callback receives the session ID.
|
||||
/// </summary>
|
||||
public void OnSessionScavenged(Action<string> callback)
|
||||
{
|
||||
_onSessionScavenged = callback;
|
||||
}
|
||||
```
|
||||
|
||||
Then in `ScavengeInactiveSessions`, invoke the callback for each scavenged session:
|
||||
|
||||
```csharp
|
||||
// In ScavengeInactiveSessions (line 103-118), change the foreach to:
|
||||
foreach (var kvp in expired)
|
||||
{
|
||||
if (_sessions.TryRemove(kvp.Key, out _))
|
||||
{
|
||||
Log.Information("Session {SessionId} scavenged (inactive since {LastActivity})",
|
||||
kvp.Key, kvp.Value.LastActivity);
|
||||
|
||||
// Notify subscriber cleanup
|
||||
try
|
||||
{
|
||||
_onSessionScavenged?.Invoke(kvp.Key);
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
Log.Warning(ex, "Error in session scavenge callback for {SessionId}", kvp.Key);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 2: Wire up the callback in LmxProxyService
|
||||
|
||||
File: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs`
|
||||
|
||||
After both `SessionManager` and `SubscriptionManager` are created, register the callback:
|
||||
|
||||
```csharp
|
||||
// Add after SubscriptionManager creation:
|
||||
_sessionManager.OnSessionScavenged(sessionId =>
|
||||
{
|
||||
Log.Information("Cleaning up subscriptions for scavenged session {SessionId}", sessionId);
|
||||
_subscriptionManager.UnsubscribeClient(sessionId);
|
||||
});
|
||||
```
|
||||
|
||||
Find where `_sessionManager` and `_subscriptionManager` are both initialized and add this registration immediately after.
|
||||
|
||||
### Step 3: Also clean up on explicit Disconnect
|
||||
|
||||
This is already handled — `ScadaGrpcService.Disconnect()` (line 86) calls `_subscriptionManager.UnsubscribeClient(request.SessionId)` before terminating the session. No change needed.
|
||||
|
||||
### Step 4: Add proactive stream timeout (belt-and-suspenders)
|
||||
|
||||
The scavenger runs every 60 seconds with a 5-minute timeout. This means a leaked session could take up to 6 minutes to clean up. For faster detection, add a secondary timeout in the Subscribe RPC itself.
|
||||
|
||||
File: `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Services/ScadaGrpcService.cs`
|
||||
|
||||
In the `Subscribe` method, replace the simple `context.CancellationToken` with a combined token that also expires if the session becomes invalid:
|
||||
|
||||
```csharp
|
||||
// Replace the Subscribe method (lines 353-390) with:
|
||||
public override async Task Subscribe(
|
||||
Scada.SubscribeRequest request,
|
||||
IServerStreamWriter<Scada.VtqMessage> responseStream,
|
||||
ServerCallContext context)
|
||||
{
|
||||
if (!_sessionManager.ValidateSession(request.SessionId))
|
||||
{
|
||||
throw new RpcException(new GrpcStatus(StatusCode.Unauthenticated, "Invalid session"));
|
||||
}
|
||||
|
||||
var reader = await _subscriptionManager.SubscribeAsync(
|
||||
request.SessionId, request.Tags, context.CancellationToken);
|
||||
|
||||
try
|
||||
{
|
||||
// Use a combined approach: check both the gRPC cancellation token AND
|
||||
// periodic session validity. This works around Grpc.Core not reliably
|
||||
// firing CancellationToken on client disconnect.
|
||||
while (true)
|
||||
{
|
||||
// Wait for data with a timeout so we can periodically check session validity
|
||||
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
|
||||
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
|
||||
context.CancellationToken, timeoutCts.Token);
|
||||
|
||||
bool hasData;
|
||||
try
|
||||
{
|
||||
hasData = await reader.WaitToReadAsync(linkedCts.Token);
|
||||
}
|
||||
catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested
|
||||
&& !context.CancellationToken.IsCancellationRequested)
|
||||
{
|
||||
// Timeout expired, not a client disconnect — check if session is still valid
|
||||
if (!_sessionManager.ValidateSession(request.SessionId))
|
||||
{
|
||||
Log.Information("Subscribe stream ending — session {SessionId} no longer valid",
|
||||
request.SessionId);
|
||||
break;
|
||||
}
|
||||
continue; // Session still valid, keep waiting
|
||||
}
|
||||
|
||||
if (!hasData) break; // Channel completed
|
||||
|
||||
while (reader.TryRead(out var item))
|
||||
{
|
||||
var protoVtq = ConvertToProtoVtq(item.address, item.vtq);
|
||||
await responseStream.WriteAsync(protoVtq);
|
||||
}
|
||||
}
|
||||
}
|
||||
catch (OperationCanceledException)
|
||||
{
|
||||
// Client disconnected -- normal
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
Log.Error(ex, "Subscribe stream error for session {SessionId}", request.SessionId);
|
||||
throw new RpcException(new GrpcStatus(StatusCode.Internal, ex.Message));
|
||||
}
|
||||
finally
|
||||
{
|
||||
_subscriptionManager.UnsubscribeClient(request.SessionId);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This adds a 30-second periodic check: if no data arrives for 30 seconds, it checks whether the session is still valid. If the session was scavenged (client disconnected, 5-min timeout), the stream exits cleanly and runs the `finally` cleanup.
|
||||
|
||||
## Summary of Changes
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `Sessions/SessionManager.cs` | Add `_onSessionScavenged` callback, invoke during `ScavengeInactiveSessions` |
|
||||
| `LmxProxyService.cs` | Wire `_sessionManager.OnSessionScavenged` to `_subscriptionManager.UnsubscribeClient` |
|
||||
| `Grpc/Services/ScadaGrpcService.cs` | Add 30-second periodic session validity check in `Subscribe` loop |
|
||||
|
||||
## Testing
|
||||
|
||||
1. Start LmxProxy server
|
||||
2. Connect a client and subscribe to tags
|
||||
3. Kill the client process abruptly (not a clean disconnect)
|
||||
4. Check status page — client count should still show the old session
|
||||
5. Wait up to 6 minutes (5-minute inactivity timeout plus the 60-second scavenge interval); the session should be scavenged and the subscription count should drop
|
||||
6. Reconnect client — should get a clean new session, no duplicate subscriptions
|
||||
7. Verify tag subscription counts match expected (no leaked refs)
|
||||
|
||||
## Optional: Reduce scavenge timeout for faster cleanup
|
||||
|
||||
In `LmxProxyService.cs` where `SessionManager` is constructed, consider reducing `inactivityTimeoutMinutes` from 5 to 2, since the Subscribe RPC now has its own 30-second validity check. The 5-minute timeout was the only cleanup path before; now it's belt-and-suspenders.
|
||||
2723
deprecated/lmxproxy/docs/plans/phase-1-protocol-domain-types.md
Normal file
File diff suppressed because it is too large
2067
deprecated/lmxproxy/docs/plans/phase-2-host-core.md
Normal file
File diff suppressed because it is too large
1799
deprecated/lmxproxy/docs/plans/phase-3-host-grpc-security-config.md
Normal file
File diff suppressed because it is too large
666
deprecated/lmxproxy/docs/plans/phase-4-host-health-metrics.md
Normal file
@@ -0,0 +1,666 @@
|
||||
# Phase 4: Host Health, Metrics & Status Server — Implementation Plan
|
||||
|
||||
**Date**: 2026-03-21
|
||||
**Prerequisites**: Phase 3 complete and passing (gRPC server, Security, Configuration, Service Hosting all functional)
|
||||
**Working Directory**: The lmxproxy repo is on windev at `C:\src\lmxproxy`
|
||||
|
||||
## Guardrails
|
||||
|
||||
1. **This is a v2 rebuild** — do not copy code from the v1 reference in `src-reference/`. Write fresh implementations guided by the design docs and the reference code's structure.
|
||||
2. **Host targets .NET Framework 4.8, x86** — all code must use C# 9.0 language features maximum (`LangVersion` is `9.0` in the csproj). No file-scoped namespaces, no `required` keyword, no collection expressions in Host code.
|
||||
3. **No new NuGet packages** — all required packages are already in the Host `.csproj` (`Microsoft.Extensions.Diagnostics.HealthChecks`, `Serilog`, `System.Threading.Channels`, `System.Text.Json` via framework).
|
||||
4. **Namespace**: `ZB.MOM.WW.LmxProxy.Host` with sub-namespaces matching folder structure (e.g., `ZB.MOM.WW.LmxProxy.Host.Health`, `ZB.MOM.WW.LmxProxy.Host.Metrics`, `ZB.MOM.WW.LmxProxy.Host.Status`).
|
||||
5. **All COM operations are on the STA thread** — health checks that read test tags must go through `MxAccessClient.ReadAsync()`, never directly touching COM objects.
|
||||
6. **Build must pass after each step**: `dotnet build src/ZB.MOM.WW.LmxProxy.Host --platform x86`
|
||||
7. **Tests run on windev**: `dotnet test tests/ZB.MOM.WW.LmxProxy.Host.Tests --platform x86`
|
||||
|
||||
## Step 1: Create PerformanceMetrics
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs`
|
||||
|
||||
Create the `PerformanceMetrics` class in namespace `ZB.MOM.WW.LmxProxy.Host.Metrics`.
|
||||
|
||||
### 1.1 OperationMetrics (nested or separate class in same file)
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Host.Metrics
|
||||
{
|
||||
public class OperationMetrics
|
||||
{
|
||||
private readonly List<double> _durations = new List<double>();
|
||||
private readonly object _lock = new object();
|
||||
private long _totalCount;
|
||||
private long _successCount;
|
||||
private double _totalMilliseconds;
|
||||
private double _minMilliseconds = double.MaxValue;
|
||||
private double _maxMilliseconds;
|
||||
|
||||
public void Record(TimeSpan duration, bool success) { ... }
|
||||
public MetricsStatistics GetStatistics() { ... }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Implementation details:
|
||||
- `Record(TimeSpan duration, bool success)`: Inside `lock (_lock)`, increment `_totalCount`, conditionally increment `_successCount`, add `duration.TotalMilliseconds` to `_durations` list, update `_totalMilliseconds`, `_minMilliseconds`, `_maxMilliseconds`. If `_durations.Count > 1000`, call `_durations.RemoveAt(0)` to maintain rolling buffer.
|
||||
- `GetStatistics()`: Inside `lock (_lock)`, return early with empty `MetricsStatistics` if `_totalCount == 0`. Otherwise sort `_durations`, compute p95 index as `(int)Math.Ceiling(sortedDurations.Count * 0.95) - 1`, clamp to `Math.Max(0, p95Index)`.
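
The rolling buffer and P95 index calculation described above can be sketched in isolation. This is a simplified illustration, not the full `OperationMetrics`: the class name `RollingDurations` and the `SampleCount` property are hypothetical, and the counters and min/max tracking are left out.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the 1000-sample rolling buffer and nearest-rank P95.
public class RollingDurations
{
    private const int MaxSamples = 1000;
    private readonly List<double> _durations = new List<double>();
    private readonly object _lock = new object();

    public void Record(TimeSpan duration)
    {
        lock (_lock)
        {
            _durations.Add(duration.TotalMilliseconds);
            if (_durations.Count > MaxSamples)
                _durations.RemoveAt(0); // drop the oldest sample
        }
    }

    public int SampleCount
    {
        get { lock (_lock) { return _durations.Count; } }
    }

    // P95 over the retained samples; returns 0 when nothing has been recorded.
    public double GetPercentile95()
    {
        lock (_lock)
        {
            if (_durations.Count == 0) return 0;

            // Sort a copy so the insertion order of the buffer is preserved.
            var sorted = new List<double>(_durations);
            sorted.Sort();

            int p95Index = (int)Math.Ceiling(sorted.Count * 0.95) - 1;
            return sorted[Math.Max(0, p95Index)];
        }
    }
}
```

`RemoveAt(0)` shifts the whole list on every overflow. At 1000 samples that cost is small, but a `Queue<double>` or a circular array would avoid it if the buffer ever grows.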
|
||||
|
||||
### 1.2 MetricsStatistics
|
||||
|
||||
```csharp
|
||||
public class MetricsStatistics
|
||||
{
|
||||
public long TotalCount { get; set; }
|
||||
public long SuccessCount { get; set; }
|
||||
public double SuccessRate { get; set; }
|
||||
public double AverageMilliseconds { get; set; }
|
||||
public double MinMilliseconds { get; set; }
|
||||
public double MaxMilliseconds { get; set; }
|
||||
public double Percentile95Milliseconds { get; set; }
|
||||
}
|
||||
```
|
||||
|
||||
### 1.3 ITimingScope interface and TimingScope implementation
|
||||
|
||||
```csharp
|
||||
public interface ITimingScope : IDisposable
|
||||
{
|
||||
void SetSuccess(bool success);
|
||||
}
|
||||
```
|
||||
|
||||
`TimingScope` is a private nested class inside `PerformanceMetrics`:
|
||||
- Constructor takes `PerformanceMetrics metrics, string operationName`, starts a `Stopwatch`.
|
||||
- `SetSuccess(bool success)` stores the flag (default `true`).
|
||||
- `Dispose()`: stops stopwatch, calls `_metrics.RecordOperation(_operationName, _stopwatch.Elapsed, _success)`. Guard against double-dispose with `_disposed` flag.
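
A stand-alone sketch of the timing scope may help. The plan nests `TimingScope` inside `PerformanceMetrics` and calls `RecordOperation` directly; here the class is named `TimingScopeSketch` and a delegate stands in for that call, both of which are assumptions so that the example compiles on its own. The interface is repeated from above for the same reason.

```csharp
using System;
using System.Diagnostics;

public interface ITimingScope : IDisposable
{
    void SetSuccess(bool success);
}

// Hypothetical stand-alone version of the nested TimingScope class.
public sealed class TimingScopeSketch : ITimingScope
{
    private readonly Action<string, TimeSpan, bool> _record; // stand-in for RecordOperation
    private readonly string _operationName;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();
    private bool _success = true; // operations count as successful unless told otherwise
    private bool _disposed;

    public TimingScopeSketch(Action<string, TimeSpan, bool> record, string operationName)
    {
        _record = record;
        _operationName = operationName;
    }

    public void SetSuccess(bool success)
    {
        _success = success;
    }

    public void Dispose()
    {
        if (_disposed) return; // a second Dispose must not record again
        _disposed = true;
        _stopwatch.Stop();
        _record(_operationName, _stopwatch.Elapsed, _success);
    }
}
```

In an RPC handler the scope would typically wrap the call in a `using` block, with `SetSuccess(false)` invoked from the `catch` path before the exception propagates.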
|
||||
|
||||
### 1.4 PerformanceMetrics class
|
||||
|
||||
```csharp
|
||||
public class PerformanceMetrics : IDisposable
|
||||
{
|
||||
private static readonly ILogger Logger = Log.ForContext<PerformanceMetrics>();
|
||||
private readonly ConcurrentDictionary<string, OperationMetrics> _metrics = new ConcurrentDictionary<string, OperationMetrics>();
|
||||
private readonly Timer _reportingTimer;
|
||||
private bool _disposed;
|
||||
|
||||
public PerformanceMetrics()
|
||||
{
|
||||
_reportingTimer = new Timer(ReportMetrics, null, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(60));
|
||||
}
|
||||
|
||||
public void RecordOperation(string operationName, TimeSpan duration, bool success = true) { ... }
|
||||
public ITimingScope BeginOperation(string operationName) => new TimingScope(this, operationName);
|
||||
public OperationMetrics? GetMetrics(string operationName) { ... }
|
||||
public IReadOnlyDictionary<string, OperationMetrics> GetAllMetrics() { ... }
|
||||
public Dictionary<string, MetricsStatistics> GetStatistics() { ... }
|
||||
|
||||
private void ReportMetrics(object? state) { ... } // Log each operation's stats at Information level
|
||||
public void Dispose() { ... } // Dispose timer, call ReportMetrics one final time
|
||||
}
|
||||
```
|
||||
|
||||
`ReportMetrics` iterates `_metrics`, calls `GetStatistics()` on each, logs via Serilog structured logging with properties: `Operation`, `Count`, `SuccessRate`, `AverageMs`, `MinMs`, `MaxMs`, `P95Ms`.
|
||||
|
||||
### 1.5 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host --platform x86"
|
||||
```
|
||||
|
||||
## Step 2: Create HealthCheckService
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs`
|
||||
|
||||
Namespace: `ZB.MOM.WW.LmxProxy.Host.Health`
|
||||
|
||||
### 2.1 Basic HealthCheckService
|
||||
|
||||
```csharp
|
||||
public class HealthCheckService : IHealthCheck
|
||||
{
|
||||
private static readonly ILogger Logger = Log.ForContext<HealthCheckService>();
|
||||
private readonly IScadaClient _scadaClient;
|
||||
private readonly SubscriptionManager _subscriptionManager;
|
||||
private readonly PerformanceMetrics _performanceMetrics;
|
||||
|
||||
public HealthCheckService(
|
||||
IScadaClient scadaClient,
|
||||
SubscriptionManager subscriptionManager,
|
||||
PerformanceMetrics performanceMetrics) { ... }
|
||||
|
||||
public Task<HealthCheckResult> CheckHealthAsync(
|
||||
HealthCheckContext context,
|
||||
CancellationToken cancellationToken = default) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
Dependencies imported:
|
||||
- `ZB.MOM.WW.LmxProxy.Host.Domain` for `IScadaClient`, `ConnectionState`
|
||||
- `ZB.MOM.WW.LmxProxy.Host.Services` for `SubscriptionManager` (if still in that namespace after Phase 2/3; adjust import to match actual location)
|
||||
- `ZB.MOM.WW.LmxProxy.Host.Metrics` for `PerformanceMetrics`
|
||||
- `Microsoft.Extensions.Diagnostics.HealthChecks` for `IHealthCheck`, `HealthCheckResult`, `HealthCheckContext`
|
||||
|
||||
`CheckHealthAsync` logic:
|
||||
1. Create `Dictionary<string, object> data`.
|
||||
2. Read `_scadaClient.IsConnected` and `_scadaClient.ConnectionState` into `data["scada_connected"]` and `data["scada_connection_state"]`.
|
||||
3. Get subscription stats via `_subscriptionManager.GetSubscriptionStats()` — store `TotalClients`, `TotalTags` in data.
|
||||
4. Iterate `_performanceMetrics.GetAllMetrics()` to compute `totalOperations` and `averageSuccessRate`.
|
||||
5. Store `total_operations` and `average_success_rate` in data.
|
||||
6. Decision tree:
|
||||
- If `!isConnected` → `HealthCheckResult.Unhealthy("SCADA client is not connected", data: data)`
|
||||
- If `averageSuccessRate < 0.5 && totalOperations > 100` → `HealthCheckResult.Degraded(...)`
|
||||
- If `subscriptionStats.TotalClients > 100` → `HealthCheckResult.Degraded(...)`
|
||||
- Otherwise → `HealthCheckResult.Healthy("LmxProxy is healthy", data)`
|
||||
7. Wrap everything in try/catch — on exception return `Unhealthy` with exception details.
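
The decision tree in step 6 can be expressed as a small pure function, which makes the thresholds easy to unit-test. This is a sketch under assumptions: `HealthLevel` is a hypothetical local stand-in for `HealthStatus`, and `HealthDecision.Evaluate` is not a name from the plan. The real method would wrap the outcome in a `HealthCheckResult` and attach the `data` dictionary.

```csharp
// Hypothetical stand-in for Microsoft.Extensions.Diagnostics.HealthChecks.HealthStatus.
public enum HealthLevel { Healthy, Degraded, Unhealthy }

public static class HealthDecision
{
    // Mirrors the order of the checks listed above: connectivity first,
    // then success rate (only once enough operations exist), then client load.
    public static HealthLevel Evaluate(
        bool isConnected,
        double averageSuccessRate,
        long totalOperations,
        int totalClients)
    {
        if (!isConnected)
            return HealthLevel.Unhealthy;

        if (averageSuccessRate < 0.5 && totalOperations > 100)
            return HealthLevel.Degraded;

        if (totalClients > 100)
            return HealthLevel.Degraded;

        return HealthLevel.Healthy;
    }
}
```

The `totalOperations > 100` guard keeps a handful of early failures after startup from marking the service as degraded.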
|
||||
|
||||
### 2.2 DetailedHealthCheckService
|
||||
|
||||
In the same file or a separate file `src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs`:
|
||||
|
||||
```csharp
|
||||
public class DetailedHealthCheckService : IHealthCheck
|
||||
{
|
||||
private static readonly ILogger Logger = Log.ForContext<DetailedHealthCheckService>();
|
||||
private readonly IScadaClient _scadaClient;
|
||||
private readonly string _testTagAddress;
|
||||
|
||||
public DetailedHealthCheckService(IScadaClient scadaClient, string testTagAddress = "TestChildObject.TestBool") { ... }
|
||||
|
||||
public async Task<HealthCheckResult> CheckHealthAsync(
|
||||
HealthCheckContext context,
|
||||
CancellationToken cancellationToken = default) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
`CheckHealthAsync` logic:
|
||||
1. If `!_scadaClient.IsConnected` → return `Unhealthy`.
|
||||
2. Try `Vtq vtq = await _scadaClient.ReadAsync(_testTagAddress, cancellationToken)`.
|
||||
3. If `vtq.Quality != Quality.Good` → return `Degraded` with quality info.
|
||||
4. If `DateTime.UtcNow - vtq.Timestamp > TimeSpan.FromMinutes(5)` → return `Degraded` (stale data).
|
||||
5. Otherwise → `Healthy`.
|
||||
6. Catch read exceptions → return `Degraded("Could not read test tag")`.
|
||||
7. Catch all exceptions → return `Unhealthy`.
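
The nesting of the two catch blocks matters here: a failed read is only `Degraded`, while anything else unexpected is `Unhealthy`. The sketch below shows that ordering under stated assumptions: `ProbeOutcome`, `DetailedProbe`, and the `readTag` delegate (standing in for `IScadaClient.ReadAsync`, returning whether quality was Good plus the sample's UTC timestamp) are hypothetical names, and the current time is passed in so the staleness rule can be tested.

```csharp
using System;
using System.Threading.Tasks;

public enum ProbeOutcome { Healthy, Degraded, Unhealthy }

public static class DetailedProbe
{
    public static async Task<ProbeOutcome> EvaluateAsync(
        bool isConnected,
        Func<Task<Tuple<bool, DateTime>>> readTag, // hypothetical stand-in for ReadAsync
        DateTime utcNow)
    {
        try
        {
            if (!isConnected) return ProbeOutcome.Unhealthy;

            Tuple<bool, DateTime> sample;
            try
            {
                sample = await readTag();
            }
            catch (Exception)
            {
                // Connected, but the test tag could not be read.
                return ProbeOutcome.Degraded;
            }

            if (!sample.Item1) return ProbeOutcome.Degraded; // quality is not Good
            if (utcNow - sample.Item2 > TimeSpan.FromMinutes(5))
                return ProbeOutcome.Degraded; // stale data

            return ProbeOutcome.Healthy;
        }
        catch (Exception)
        {
            // Anything unexpected outside the read itself.
            return ProbeOutcome.Unhealthy;
        }
    }
}
```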
|
||||
|
||||
### 2.3 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host --platform x86"
|
||||
```
|
||||
|
||||
## Step 3: Create StatusReportService
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs`
|
||||
|
||||
Namespace: `ZB.MOM.WW.LmxProxy.Host.Status`
|
||||
|
||||
### 3.1 Data model classes
|
||||
|
||||
Define in the same file (or a separate `StatusModels.cs` in the same folder):
|
||||
|
||||
```csharp
|
||||
public class StatusData
|
||||
{
|
||||
public DateTime Timestamp { get; set; }
|
||||
public string ServiceName { get; set; } = "";
|
||||
public string Version { get; set; } = "";
|
||||
public ConnectionStatus Connection { get; set; } = new ConnectionStatus();
|
||||
public SubscriptionStatus Subscriptions { get; set; } = new SubscriptionStatus();
|
||||
public PerformanceStatus Performance { get; set; } = new PerformanceStatus();
|
||||
public HealthInfo Health { get; set; } = new HealthInfo();
|
||||
public HealthInfo? DetailedHealth { get; set; }
|
||||
}
|
||||
|
||||
public class ConnectionStatus
|
||||
{
|
||||
public bool IsConnected { get; set; }
|
||||
public string State { get; set; } = "";
|
||||
public string NodeName { get; set; } = "";
|
||||
public string GalaxyName { get; set; } = "";
|
||||
}
|
||||
|
||||
public class SubscriptionStatus
|
||||
{
|
||||
public int TotalClients { get; set; }
|
||||
public int TotalTags { get; set; }
|
||||
public int ActiveSubscriptions { get; set; }
|
||||
}
|
||||
|
||||
public class PerformanceStatus
|
||||
{
|
||||
public long TotalOperations { get; set; }
|
||||
public double AverageSuccessRate { get; set; }
|
||||
public Dictionary<string, OperationStatus> Operations { get; set; } = new Dictionary<string, OperationStatus>();
|
||||
}
|
||||
|
||||
public class OperationStatus
|
||||
{
|
||||
public long TotalCount { get; set; }
|
||||
public double SuccessRate { get; set; }
|
||||
public double AverageMilliseconds { get; set; }
|
||||
public double MinMilliseconds { get; set; }
|
||||
public double MaxMilliseconds { get; set; }
|
||||
public double Percentile95Milliseconds { get; set; }
|
||||
}
|
||||
|
||||
public class HealthInfo
|
||||
{
|
||||
public string Status { get; set; } = "";
|
||||
public string Description { get; set; } = "";
|
||||
public Dictionary<string, string> Data { get; set; } = new Dictionary<string, string>();
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 StatusReportService
|
||||
|
||||
```csharp
|
||||
public class StatusReportService
|
||||
{
|
||||
private static readonly ILogger Logger = Log.ForContext<StatusReportService>();
|
||||
private readonly IScadaClient _scadaClient;
|
||||
private readonly SubscriptionManager _subscriptionManager;
|
||||
private readonly PerformanceMetrics _performanceMetrics;
|
||||
private readonly HealthCheckService _healthCheckService;
|
||||
private readonly DetailedHealthCheckService? _detailedHealthCheckService;
|
||||
|
||||
public StatusReportService(
|
||||
IScadaClient scadaClient,
|
||||
SubscriptionManager subscriptionManager,
|
||||
PerformanceMetrics performanceMetrics,
|
||||
HealthCheckService healthCheckService,
|
||||
DetailedHealthCheckService? detailedHealthCheckService = null) { ... }
|
||||
|
||||
public async Task<string> GenerateHtmlReportAsync() { ... }
|
||||
public async Task<string> GenerateJsonReportAsync() { ... }
|
||||
public async Task<bool> IsHealthyAsync() { ... }
|
||||
private async Task<StatusData> CollectStatusDataAsync() { ... }
|
||||
private static string GenerateHtmlFromStatusData(StatusData statusData) { ... }
|
||||
private static string GenerateErrorHtml(Exception ex) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
`CollectStatusDataAsync`:
|
||||
- Populate `StatusData.Timestamp = DateTime.UtcNow`, `ServiceName = "ZB.MOM.WW.LmxProxy.Host"`, `Version` from `Assembly.GetExecutingAssembly().GetName().Version`.
|
||||
- Connection info from `_scadaClient.IsConnected`, `_scadaClient.ConnectionState`.
|
||||
- Subscription stats from `_subscriptionManager.GetSubscriptionStats()`.
|
||||
- Performance stats from `_performanceMetrics.GetStatistics()` — include P95 in the `OperationStatus`.
|
||||
- Health from `_healthCheckService.CheckHealthAsync(new HealthCheckContext())`.
|
||||
- Detailed health from `_detailedHealthCheckService?.CheckHealthAsync(new HealthCheckContext())` if not null.
|
||||
|
||||
`GenerateJsonReportAsync`:
|
||||
- Use `System.Text.Json.JsonSerializer.Serialize(statusData, new JsonSerializerOptions { WriteIndented = true, PropertyNamingPolicy = JsonNamingPolicy.CamelCase })`.
|
||||
|
||||
`GenerateHtmlFromStatusData`:
|
||||
- Use `StringBuilder` to generate self-contained HTML.
|
||||
- Include inline CSS (Bootstrap-like grid, status cards with color-coded left borders).
|
||||
- Color coding: green (#28a745) for Healthy/Connected, yellow (#ffc107) for Degraded, red (#dc3545) for Unhealthy/Disconnected.
|
||||
- Operations table with columns: Operation, Count, Success Rate, Avg (ms), Min (ms), Max (ms), P95 (ms).
|
||||
- `<meta http-equiv="refresh" content="30">` for auto-refresh.
|
||||
- Last updated timestamp at the bottom.
|
||||
|
||||
`IsHealthyAsync`:
|
||||
- Run basic health check, return `result.Status == HealthStatus.Healthy`.
|
||||
|
||||
### 3.3 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host --platform x86"
|
||||
```
|
||||
|
||||
## Step 4: Create StatusWebServer
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs`
|
||||
|
||||
Namespace: `ZB.MOM.WW.LmxProxy.Host.Status`
|
||||
|
||||
```csharp
|
||||
public class StatusWebServer : IDisposable
|
||||
{
|
||||
private static readonly ILogger Logger = Log.ForContext<StatusWebServer>();
|
||||
private readonly WebServerConfiguration _configuration;
|
||||
private readonly StatusReportService _statusReportService;
|
||||
private HttpListener? _httpListener;
|
||||
private CancellationTokenSource? _cancellationTokenSource;
|
||||
private Task? _listenerTask;
|
||||
private bool _disposed;
|
||||
|
||||
public StatusWebServer(WebServerConfiguration configuration, StatusReportService statusReportService) { ... }
|
||||
|
||||
public bool Start() { ... }
|
||||
public bool Stop() { ... }
|
||||
public void Dispose() { ... }
|
||||
|
||||
private async Task HandleRequestsAsync(CancellationToken cancellationToken) { ... }
|
||||
private async Task HandleRequestAsync(HttpListenerContext context) { ... }
|
||||
private async Task HandleStatusPageAsync(HttpListenerResponse response) { ... }
|
||||
private async Task HandleStatusApiAsync(HttpListenerResponse response) { ... }
|
||||
private async Task HandleHealthApiAsync(HttpListenerResponse response) { ... }
|
||||
private static async Task WriteResponseAsync(HttpListenerResponse response, string content, string contentType) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
### 4.1 Start()
|
||||
|
||||
1. If `!_configuration.Enabled`, log info and return `true`.
|
||||
2. Create `HttpListener`, add prefix `_configuration.Prefix ?? $"http://+:{_configuration.Port}/"` (ensure trailing `/`).
|
||||
3. Call `_httpListener.Start()`.
|
||||
4. Create `_cancellationTokenSource = new CancellationTokenSource()`.
|
||||
5. Start `_listenerTask = Task.Run(() => HandleRequestsAsync(_cancellationTokenSource.Token))`.
|
||||
6. On exception, log error and return `false`.
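
The prefix rule in step 2 is worth isolating, because `HttpListener` rejects a prefix that does not end in `/`. A minimal sketch, assuming a hypothetical helper named `StatusPrefix.Build` (the plan does not name one):

```csharp
using System;

public static class StatusPrefix
{
    // Uses the configured prefix when present, otherwise listens on all
    // interfaces at the given port, and always ensures a trailing slash.
    public static string Build(string configuredPrefix, int port)
    {
        string prefix = configuredPrefix ?? "http://+:" + port + "/";
        return prefix.EndsWith("/", StringComparison.Ordinal) ? prefix : prefix + "/";
    }
}
```

For example, `StatusPrefix.Build(null, 8080)` yields `http://+:8080/`, and a configured `http://localhost:9000` becomes `http://localhost:9000/`.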
|
||||
|
||||
### 4.2 Stop()
|
||||
|
||||
1. If not enabled or listener is null, return `true`.
|
||||
2. Cancel `_cancellationTokenSource`.
|
||||
3. Wait for `_listenerTask` with 5-second timeout.
|
||||
4. Stop and close `_httpListener`.
|
||||
|
||||
### 4.3 HandleRequestsAsync
|
||||
|
||||
- Loop while not cancelled and listener is listening.
|
||||
- `await _httpListener.GetContextAsync()` — on success, spawn `Task.Run` to handle.
|
||||
- Catch `ObjectDisposedException` and `HttpListenerException` with error code 995 (`ERROR_OPERATION_ABORTED`) as expected shutdown signals.
|
||||
- On other errors, log and delay 1 second before continuing.
|
||||
|
||||
### 4.4 HandleRequestAsync routing
|
||||
|
||||
| Path (lowered) | Handler |
|
||||
|---|---|
|
||||
| `/` | `HandleStatusPageAsync` — calls `_statusReportService.GenerateHtmlReportAsync()`, content type `text/html; charset=utf-8` |
|
||||
| `/api/status` | `HandleStatusApiAsync` — calls `_statusReportService.GenerateJsonReportAsync()`, content type `application/json; charset=utf-8` |
|
||||
| `/api/health` | `HandleHealthApiAsync` — calls `_statusReportService.IsHealthyAsync()`, returns `"OK"` (200) or `"UNHEALTHY"` (503) as `text/plain` |
|
||||
| Non-GET method | Return 405 Method Not Allowed |
|
||||
| Unknown path | Return 404 Not Found |
|
||||
| Exception | Return 500 Internal Server Error |
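
The routing table above can be kept separate from the `HttpListener` plumbing, which makes it testable without opening a port. This is a sketch under assumptions: `StatusRoute` and `StatusRouter` are hypothetical names, and the method check is performed before the path lookup, an ordering the table does not specify. The 500 case is left to the surrounding try/catch in `HandleRequestAsync`.

```csharp
using System;

public enum StatusRoute { StatusPage, StatusApi, HealthApi, MethodNotAllowed, NotFound }

public static class StatusRouter
{
    // Maps an HTTP method and absolute path to one of the routes in the table.
    public static StatusRoute Resolve(string httpMethod, string absolutePath)
    {
        if (!string.Equals(httpMethod, "GET", StringComparison.OrdinalIgnoreCase))
            return StatusRoute.MethodNotAllowed;

        switch ((absolutePath ?? string.Empty).ToLowerInvariant())
        {
            case "/": return StatusRoute.StatusPage;
            case "/api/status": return StatusRoute.StatusApi;
            case "/api/health": return StatusRoute.HealthApi;
            default: return StatusRoute.NotFound;
        }
    }

    // HTTP status code for a resolved route; only the health route
    // depends on the outcome of the health check.
    public static int StatusCodeFor(StatusRoute route, bool healthy)
    {
        switch (route)
        {
            case StatusRoute.MethodNotAllowed: return 405;
            case StatusRoute.NotFound: return 404;
            case StatusRoute.HealthApi: return healthy ? 200 : 503;
            default: return 200;
        }
    }
}
```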
|
||||
|
||||
### 4.5 WriteResponseAsync
|
||||
|
||||
- Set `Content-Type`, add `Cache-Control: no-cache, no-store, must-revalidate`, `Pragma: no-cache`, `Expires: 0`.
|
||||
- Convert content to UTF-8 bytes, set `ContentLength64`, write to `response.OutputStream`.
|
||||
|
||||
### 4.6 Dispose
|
||||
|
||||
- Guard with `_disposed` flag. Call `Stop()`. Dispose `_cancellationTokenSource` and close `_httpListener`.
|
||||
|
||||
### 4.7 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host --platform x86"
|
||||
```
|
||||
|
||||
## Step 5: Wire into LmxProxyService
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs`
|
||||
|
||||
This file already exists. Modify the `Start()` method to create and wire the new components. The v2 rebuild should create these fresh, but the wiring pattern follows the same order as the reference.
|
||||
|
||||
### 5.1 Add using directives
|
||||
|
||||
```csharp
|
||||
using ZB.MOM.WW.LmxProxy.Host.Health;
|
||||
using ZB.MOM.WW.LmxProxy.Host.Metrics;
|
||||
using ZB.MOM.WW.LmxProxy.Host.Status;
|
||||
```
|
||||
|
||||
### 5.2 Add fields
|
||||
|
||||
```csharp
|
||||
private PerformanceMetrics? _performanceMetrics;
|
||||
private HealthCheckService? _healthCheckService;
|
||||
private DetailedHealthCheckService? _detailedHealthCheckService;
|
||||
private StatusReportService? _statusReportService;
|
||||
private StatusWebServer? _statusWebServer;
|
||||
```
|
||||
|
||||
### 5.3 In Start(), after SessionManager and SubscriptionManager creation
|
||||
|
||||
```csharp
|
||||
// Create performance metrics
|
||||
_performanceMetrics = new PerformanceMetrics();
|
||||
|
||||
// Create health check services
|
||||
_healthCheckService = new HealthCheckService(_scadaClient, _subscriptionManager, _performanceMetrics);
|
||||
_detailedHealthCheckService = new DetailedHealthCheckService(_scadaClient);
|
||||
|
||||
// Create status report service
|
||||
_statusReportService = new StatusReportService(
|
||||
_scadaClient, _subscriptionManager, _performanceMetrics,
|
||||
_healthCheckService, _detailedHealthCheckService);
|
||||
|
||||
// Start status web server
|
||||
_statusWebServer = new StatusWebServer(_configuration.WebServer, _statusReportService);
|
||||
if (!_statusWebServer.Start())
|
||||
{
|
||||
Logger.Warning("Status web server failed to start — continuing without it");
|
||||
}
|
||||
```
|
||||
|
||||
### 5.4 In Stop(), before gRPC server shutdown
|
||||
|
||||
```csharp
|
||||
// Stop status web server
|
||||
_statusWebServer?.Stop();
|
||||
|
||||
// Dispose performance metrics
|
||||
_performanceMetrics?.Dispose();
|
||||
```
|
||||
|
||||
### 5.5 Pass _performanceMetrics to ScadaGrpcService constructor
|
||||
|
||||
Ensure `ScadaGrpcService` receives `_performanceMetrics` so it can record timings on each RPC call. The gRPC service should call `_performanceMetrics.BeginOperation("Read")` (etc.) and dispose the timing scope at the end of each RPC handler.
|
||||
|
||||
### 5.6 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Host --platform x86"
|
||||
```
|
||||
|
||||
## Step 6: Unit Tests
|
||||
|
||||
**Project**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/`
|
||||
|
||||
If this project does not exist yet, create it:
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet new xunit -n ZB.MOM.WW.LmxProxy.Host.Tests -o tests/ZB.MOM.WW.LmxProxy.Host.Tests --framework net48"
|
||||
```
|
||||
|
||||
**Csproj adjustments** for `tests/ZB.MOM.WW.LmxProxy.Host.Tests/ZB.MOM.WW.LmxProxy.Host.Tests.csproj`:
|
||||
- `<TargetFramework>net48</TargetFramework>`
|
||||
- `<PlatformTarget>x86</PlatformTarget>`
|
||||
- `<LangVersion>9.0</LangVersion>`
|
||||
- Add `<ProjectReference Include="..\..\src\ZB.MOM.WW.LmxProxy.Host\ZB.MOM.WW.LmxProxy.Host.csproj" />`
|
||||
- Add `<PackageReference Include="xunit" Version="2.9.3" />`
|
||||
- Add `<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />`
|
||||
- Add `<PackageReference Include="NSubstitute" Version="5.3.0" />` (for mocking IScadaClient)
|
||||
- Add `<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />`
|
||||
|
||||
**Also add to solution** in `ZB.MOM.WW.LmxProxy.slnx`:
|
||||
```xml
|
||||
<Folder Name="/tests/">
|
||||
<Project Path="tests/ZB.MOM.WW.LmxProxy.Host.Tests/ZB.MOM.WW.LmxProxy.Host.Tests.csproj" />
|
||||
</Folder>
|
||||
```
|
||||
|
||||
### 6.1 PerformanceMetrics Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/Metrics/PerformanceMetricsTests.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Host.Tests.Metrics
|
||||
{
|
||||
public class PerformanceMetricsTests
|
||||
{
|
||||
[Fact]
|
||||
public void RecordOperation_TracksCountAndDuration()
|
||||
// Record 5 operations, verify GetStatistics returns TotalCount=5
|
||||
|
||||
[Fact]
|
||||
public void RecordOperation_TracksSuccessAndFailure()
|
||||
// Record 3 success + 2 failure, verify SuccessRate == 0.6
|
||||
|
||||
[Fact]
|
||||
public void GetStatistics_CalculatesP95Correctly()
|
||||
// Record 100 operations with known durations (1ms through 100ms)
|
||||
// Verify P95 is approximately 95ms
|
||||
|
||||
[Fact]
|
||||
public void RollingBuffer_CapsAt1000Samples()
|
||||
// Record 1500 operations, verify _durations list doesn't exceed 1000
|
||||
// (test via GetStatistics behavior — TotalCount is 1500 but percentile computed from 1000)
|
||||
|
||||
[Fact]
|
||||
public async Task BeginOperation_RecordsDurationOnDispose()
|
||||
// Use BeginOperation, await Task.Delay(50), dispose scope
|
||||
// Verify recorded duration >= 50ms
|
||||
|
||||
[Fact]
|
||||
public void TimingScope_DefaultsToSuccess()
|
||||
// BeginOperation + dispose without calling SetSuccess
|
||||
// Verify SuccessCount == 1
|
||||
|
||||
[Fact]
|
||||
public void TimingScope_RespectsSetSuccessFalse()
|
||||
// BeginOperation, SetSuccess(false), dispose
|
||||
// Verify SuccessCount == 0, TotalCount == 1
|
||||
|
||||
[Fact]
|
||||
public void GetMetrics_ReturnsNullForUnknownOperation()
|
||||
|
||||
[Fact]
|
||||
public void GetAllMetrics_ReturnsAllTrackedOperations()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 HealthCheckService Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/Health/HealthCheckServiceTests.cs`
|
||||
|
||||
Use NSubstitute to mock `IScadaClient`. Create a real `PerformanceMetrics` instance and a real or mock `SubscriptionManager` (depends on Phase 2/3 implementation — if `SubscriptionManager` has an interface, mock it; if not, use the `GetSubscriptionStats()` approach with a concrete instance).
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Host.Tests.Health
|
||||
{
|
||||
public class HealthCheckServiceTests
|
||||
{
|
||||
[Fact]
|
||||
public async Task ReturnsHealthy_WhenConnectedAndNormalMetrics()
|
||||
// Mock: IsConnected=true, ConnectionState=Connected
|
||||
// SubscriptionStats: TotalClients=5, TotalTags=10
|
||||
// PerformanceMetrics: record some successes
|
||||
// Assert: HealthStatus.Healthy
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsUnhealthy_WhenNotConnected()
|
||||
// Mock: IsConnected=false
|
||||
// Assert: HealthStatus.Unhealthy, description contains "not connected"
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsDegraded_WhenSuccessRateBelow50Percent()
|
||||
// Mock: IsConnected=true
|
||||
// Record 200 operations with 40% success rate
|
||||
// Assert: HealthStatus.Degraded
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsDegraded_WhenClientCountOver100()
|
||||
// Mock: IsConnected=true, SubscriptionStats.TotalClients=150
|
||||
// Assert: HealthStatus.Degraded
|
||||
|
||||
[Fact]
|
||||
public async Task DoesNotFlagLowSuccessRate_Under100Operations()
|
||||
// Record 50 operations with 0% success rate
|
||||
// Assert: still Healthy (threshold is > 100 total ops)
|
||||
}
|
||||
|
||||
public class DetailedHealthCheckServiceTests
|
||||
{
|
||||
[Fact]
|
||||
public async Task ReturnsUnhealthy_WhenNotConnected()
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsHealthy_WhenTestTagGoodAndRecent()
|
||||
// Mock ReadAsync returns Good quality with recent timestamp
|
||||
// Assert: Healthy
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsDegraded_WhenTestTagQualityNotGood()
|
||||
// Mock ReadAsync returns Uncertain quality
|
||||
// Assert: Degraded
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsDegraded_WhenTestTagTimestampStale()
|
||||
// Mock ReadAsync returns Good quality but timestamp 10 minutes ago
|
||||
// Assert: Degraded
|
||||
|
||||
[Fact]
|
||||
public async Task ReturnsDegraded_WhenTestTagReadThrows()
|
||||
// Mock ReadAsync throws exception
|
||||
// Assert: Degraded
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.3 StatusReportService Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Host.Tests/Status/StatusReportServiceTests.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Host.Tests.Status
|
||||
{
|
||||
public class StatusReportServiceTests
|
||||
{
|
||||
[Fact]
|
||||
public async Task GenerateJsonReportAsync_ReturnsCamelCaseJson()
|
||||
// Verify JSON contains "serviceName", "connection", "isConnected" (camelCase)
|
||||
|
||||
[Fact]
|
||||
public async Task GenerateHtmlReportAsync_ContainsAutoRefresh()
|
||||
// Verify HTML contains <meta http-equiv="refresh" content="30">
|
||||
|
||||
[Fact]
|
||||
public async Task IsHealthyAsync_ReturnsTrueWhenHealthy()
|
||||
|
||||
[Fact]
|
||||
public async Task IsHealthyAsync_ReturnsFalseWhenUnhealthy()
|
||||
|
||||
[Fact]
|
||||
public async Task GenerateJsonReportAsync_IncludesPerformanceMetrics()
|
||||
// Record some operations, verify JSON includes operation names and stats
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.4 Run tests
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet test tests/ZB.MOM.WW.LmxProxy.Host.Tests --platform x86 --verbosity normal"
|
||||
```
|
||||
|
||||
## Step 7: Build Verification
|
||||
|
||||
Run full solution build and tests:
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx && dotnet test --verbosity normal"
|
||||
```
|
||||
|
||||
If the test project is .NET 4.8 x86, you may need:
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx --platform x86 && dotnet test tests/ZB.MOM.WW.LmxProxy.Host.Tests --platform x86"
|
||||
```
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
- [ ] `PerformanceMetrics` class with `OperationMetrics`, `MetricsStatistics`, `ITimingScope` in `src/ZB.MOM.WW.LmxProxy.Host/Metrics/`
|
||||
- [ ] `HealthCheckService` and `DetailedHealthCheckService` in `src/ZB.MOM.WW.LmxProxy.Host/Health/`
|
||||
- [ ] `StatusReportService` with data model classes in `src/ZB.MOM.WW.LmxProxy.Host/Status/`
|
||||
- [ ] `StatusWebServer` with HTML dashboard, JSON status, and health endpoints in `src/ZB.MOM.WW.LmxProxy.Host/Status/`
|
||||
- [ ] All components wired into `LmxProxyService.Start()` / `Stop()`
|
||||
- [ ] `ScadaGrpcService` uses `PerformanceMetrics.BeginOperation()` for Read, ReadBatch, Write, WriteBatch RPCs
|
||||
- [ ] Unit tests for PerformanceMetrics (recording, percentile, rolling buffer, timing scope)
|
||||
- [ ] Unit tests for HealthCheckService (healthy, unhealthy, degraded transitions)
|
||||
- [ ] Unit tests for DetailedHealthCheckService (connected, quality, staleness)
|
||||
- [ ] Unit tests for StatusReportService (JSON format, HTML format, health aggregation)
|
||||
- [ ] Solution builds without errors: `dotnet build ZB.MOM.WW.LmxProxy.slnx`
|
||||
- [ ] All tests pass: `dotnet test`
|
||||
852
deprecated/lmxproxy/docs/plans/phase-5-client-core.md
Normal file
@@ -0,0 +1,852 @@
|
||||
# Phase 5: Client Core — Implementation Plan
|
||||
|
||||
**Date**: 2026-03-21
|
||||
**Prerequisites**: Phase 1 complete and passing (Protocol & Domain Types — `ScadaContracts.cs` with v2 `TypedValue`/`QualityCode` messages, `Quality.cs`, `QualityExtensions.cs`, `Vtq.cs`, `ConnectionState.cs` all exist and cross-stack serialization tests pass)
|
||||
**Working Directory**: The lmxproxy repo is on windev at `C:\src\lmxproxy`
|
||||
|
||||
## Guardrails
|
||||
|
||||
1. **Client targets .NET 10, AnyCPU** — use latest C# features freely. The csproj `<TargetFramework>` is `net10.0`, `<LangVersion>latest</LangVersion>`.
|
||||
2. **Code-first gRPC only** — the Client uses `protobuf-net.Grpc` with `[ServiceContract]`/`[DataContract]` attributes. Never reference proto files or `Grpc.Tools`.
|
||||
3. **No string serialization heuristics** — v2 uses native `TypedValue`. Do not write `double.TryParse`, `bool.TryParse`, or any string-to-value parsing on tag values.
|
||||
4. **`status_code` is canonical for quality** — `symbolic_name` is derived. Never set `symbolic_name` independently.
|
||||
5. **Polly v8 API** — the Client csproj already has `<PackageReference Include="Polly" Version="8.5.2" />`. Use the v8 `ResiliencePipeline` API, not the legacy v7 `IAsyncPolicy` API.
|
||||
6. **No new NuGet packages** — all needed packages are already in `src/ZB.MOM.WW.LmxProxy.Client/ZB.MOM.WW.LmxProxy.Client.csproj`.
|
||||
7. **Build command**: `dotnet build src/ZB.MOM.WW.LmxProxy.Client`
|
||||
8. **Test command**: `dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests`
|
||||
9. **Namespace root**: `ZB.MOM.WW.LmxProxy.Client`
|
||||
|
||||
## Step 1: ClientTlsConfiguration
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ClientTlsConfiguration.cs`
|
||||
|
||||
This file already exists with the correct shape. Verify it has all these properties (from Component-Client.md):
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client;
|
||||
|
||||
public class ClientTlsConfiguration
|
||||
{
|
||||
public bool UseTls { get; set; } = false;
|
||||
public string? ClientCertificatePath { get; set; }
|
||||
public string? ClientKeyPath { get; set; }
|
||||
public string? ServerCaCertificatePath { get; set; }
|
||||
public string? ServerNameOverride { get; set; }
|
||||
public bool ValidateServerCertificate { get; set; } = true;
|
||||
public bool AllowSelfSignedCertificates { get; set; } = false;
|
||||
public bool IgnoreAllCertificateErrors { get; set; } = false;
|
||||
}
|
||||
```
|
||||
|
||||
If it matches, no changes needed. If any properties are missing, add them.
|
||||
|
||||
## Step 2: Security/GrpcChannelFactory
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/Security/GrpcChannelFactory.cs`
|
||||
|
||||
This file already exists. Verify the implementation covers:
|
||||
|
||||
1. `CreateChannel(Uri address, ClientTlsConfiguration? tlsConfiguration, ILogger logger)` — returns `GrpcChannel`.
|
||||
2. Creates `SocketsHttpHandler` with `EnableMultipleHttp2Connections = true`.
|
||||
3. For TLS: sets `SslProtocols = Tls12 | Tls13`, configures `ServerNameOverride` as `TargetHost`, loads client certificate from PEM files for mTLS.
|
||||
4. Certificate validation callback handles: `IgnoreAllCertificateErrors`, `!ValidateServerCertificate`, custom CA trust store via `ServerCaCertificatePath`, `AllowSelfSignedCertificates`.
|
||||
5. Static constructor calls `AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true)` so that HTTP/2 can run unencrypted for non-TLS connections.
|
||||
|
||||
The existing implementation matches. No changes expected unless Phase 1 introduced breaking changes.
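As a reference point when verifying the checklist above, here is a minimal sketch of the handler configuration. It is not the existing `GrpcChannelFactory`: `HandlerSketch.Build` and its parameters are hypothetical, and the custom CA trust store and self-signed handling (item 4) are deliberately left out.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Security;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;

public static class HandlerSketch
{
    // Hypothetical helper: builds only the SocketsHttpHandler, not the GrpcChannel.
    public static SocketsHttpHandler Build(
        bool useTls,
        string? serverNameOverride,
        string? clientCertPath,
        string? clientKeyPath,
        bool ignoreAllCertificateErrors)
    {
        var handler = new SocketsHttpHandler { EnableMultipleHttp2Connections = true };
        if (!useTls) return handler;

        handler.SslOptions.EnabledSslProtocols = SslProtocols.Tls12 | SslProtocols.Tls13;

        // ServerNameOverride maps onto the TLS target host (SNI and name validation).
        if (!string.IsNullOrEmpty(serverNameOverride))
            handler.SslOptions.TargetHost = serverNameOverride;

        // mTLS: load the client certificate and private key from PEM files.
        if (clientCertPath is not null && clientKeyPath is not null)
        {
            handler.SslOptions.ClientCertificates = new X509CertificateCollection
            {
                X509Certificate2.CreateFromPemFile(clientCertPath, clientKeyPath)
            };
        }

        // Development-only escape hatch: accept any server certificate.
        if (ignoreAllCertificateErrors)
            handler.SslOptions.RemoteCertificateValidationCallback = (_, _, _, _) => true;

        return handler;
    }
}
```

The resulting handler would then be handed to the channel through `GrpcChannelOptions.HttpHandler`.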
|
||||
|
||||
## Step 3: ILmxProxyClient Interface
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ILmxProxyClient.cs`
|
||||
|
||||
Rewrite for v2 protocol. The key changes from v1:
|
||||
- `WriteAsync` and `WriteBatchAsync` accept `TypedValue` instead of `object`
|
||||
- `SubscribeAsync` has an `onStreamError` callback parameter
|
||||
- `CheckApiKeyAsync` is added
|
||||
- Return types use v2 domain `Vtq` (which wraps `TypedValue` + `QualityCode`)
|
||||
|
||||
```csharp
|
||||
using ZB.MOM.WW.LmxProxy.Client.Domain;
|
||||
|
||||
namespace ZB.MOM.WW.LmxProxy.Client;
|
||||
|
||||
/// <summary>
|
||||
/// Interface for LmxProxy client operations.
|
||||
/// </summary>
|
||||
public interface ILmxProxyClient : IDisposable, IAsyncDisposable
|
||||
{
|
||||
/// <summary>Gets or sets the default timeout for operations (range: 1s to 10min).</summary>
|
||||
TimeSpan DefaultTimeout { get; set; }
|
||||
|
||||
/// <summary>Connects to the LmxProxy service and establishes a session.</summary>
|
||||
Task ConnectAsync(CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Disconnects from the LmxProxy service.</summary>
|
||||
Task DisconnectAsync();
|
||||
|
||||
/// <summary>Returns true if the client has an active session.</summary>
|
||||
Task<bool> IsConnectedAsync();
|
||||
|
||||
/// <summary>Reads a single tag value.</summary>
|
||||
Task<Vtq> ReadAsync(string address, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Reads multiple tag values in a single batch.</summary>
|
||||
Task<IDictionary<string, Vtq>> ReadBatchAsync(IEnumerable<string> addresses, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Writes a single tag value (native TypedValue — no string heuristics).</summary>
|
||||
Task WriteAsync(string address, TypedValue value, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Writes multiple tag values in a single batch.</summary>
|
||||
Task WriteBatchAsync(IDictionary<string, TypedValue> values, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>
|
||||
/// Writes a batch of values, then polls a flag tag until it matches or timeout expires.
|
||||
/// Returns (writeResults, flagReached, elapsedMs).
|
||||
/// </summary>
|
||||
Task<WriteBatchAndWaitResponse> WriteBatchAndWaitAsync(
|
||||
IDictionary<string, TypedValue> values,
|
||||
string flagTag,
|
||||
TypedValue flagValue,
|
||||
int timeoutMs = 5000,
|
||||
int pollIntervalMs = 100,
|
||||
CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Subscribes to tag updates with value and error callbacks.</summary>
|
||||
Task<ISubscription> SubscribeAsync(
|
||||
IEnumerable<string> addresses,
|
||||
Action<string, Vtq> onUpdate,
|
||||
Action<Exception>? onStreamError = null,
|
||||
CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Validates an API key and returns info.</summary>
|
||||
Task<ApiKeyInfo> CheckApiKeyAsync(string apiKey, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>Returns a snapshot of client-side metrics.</summary>
|
||||
Dictionary<string, object> GetMetrics();
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: The `TypedValue` class referenced here is from `Domain/ScadaContracts.cs` — it should already have been updated in Phase 1 to use `[DataContract]` with the v2 oneof-style properties (e.g., `BoolValue`, `Int32Value`, `DoubleValue`, `StringValue`, `DatetimeValue`, etc., with a `ValueCase` enum or similar discriminator).
|
||||
|
||||
## Step 4: LmxProxyClient — Main File
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.cs`
|
||||
|
||||
This is a partial class. The main file contains the constructor, fields, properties, and the Read/Write/WriteBatch/WriteBatchAndWait/CheckApiKey methods.
|
||||
|
||||
### 4.1 Fields and Constructor
|
||||
|
||||
```csharp
|
||||
public partial class LmxProxyClient : ILmxProxyClient
|
||||
{
|
||||
private readonly ILogger<LmxProxyClient> _logger;
|
||||
private readonly string _host;
|
||||
private readonly int _port;
|
||||
private readonly string? _apiKey;
|
||||
private readonly ClientTlsConfiguration? _tlsConfiguration;
|
||||
private readonly ClientMetrics _metrics = new();
|
||||
private readonly SemaphoreSlim _connectionLock = new(1, 1);
|
||||
private readonly List<ISubscription> _activeSubscriptions = [];
|
||||
private readonly Lock _subscriptionLock = new();
|
||||
|
||||
private GrpcChannel? _channel;
|
||||
private IScadaService? _client;
|
||||
private string _sessionId = string.Empty;
|
||||
private bool _disposed;
|
||||
private bool _isConnected;
|
||||
private TimeSpan _defaultTimeout = TimeSpan.FromSeconds(30);
|
||||
private ClientConfiguration? _configuration;
|
||||
private ResiliencePipeline? _resiliencePipeline; // Polly v8
|
||||
private Timer? _keepAliveTimer;
|
||||
private readonly TimeSpan _keepAliveInterval = TimeSpan.FromSeconds(30);
|
||||
|
||||
// IsConnected computed property
|
||||
public bool IsConnected => !_disposed && _isConnected && !string.IsNullOrEmpty(_sessionId);
|
||||
|
||||
public LmxProxyClient(
|
||||
string host, int port, string? apiKey,
|
||||
ClientTlsConfiguration? tlsConfiguration,
|
||||
ILogger<LmxProxyClient>? logger = null)
|
||||
{
|
||||
_host = host ?? throw new ArgumentNullException(nameof(host));
|
||||
_port = port;
|
||||
_apiKey = apiKey;
|
||||
_tlsConfiguration = tlsConfiguration;
|
||||
_logger = logger ?? NullLogger<LmxProxyClient>.Instance;
|
||||
}
|
||||
|
||||
internal void SetBuilderConfiguration(ClientConfiguration config)
|
||||
{
|
||||
_configuration = config;
|
||||
// Build Polly v8 ResiliencePipeline from config
|
||||
if (config.MaxRetryAttempts > 0)
|
||||
{
|
||||
_resiliencePipeline = new ResiliencePipelineBuilder()
|
||||
.AddRetry(new RetryStrategyOptions
|
||||
{
|
||||
MaxRetryAttempts = config.MaxRetryAttempts,
|
||||
Delay = config.RetryDelay,
|
||||
BackoffType = DelayBackoffType.Exponential,
|
||||
ShouldHandle = new PredicateBuilder()
|
||||
.Handle<RpcException>(ex =>
|
||||
ex.StatusCode == StatusCode.Unavailable ||
|
||||
ex.StatusCode == StatusCode.DeadlineExceeded ||
|
||||
ex.StatusCode == StatusCode.ResourceExhausted ||
|
||||
ex.StatusCode == StatusCode.Aborted),
|
||||
OnRetry = args =>
|
||||
{
|
||||
_logger.LogWarning("Retry {Attempt} after {Delay} for {Exception}",
|
||||
args.AttemptNumber, args.RetryDelay, args.Outcome.Exception?.Message);
|
||||
return ValueTask.CompletedTask;
|
||||
}
|
||||
})
|
||||
.Build();
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.2 ReadAsync
|
||||
|
||||
```csharp
|
||||
public async Task<Vtq> ReadAsync(string address, CancellationToken cancellationToken = default)
|
||||
{
|
||||
EnsureConnected();
|
||||
_metrics.IncrementOperationCount("Read");
|
||||
var sw = Stopwatch.StartNew();
|
||||
try
|
||||
{
|
||||
var request = new ReadRequest { SessionId = _sessionId, Tag = address };
|
||||
ReadResponse response = await ExecuteWithRetry(
|
||||
() => _client!.ReadAsync(request).AsTask(), cancellationToken);
|
||||
if (!response.Success)
|
||||
throw new InvalidOperationException($"Read failed: {response.Message}");
|
||||
return ConvertVtqMessage(response.Vtq);
|
||||
}
|
||||
catch
|
||||
{
|
||||
_metrics.IncrementErrorCount("Read");
|
||||
throw;
|
||||
}
|
||||
finally
|
||||
{
|
||||
sw.Stop();
|
||||
_metrics.RecordLatency("Read", sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 ReadBatchAsync
|
||||
|
||||
```csharp
|
||||
public async Task<IDictionary<string, Vtq>> ReadBatchAsync(
|
||||
IEnumerable<string> addresses, CancellationToken cancellationToken = default)
|
||||
{
|
||||
EnsureConnected();
|
||||
_metrics.IncrementOperationCount("ReadBatch");
|
||||
var sw = Stopwatch.StartNew();
|
||||
try
|
||||
{
|
||||
var request = new ReadBatchRequest { SessionId = _sessionId, Tags = addresses.ToList() };
|
||||
ReadBatchResponse response = await ExecuteWithRetry(
|
||||
() => _client!.ReadBatchAsync(request).AsTask(), cancellationToken);
|
||||
var result = new Dictionary<string, Vtq>();
|
||||
foreach (var vtqMsg in response.Vtqs)
|
||||
{
|
||||
result[vtqMsg.Tag] = ConvertVtqMessage(vtqMsg);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
catch
|
||||
{
|
||||
_metrics.IncrementErrorCount("ReadBatch");
|
||||
throw;
|
||||
}
|
||||
finally
|
||||
{
|
||||
sw.Stop();
|
||||
_metrics.RecordLatency("ReadBatch", sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.4 WriteAsync
|
||||
|
||||
```csharp
|
||||
public async Task WriteAsync(string address, TypedValue value, CancellationToken cancellationToken = default)
|
||||
{
|
||||
EnsureConnected();
|
||||
_metrics.IncrementOperationCount("Write");
|
||||
var sw = Stopwatch.StartNew();
|
||||
try
|
||||
{
|
||||
var request = new WriteRequest { SessionId = _sessionId, Tag = address, Value = value };
|
||||
WriteResponse response = await ExecuteWithRetry(
|
||||
() => _client!.WriteAsync(request).AsTask(), cancellationToken);
|
||||
if (!response.Success)
|
||||
throw new InvalidOperationException($"Write failed: {response.Message}");
|
||||
}
|
||||
catch
|
||||
{
|
||||
_metrics.IncrementErrorCount("Write");
|
||||
throw;
|
||||
}
|
||||
finally
|
||||
{
|
||||
sw.Stop();
|
||||
_metrics.RecordLatency("Write", sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.5 WriteBatchAsync
|
||||
|
||||
```csharp
|
||||
public async Task WriteBatchAsync(IDictionary<string, TypedValue> values, CancellationToken cancellationToken = default)
|
||||
{
|
||||
EnsureConnected();
|
||||
_metrics.IncrementOperationCount("WriteBatch");
|
||||
var sw = Stopwatch.StartNew();
|
||||
try
|
||||
{
|
||||
var request = new WriteBatchRequest
|
||||
{
|
||||
SessionId = _sessionId,
|
||||
Items = values.Select(kv => new WriteItem { Tag = kv.Key, Value = kv.Value }).ToList()
|
||||
};
|
||||
WriteBatchResponse response = await ExecuteWithRetry(
|
||||
() => _client!.WriteBatchAsync(request).AsTask(), cancellationToken);
|
||||
if (!response.Success)
|
||||
throw new InvalidOperationException($"WriteBatch failed: {response.Message}");
|
||||
}
|
||||
catch
|
||||
{
|
||||
_metrics.IncrementErrorCount("WriteBatch");
|
||||
throw;
|
||||
}
|
||||
finally
|
||||
{
|
||||
sw.Stop();
|
||||
_metrics.RecordLatency("WriteBatch", sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.6 WriteBatchAndWaitAsync
|
||||
|
||||
```csharp
|
||||
public async Task<WriteBatchAndWaitResponse> WriteBatchAndWaitAsync(
|
||||
IDictionary<string, TypedValue> values, string flagTag, TypedValue flagValue,
|
||||
int timeoutMs = 5000, int pollIntervalMs = 100, CancellationToken cancellationToken = default)
|
||||
{
|
||||
EnsureConnected();
|
||||
var request = new WriteBatchAndWaitRequest
|
||||
{
|
||||
SessionId = _sessionId,
|
||||
Items = values.Select(kv => new WriteItem { Tag = kv.Key, Value = kv.Value }).ToList(),
|
||||
FlagTag = flagTag,
|
||||
FlagValue = flagValue,
|
||||
TimeoutMs = timeoutMs,
|
||||
PollIntervalMs = pollIntervalMs
|
||||
};
|
||||
return await ExecuteWithRetry(
|
||||
() => _client!.WriteBatchAndWaitAsync(request).AsTask(), cancellationToken);
|
||||
}
|
||||
```
|
||||
|
||||
### 4.7 CheckApiKeyAsync
|
||||
|
||||
```csharp
|
||||
public async Task<ApiKeyInfo> CheckApiKeyAsync(string apiKey, CancellationToken cancellationToken = default)
|
||||
{
|
||||
EnsureConnected();
|
||||
var request = new CheckApiKeyRequest { ApiKey = apiKey };
|
||||
CheckApiKeyResponse response = await _client!.CheckApiKeyAsync(request);
|
||||
return new ApiKeyInfo { IsValid = response.IsValid, Description = response.Message };
|
||||
}
|
||||
```
|
||||
|
||||
### 4.8 ConvertVtqMessage helper
|
||||
|
||||
This converts the wire `VtqMessage` (v2 with `TypedValue` + `QualityCode`) to the domain `Vtq`:
|
||||
|
||||
```csharp
|
||||
private static Vtq ConvertVtqMessage(VtqMessage? msg)
|
||||
{
|
||||
if (msg is null)
|
||||
return new Vtq(null, DateTime.UtcNow, Quality.Bad);
|
||||
|
||||
object? value = ExtractTypedValue(msg.Value);
|
||||
DateTime timestamp = msg.TimestampUtcTicks > 0
|
||||
? new DateTime(msg.TimestampUtcTicks, DateTimeKind.Utc)
|
||||
: DateTime.UtcNow;
|
||||
Quality quality = QualityExtensions.FromStatusCode(msg.Quality?.StatusCode ?? 0x80000000u);
|
||||
return new Vtq(value, timestamp, quality);
|
||||
}
|
||||
|
||||
private static object? ExtractTypedValue(TypedValue? tv)
|
||||
{
|
||||
if (tv is null) return null;
|
||||
// Switch on whichever oneof-style property is set
|
||||
// The exact property names depend on the Phase 1 code-first contract design
|
||||
// e.g., tv.BoolValue, tv.Int32Value, tv.DoubleValue, tv.StringValue, etc.
|
||||
// Return the native .NET value directly — no string conversions
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
**Important**: The exact shape of `TypedValue` in code-first contracts depends on Phase 1's implementation. Phase 1 should have defined a discriminator pattern (e.g., `ValueCase` enum or nullable properties with a convention). Adapt `ExtractTypedValue` to whatever pattern was chosen. The key rule: **no string heuristics**.
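One possible shape for the extraction, assuming Phase 1 chose an enum discriminator. `TypedValueSketch`, `ValueCase`, and the property names are hypothetical and only illustrate the pattern; the real contract may differ.

```csharp
#nullable enable
using System;

// Hypothetical discriminator; the real Phase 1 contract may use another convention.
public enum ValueCase { None, Bool, Int32, Double, String, Datetime }

// Hypothetical stand-in for the code-first TypedValue contract.
public sealed class TypedValueSketch
{
    public ValueCase Case { get; init; }
    public bool BoolValue { get; init; }
    public int Int32Value { get; init; }
    public double DoubleValue { get; init; }
    public string? StringValue { get; init; }
    public long DatetimeValue { get; init; }   // assumed to carry UTC ticks
}

public static class TypedValueReader
{
    // Returns the native .NET value selected by the discriminator.
    // Nothing is parsed: a string payload stays a string even if it looks numeric.
    public static object? Extract(TypedValueSketch? tv)
    {
        if (tv is null) return null;
        return tv.Case switch
        {
            ValueCase.None => null,
            ValueCase.Bool => tv.BoolValue,
            ValueCase.Int32 => tv.Int32Value,
            ValueCase.Double => tv.DoubleValue,
            ValueCase.String => tv.StringValue,
            ValueCase.Datetime => new DateTime(tv.DatetimeValue, DateTimeKind.Utc),
            _ => throw new NotSupportedException($"Unknown value case: {tv.Case}")
        };
    }
}
```

Throwing on an unknown case, rather than falling back to a string, keeps a contract mismatch visible instead of silently reintroducing v1-style guessing.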
|
||||
|
||||
### 4.9 ExecuteWithRetry helper
|
||||
|
||||
```csharp
|
||||
private async Task<T> ExecuteWithRetry<T>(Func<Task<T>> operation, CancellationToken ct)
|
||||
{
|
||||
if (_resiliencePipeline is not null)
|
||||
{
|
||||
return await _resiliencePipeline.ExecuteAsync(
|
||||
async token => await operation(), ct);
|
||||
}
|
||||
return await operation();
|
||||
}
|
||||
```
|
||||
|
||||
### 4.10 EnsureConnected, Dispose, DisposeAsync
|
||||
|
||||
```csharp
|
||||
private void EnsureConnected()
|
||||
{
|
||||
ObjectDisposedException.ThrowIf(_disposed, this);
|
||||
if (!IsConnected)
|
||||
throw new InvalidOperationException("Client is not connected. Call ConnectAsync first.");
|
||||
}
|
||||
|
||||
public void Dispose()
|
||||
{
|
||||
if (_disposed) return;
|
||||
_disposed = true;
|
||||
_keepAliveTimer?.Dispose();
|
||||
_channel?.Dispose();
|
||||
_connectionLock.Dispose();
|
||||
}
|
||||
|
||||
public async ValueTask DisposeAsync()
|
||||
{
|
||||
if (_disposed) return;
|
||||
try { await DisconnectAsync(); } catch { /* swallow */ }
|
||||
Dispose();
|
||||
}
|
||||
```
|
||||
|
||||
### 4.11 IsConnectedAsync
|
||||
|
||||
```csharp
|
||||
public Task<bool> IsConnectedAsync() => Task.FromResult(IsConnected);
|
||||
```
|
||||
|
||||
### 4.12 GetMetrics
|
||||
|
||||
```csharp
|
||||
public Dictionary<string, object> GetMetrics() => _metrics.GetSnapshot();
|
||||
```
|
||||
|
||||
### 4.13 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 5: LmxProxyClient.Connection
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.Connection.cs`
|
||||
|
||||
Partial class containing `ConnectAsync`, `DisconnectAsync`, keep-alive, `MarkDisconnectedAsync`, `BuildEndpointUri`.
|
||||
|
||||
### 5.1 ConnectAsync
|
||||
|
||||
1. Acquire `_connectionLock`.
|
||||
2. Throw `ObjectDisposedException` if disposed.
|
||||
3. Return early if already connected.
|
||||
4. Build endpoint URI via `BuildEndpointUri()`.
|
||||
5. Create channel: `GrpcChannelFactory.CreateChannel(endpoint, _tlsConfiguration, _logger)`.
|
||||
6. Create code-first client: `channel.CreateGrpcService<IScadaService>()` (from `ProtoBuf.Grpc.Client`).
|
||||
7. Send `ConnectRequest` with `ClientId = $"ScadaBridge-{Guid.NewGuid():N}"` and `ApiKey = _apiKey ?? string.Empty`.
|
||||
8. If `!response.Success`, dispose channel and throw.
|
||||
9. Store channel, client, sessionId. Set `_isConnected = true`.
|
||||
10. Call `StartKeepAlive()`.
|
||||
11. On failure, reset all state and rethrow.
|
||||
12. Release lock in `finally`.
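The locking and rollback structure of the steps above can be sketched as follows. The `handshake` delegate is a hypothetical stand-in for channel creation plus the `ConnectRequest` round trip, so the sketch runs without any gRPC dependency; keep-alive startup is omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reduction of the ConnectAsync flow; not the real LmxProxyClient.
public sealed class ConnectionSketch
{
    private readonly SemaphoreSlim _connectionLock = new(1, 1);
    private readonly Func<CancellationToken, Task<string>> _handshake;
    private string _sessionId = string.Empty;
    private bool _isConnected;
    private bool _disposed;

    public ConnectionSketch(Func<CancellationToken, Task<string>> handshake) =>
        _handshake = handshake;

    public bool IsConnected => !_disposed && _isConnected && _sessionId.Length > 0;

    public async Task ConnectAsync(CancellationToken cancellationToken = default)
    {
        await _connectionLock.WaitAsync(cancellationToken);
        try
        {
            ObjectDisposedException.ThrowIf(_disposed, this);
            if (IsConnected) return;            // already connected: nothing to do

            try
            {
                _sessionId = await _handshake(cancellationToken);
                _isConnected = true;
            }
            catch
            {
                // Roll back to a clean disconnected state before rethrowing.
                _sessionId = string.Empty;
                _isConnected = false;
                throw;
            }
        }
        finally
        {
            _connectionLock.Release();          // released on every path, including early return
        }
    }
}
```

Taking the lock before the disposed and already-connected checks means two concurrent callers cannot both run the handshake.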
|
||||
|
||||
### 5.2 DisconnectAsync
|
||||
|
||||
1. Acquire `_connectionLock`.
|
||||
2. Stop keep-alive.
|
||||
3. If client and session exist, send `DisconnectRequest`. Swallow exceptions.
|
||||
4. Clear client, sessionId, isConnected. Dispose channel.
|
||||
5. Release lock.
|
||||
|
||||
### 5.3 Keep-alive timer
|
||||
|
||||
- `StartKeepAlive()`: creates `Timer` with `_keepAliveInterval` (30s) interval.
|
||||
- Timer callback: sends `GetConnectionStateRequest`. On failure: stops timer, calls `MarkDisconnectedAsync(ex)`.
|
||||
- `StopKeepAlive()`: disposes timer, nulls it.
|
||||
|
||||
### 5.4 MarkDisconnectedAsync
|
||||
|
||||
1. If disposed, return.
|
||||
2. Acquire `_connectionLock`, set `_isConnected = false`, clear client/sessionId, dispose channel. Release lock.
|
||||
3. Copy and clear `_activeSubscriptions` under `_subscriptionLock`.
|
||||
4. Dispose each subscription (swallow errors).
|
||||
5. Log warning with the exception.
|
||||
|
||||
### 5.5 BuildEndpointUri
|
||||
|
||||
```csharp
|
||||
private Uri BuildEndpointUri()
|
||||
{
|
||||
string scheme = _tlsConfiguration?.UseTls == true ? Uri.UriSchemeHttps : Uri.UriSchemeHttp;
|
||||
return new UriBuilder { Scheme = scheme, Host = _host, Port = _port }.Uri;
|
||||
}
|
||||
```
|
||||
|
||||
### 5.6 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 6: LmxProxyClient.CodeFirstSubscription
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.CodeFirstSubscription.cs`
|
||||
|
||||
Nested class inside `LmxProxyClient` implementing `ISubscription`.
|
||||
|
||||
### 6.1 CodeFirstSubscription class
|
||||
|
||||
```csharp
|
||||
private class CodeFirstSubscription : ISubscription
|
||||
{
|
||||
private readonly IScadaService _client;
|
||||
private readonly string _sessionId;
|
||||
private readonly List<string> _tags;
|
||||
private readonly Action<string, Vtq> _onUpdate;
|
||||
private readonly Action<Exception>? _onStreamError;
|
||||
private readonly ILogger<LmxProxyClient> _logger;
|
||||
private readonly Action<ISubscription>? _onDispose;
|
||||
private readonly CancellationTokenSource _cts = new();
|
||||
private Task? _processingTask;
|
||||
private bool _disposed;
|
||||
private bool _streamErrorFired;
|
||||
```
|
||||
|
||||
The constructor takes the client, session ID, tag list, callbacks, logger, and dispose hook. `StartAsync` stores `_processingTask = ProcessUpdatesAsync(cancellationToken)`.
|
||||
|
||||
### 6.2 ProcessUpdatesAsync
|
||||
|
||||
```csharp
|
||||
private async Task ProcessUpdatesAsync(CancellationToken cancellationToken)
|
||||
{
|
||||
try
|
||||
{
|
||||
var request = new SubscribeRequest
|
||||
{
|
||||
SessionId = _sessionId,
|
||||
Tags = _tags,
|
||||
SamplingMs = 1000
|
||||
};
|
||||
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, _cts.Token);
|
||||
|
||||
await foreach (VtqMessage vtqMsg in _client.SubscribeAsync(request, linkedCts.Token))
|
||||
{
|
||||
try
|
||||
{
|
||||
Vtq vtq = ConvertVtqMessage(vtqMsg); // static method from outer class
|
||||
_onUpdate(vtqMsg.Tag, vtq);
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
_logger.LogError(ex, "Error processing subscription update for {Tag}", vtqMsg.Tag);
|
||||
}
|
||||
}
|
||||
}
|
||||
catch (OperationCanceledException) when (_cts.IsCancellationRequested || cancellationToken.IsCancellationRequested)
|
||||
{
|
||||
_logger.LogDebug("Subscription cancelled");
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
_logger.LogError(ex, "Error in subscription processing");
|
||||
FireStreamError(ex);
|
||||
}
|
||||
finally
|
||||
{
|
||||
if (!_disposed)
|
||||
{
|
||||
_disposed = true;
|
||||
_onDispose?.Invoke(this);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private void FireStreamError(Exception ex)
|
||||
{
|
||||
if (_streamErrorFired) return;
|
||||
_streamErrorFired = true;
|
||||
try { _onStreamError?.Invoke(ex); }
|
||||
catch (Exception cbEx) { _logger.LogWarning(cbEx, "onStreamError callback threw"); }
|
||||
}
|
||||
```
|
||||
|
||||
**Key difference from v1**: `ConvertVtqMessage` now handles `TypedValue` + `QualityCode` natively instead of parsing strings. In addition, the `_onStreamError` callback is invoked at most once, when the stream terminates with an error (per Component-Client.md section 5.1).
|
||||
|
||||
### 6.3 DisposeAsync and Dispose
|
||||
|
||||
`DisposeAsync()`: Cancel the CTS, await `_processingTask` under a 5-second timeout guard (swallowing errors), then dispose the CTS.
|
||||
|
||||
`Dispose()`: Runs `DisposeAsync()` synchronously by converting it to a `Task` (`AsTask()`) and blocking with `Task.Wait(TimeSpan.FromSeconds(5))`.
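A minimal sketch of this disposal pair, assuming `DisposeAsync` returns `ValueTask` as `IAsyncDisposable` requires. `SubscriptionDisposeSketch` and its `loop` delegate are hypothetical stand-ins for the subscription and `ProcessUpdatesAsync`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reduction of the subscription's disposal logic.
public sealed class SubscriptionDisposeSketch : IAsyncDisposable, IDisposable
{
    private static readonly TimeSpan Guard = TimeSpan.FromSeconds(5);
    private readonly CancellationTokenSource _cts = new();
    private readonly Task _processingTask;
    private int _disposed;

    public SubscriptionDisposeSketch(Func<CancellationToken, Task> loop) =>
        _processingTask = loop(_cts.Token);

    public bool ProcessingCompleted => _processingTask.IsCompleted;

    public async ValueTask DisposeAsync()
    {
        // Interlocked makes a second or concurrent dispose a no-op.
        if (Interlocked.Exchange(ref _disposed, 1) == 1) return;

        _cts.Cancel();
        try
        {
            // Task.WaitAsync throws TimeoutException if the loop does not stop in time.
            await _processingTask.WaitAsync(Guard);
        }
        catch
        {
            // Cancellation, timeout, and stream faults are all swallowed here.
        }
        _cts.Dispose();
    }

    // ValueTask has no Wait(), so convert to Task before blocking.
    public void Dispose() => DisposeAsync().AsTask().Wait(Guard);
}
```

Blocking in `Dispose()` is only safe where no synchronization context is captured; callers that can await should prefer `DisposeAsync()`.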
|
||||
|
||||
### 6.4 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 7: LmxProxyClient.ClientMetrics
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.ClientMetrics.cs`
|
||||
|
||||
Internal class. Already exists in v1 reference. Rewrite for v2 with p99 support.
|
||||
|
||||
```csharp
|
||||
internal class ClientMetrics
|
||||
{
|
||||
private readonly ConcurrentDictionary<string, long> _operationCounts = new();
|
||||
private readonly ConcurrentDictionary<string, long> _errorCounts = new();
|
||||
private readonly ConcurrentDictionary<string, List<long>> _latencies = new();
|
||||
private readonly Lock _latencyLock = new();
|
||||
|
||||
public void IncrementOperationCount(string operation) { ... }
|
||||
public void IncrementErrorCount(string operation) { ... }
|
||||
public void RecordLatency(string operation, long milliseconds) { ... }
|
||||
public Dictionary<string, object> GetSnapshot() { ... }
|
||||
}
|
||||
```
|
||||
|
||||
`RecordLatency`: Under `_latencyLock`, add to list. If count > 1000, `RemoveAt(0)`.
|
||||
|
||||
`GetSnapshot`: Returns dictionary with keys `{op}_count`, `{op}_errors`, `{op}_avg_latency_ms`, `{op}_p95_latency_ms`, `{op}_p99_latency_ms`.
|
||||
|
||||
`GetPercentile(List<long> values, int percentile)`: Sort, compute index as `(int)Math.Ceiling(percentile / 100.0 * sorted.Count) - 1`, clamp with `Math.Max(0, ...)`.
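
The rolling buffer and the percentile index formula above can be sketched as a pair of static helpers. This is an illustration only: the class name `PercentileSketch` is hypothetical, locking is omitted, and returning `0` for an empty list is an assumption that the plan does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PercentileSketch
{
    public const int MaxSamples = 1000;

    // Appends a sample and drops the oldest one once the buffer exceeds MaxSamples.
    public static void Record(List<long> buffer, long milliseconds)
    {
        buffer.Add(milliseconds);
        if (buffer.Count > MaxSamples)
            buffer.RemoveAt(0);
    }

    // Nearest-rank percentile: index = ceil(p / 100 * n) - 1, clamped to 0.
    public static long GetPercentile(List<long> values, int percentile)
    {
        if (values.Count == 0) return 0; // assumption: empty input yields 0
        List<long> sorted = values.OrderBy(v => v).ToList();
        int index = (int)Math.Ceiling(percentile / 100.0 * sorted.Count) - 1;
        return sorted[Math.Max(0, index)];
    }
}
```

For the samples 1 through 100, the index for p95 is `ceil(0.95 * 100) - 1 = 94`, which selects the value 95; p99 selects 99 in the same way.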
|
||||
|
||||
## Step 8: LmxProxyClient.ApiKeyInfo
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.ApiKeyInfo.cs`
|
||||
|
||||
Simple DTO returned by `CheckApiKeyAsync`:
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client;
|
||||
|
||||
public partial class LmxProxyClient
|
||||
{
|
||||
/// <summary>
|
||||
/// Result of an API key validation check.
|
||||
/// </summary>
|
||||
public class ApiKeyInfo
|
||||
{
|
||||
public bool IsValid { get; init; }
|
||||
public string? Role { get; init; }
|
||||
public string? Description { get; init; }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Step 9: LmxProxyClient.ISubscription
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClient.ISubscription.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client;
|
||||
|
||||
public partial class LmxProxyClient
|
||||
{
|
||||
/// <summary>
|
||||
/// Represents an active tag subscription. Dispose to unsubscribe.
|
||||
/// </summary>
|
||||
public interface ISubscription : IDisposable
|
||||
{
|
||||
/// <summary>Asynchronous disposal; awaits shutdown of the subscription's processing task.</summary>
|
||||
Task DisposeAsync();
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Step 10: Unit Tests
|
||||
|
||||
**Project**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/`
|
||||
|
||||
Create if not exists:
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet new xunit -n ZB.MOM.WW.LmxProxy.Client.Tests -o tests/ZB.MOM.WW.LmxProxy.Client.Tests --framework net10.0"
|
||||
```
|
||||
|
||||
**Csproj** for `tests/ZB.MOM.WW.LmxProxy.Client.Tests/ZB.MOM.WW.LmxProxy.Client.Tests.csproj`:
|
||||
- `<TargetFramework>net10.0</TargetFramework>`
- `<ProjectReference Include="..\..\src\ZB.MOM.WW.LmxProxy.Client\ZB.MOM.WW.LmxProxy.Client.csproj" />`
- `<PackageReference Include="xunit" Version="2.9.3" />`
- `<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />`
- `<PackageReference Include="NSubstitute" Version="5.3.0" />`
- `<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />`
|
||||
|
||||
**Add to solution** `ZB.MOM.WW.LmxProxy.slnx`:
|
||||
```xml
|
||||
<Folder Name="/tests/">
|
||||
<Project Path="tests/ZB.MOM.WW.LmxProxy.Client.Tests/ZB.MOM.WW.LmxProxy.Client.Tests.csproj" />
|
||||
</Folder>
|
||||
```
|
||||
|
||||
### 10.1 Connection Lifecycle Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientConnectionTests.cs`
|
||||
|
||||
Mock `IScadaService` using NSubstitute.
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientConnectionTests
|
||||
{
|
||||
[Fact]
|
||||
public async Task ConnectAsync_EstablishesSessionAndStartsKeepAlive()
|
||||
|
||||
[Fact]
|
||||
public async Task ConnectAsync_ThrowsWhenServerReturnsFailure()
|
||||
|
||||
[Fact]
|
||||
public async Task DisconnectAsync_SendsDisconnectAndClearsState()
|
||||
|
||||
[Fact]
|
||||
public async Task IsConnectedAsync_ReturnsFalseBeforeConnect()
|
||||
|
||||
[Fact]
|
||||
public async Task IsConnectedAsync_ReturnsTrueAfterConnect()
|
||||
|
||||
[Fact]
|
||||
public async Task KeepAliveFailure_MarksDisconnected()
|
||||
}
|
||||
```
|
||||
|
||||
Note: Testing the keep-alive requires either waiting 30s (too slow) or making the interval configurable for tests. Consider passing the interval as an internal constructor parameter or using a test-only subclass. Alternatively, test `MarkDisconnectedAsync` directly.
|
||||
|
||||
### 10.2 Read/Write Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientReadWriteTests.cs`
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientReadWriteTests
|
||||
{
|
||||
[Fact]
|
||||
public async Task ReadAsync_ReturnsVtqFromResponse()
|
||||
// Mock ReadAsync to return a VtqMessage with TypedValue.DoubleValue = 42.5
|
||||
// Verify returned Vtq.Value is 42.5 (double)
|
||||
|
||||
[Fact]
|
||||
public async Task ReadAsync_ThrowsOnFailureResponse()
|
||||
|
||||
[Fact]
|
||||
public async Task ReadBatchAsync_ReturnsDictionaryOfVtqs()
|
||||
|
||||
[Fact]
|
||||
public async Task WriteAsync_SendsTypedValueDirectly()
|
||||
// Verify the WriteRequest.Value is the TypedValue passed in, not a string
|
||||
|
||||
[Fact]
|
||||
public async Task WriteBatchAsync_SendsAllItems()
|
||||
|
||||
[Fact]
|
||||
public async Task WriteBatchAndWaitAsync_ReturnsResponse()
|
||||
}
|
||||
```
|
||||
|
||||
### 10.3 Subscription Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientSubscriptionTests.cs`
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientSubscriptionTests
|
||||
{
|
||||
[Fact]
|
||||
public async Task SubscribeAsync_InvokesCallbackForEachUpdate()
|
||||
|
||||
[Fact]
|
||||
public async Task SubscribeAsync_InvokesStreamErrorOnFailure()
|
||||
|
||||
[Fact]
|
||||
public async Task SubscribeAsync_DisposeStopsProcessing()
|
||||
}
|
||||
```
|
||||
|
||||
### 10.4 TypedValue Conversion Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/TypedValueConversionTests.cs`
|
||||
|
||||
```csharp
|
||||
public class TypedValueConversionTests
|
||||
{
|
||||
[Fact] public void ConvertVtqMessage_ExtractsBoolValue()
|
||||
[Fact] public void ConvertVtqMessage_ExtractsInt32Value()
|
||||
[Fact] public void ConvertVtqMessage_ExtractsInt64Value()
|
||||
[Fact] public void ConvertVtqMessage_ExtractsFloatValue()
|
||||
[Fact] public void ConvertVtqMessage_ExtractsDoubleValue()
|
||||
[Fact] public void ConvertVtqMessage_ExtractsStringValue()
|
||||
[Fact] public void ConvertVtqMessage_ExtractsDateTimeValue()
|
||||
[Fact] public void ConvertVtqMessage_HandlesNullTypedValue()
|
||||
[Fact] public void ConvertVtqMessage_HandlesNullMessage()
|
||||
[Fact] public void ConvertVtqMessage_MapsQualityCodeCorrectly()
|
||||
[Fact] public void ConvertVtqMessage_GoodQualityCode()
|
||||
[Fact] public void ConvertVtqMessage_BadQualityCode()
|
||||
[Fact] public void ConvertVtqMessage_UncertainQualityCode()
|
||||
}
|
||||
```
|
||||
|
||||
### 10.5 Metrics Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/ClientMetricsTests.cs`
|
||||
|
||||
```csharp
|
||||
public class ClientMetricsTests
|
||||
{
|
||||
[Fact] public void IncrementOperationCount_Increments()
|
||||
[Fact] public void IncrementErrorCount_Increments()
|
||||
[Fact] public void RecordLatency_StoresValues()
|
||||
[Fact] public void RollingBuffer_CapsAt1000()
|
||||
[Fact] public void GetSnapshot_IncludesP95AndP99()
|
||||
}
|
||||
```
|
||||
|
||||
### 10.6 Run tests
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests --verbosity normal"
|
||||
```
|
||||
|
||||
## Step 11: Build Verification
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx && dotnet test --verbosity normal"
|
||||
```
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
- [ ] `ILmxProxyClient` interface updated for v2 (TypedValue parameters, onStreamError callback, CheckApiKeyAsync)
|
||||
- [ ] `LmxProxyClient.cs` — main file with Read/Write/WriteBatch/WriteBatchAndWait/CheckApiKey using v2 TypedValue
|
||||
- [ ] `LmxProxyClient.Connection.cs` — ConnectAsync, DisconnectAsync, keep-alive (30s), MarkDisconnectedAsync
|
||||
- [ ] `LmxProxyClient.CodeFirstSubscription.cs` — IAsyncEnumerable processing, onStreamError callback, 5s dispose timeout
|
||||
- [ ] `LmxProxyClient.ClientMetrics.cs` — per-op counts/errors/latency, 1000-sample buffer, p95/p99
|
||||
- [ ] `LmxProxyClient.ApiKeyInfo.cs` — simple DTO
|
||||
- [ ] `LmxProxyClient.ISubscription.cs` — IDisposable + DisposeAsync
|
||||
- [ ] `ClientTlsConfiguration.cs` — all properties present
|
||||
- [ ] `Security/GrpcChannelFactory.cs` — TLS 1.2/1.3, cert validation, custom CA, self-signed support
|
||||
- [ ] No string serialization heuristics anywhere in Client code
|
||||
- [ ] ConvertVtqMessage extracts native TypedValue without parsing
|
||||
- [ ] Polly v8 ResiliencePipeline for retry (not v7 IAsyncPolicy)
|
||||
- [ ] All unit tests pass
|
||||
- [ ] Solution builds cleanly
|
||||
815
deprecated/lmxproxy/docs/plans/phase-6-client-extras.md
Normal file
@@ -0,0 +1,815 @@
|
||||
# Phase 6: Client Extras — Implementation Plan
|
||||
|
||||
**Date**: 2026-03-21
|
||||
**Prerequisites**: Phase 5 complete and passing (Client Core — `ILmxProxyClient`, `LmxProxyClient` partial classes, `ClientMetrics`, `ISubscription`, `ApiKeyInfo` all functional with unit tests passing)
|
||||
**Working Directory**: The lmxproxy repo is on windev at `C:\src\lmxproxy`
|
||||
|
||||
## Guardrails
|
||||
|
||||
1. **Client targets .NET 10, AnyCPU** — latest C# features permitted.
|
||||
2. **Polly v8 API** — `ResiliencePipeline`, `ResiliencePipelineBuilder`, `RetryStrategyOptions`. Do NOT use Polly v7 `IAsyncPolicy`, `Policy.Handle<>().WaitAndRetryAsync(...)`.
|
||||
3. **Builder default port is 50051** (per design doc section 11 — resolved conflict).
|
||||
4. **No new NuGet packages** — `Polly 8.5.2`, `Microsoft.Extensions.DependencyInjection.Abstractions 10.0.0`, `Microsoft.Extensions.Configuration.Abstractions 10.0.0`, `Microsoft.Extensions.Configuration.Binder 10.0.0`, `Microsoft.Extensions.Logging.Abstractions 10.0.0` are already in the csproj.
|
||||
5. **Build command**: `dotnet build src/ZB.MOM.WW.LmxProxy.Client`
|
||||
6. **Test command**: `dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests`
|
||||
|
||||
## Step 1: LmxProxyClientBuilder
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/LmxProxyClientBuilder.cs`
|
||||
|
||||
Rewrite the builder for v2. Key changes from v1:
|
||||
- Default port changes from `5050` to `50051`
|
||||
- Retry uses Polly v8 `ResiliencePipeline` (built in `SetBuilderConfiguration`)
|
||||
- `WithCorrelationIdHeader` support
|
||||
|
||||
### 1.1 Builder fields
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientBuilder
|
||||
{
|
||||
private string? _host;
|
||||
private int _port = 50051; // CHANGED from 5050
|
||||
private string? _apiKey;
|
||||
private ILogger<LmxProxyClient>? _logger;
|
||||
private TimeSpan _defaultTimeout = TimeSpan.FromSeconds(30);
|
||||
private int _maxRetryAttempts = 3;
|
||||
private TimeSpan _retryDelay = TimeSpan.FromSeconds(1);
|
||||
private bool _enableMetrics;
|
||||
private string? _correlationIdHeader;
|
||||
private ClientTlsConfiguration? _tlsConfiguration;
|
||||
```
|
||||
|
||||
### 1.2 Fluent methods
|
||||
|
||||
Each method returns `this` for chaining. Validation at call site:
|
||||
|
||||
| Method | Default | Validation |
|---|---|---|
| `WithHost(string host)` | Required | `!string.IsNullOrWhiteSpace(host)` |
| `WithPort(int port)` | 50051 | 1-65535 |
| `WithApiKey(string? apiKey)` | null | none |
| `WithLogger(ILogger<LmxProxyClient> logger)` | NullLogger | `!= null` |
| `WithTimeout(TimeSpan timeout)` | 30s | `> TimeSpan.Zero && <= TimeSpan.FromMinutes(10)` |
| `WithSslCredentials(string? certificatePath)` | disabled | creates/updates `_tlsConfiguration` with `UseTls=true` |
| `WithTlsConfiguration(ClientTlsConfiguration config)` | null | `!= null` |
| `WithRetryPolicy(int maxAttempts, TimeSpan retryDelay)` | 3, 1s | `maxAttempts > 0`, `retryDelay > TimeSpan.Zero` |
| `WithMetrics()` | disabled | sets `_enableMetrics = true` |
| `WithCorrelationIdHeader(string headerName)` | null | `!string.IsNullOrEmpty` |
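
A few of the validation rules from the table could look like the following. This is a trimmed-down sketch, not the real builder: `BuilderValidationSketch` is a hypothetical class, and the exception types are assumptions chosen to line up with what the builder tests in this plan expect (`ArgumentException` for the host, `ArgumentOutOfRangeException` for port and timeout).

```csharp
using System;

// Hypothetical, trimmed-down builder showing only three validation rules from the table.
public sealed class BuilderValidationSketch
{
    public string? Host { get; private set; }
    public int Port { get; private set; } = 50051;
    public TimeSpan Timeout { get; private set; } = TimeSpan.FromSeconds(30);

    public BuilderValidationSketch WithHost(string host)
    {
        if (string.IsNullOrWhiteSpace(host))
            throw new ArgumentException("Host must not be null or whitespace.", nameof(host));
        Host = host;
        return this; // every fluent method returns the builder for chaining
    }

    public BuilderValidationSketch WithPort(int port)
    {
        if (port is < 1 or > 65535)
            throw new ArgumentOutOfRangeException(nameof(port), port, "Port must be 1-65535.");
        Port = port;
        return this;
    }

    public BuilderValidationSketch WithTimeout(TimeSpan timeout)
    {
        if (timeout <= TimeSpan.Zero || timeout > TimeSpan.FromMinutes(10))
            throw new ArgumentOutOfRangeException(nameof(timeout), timeout, "Timeout must be in (0, 10 min].");
        Timeout = timeout;
        return this;
    }
}
```

Validating inside each `With...` method means a bad value fails at the call that supplied it, rather than later in `Build()`.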
|
||||
|
||||
### 1.3 Build()
|
||||
|
||||
```csharp
|
||||
public LmxProxyClient Build()
|
||||
{
|
||||
if (string.IsNullOrWhiteSpace(_host))
|
||||
throw new InvalidOperationException("Host must be specified. Call WithHost() before Build().");
|
||||
|
||||
ValidateTlsConfiguration();
|
||||
|
||||
var client = new LmxProxyClient(_host, _port, _apiKey, _tlsConfiguration, _logger)
|
||||
{
|
||||
DefaultTimeout = _defaultTimeout
|
||||
};
|
||||
|
||||
client.SetBuilderConfiguration(new ClientConfiguration
|
||||
{
|
||||
MaxRetryAttempts = _maxRetryAttempts,
|
||||
RetryDelay = _retryDelay,
|
||||
EnableMetrics = _enableMetrics,
|
||||
CorrelationIdHeader = _correlationIdHeader
|
||||
});
|
||||
|
||||
return client;
|
||||
}
|
||||
```
|
||||
|
||||
### 1.4 ValidateTlsConfiguration
|
||||
|
||||
If `_tlsConfiguration?.UseTls == true`:
|
||||
- If `ServerCaCertificatePath` is set and file doesn't exist → throw `FileNotFoundException`.
- If `ClientCertificatePath` is set and file doesn't exist → throw `FileNotFoundException`.
- If `ClientKeyPath` is set and file doesn't exist → throw `FileNotFoundException`.
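
The three checks above might be sketched as follows. The `TlsConfigSketch` type is a hypothetical minimal stand-in for `ClientTlsConfiguration` (the real class has more properties), and the helper name `RequireFileIfSet` is likewise an assumption of this sketch.

```csharp
using System;
using System.IO;

// Hypothetical minimal shape; the real ClientTlsConfiguration has more properties.
public sealed class TlsConfigSketch
{
    public bool UseTls { get; init; }
    public string? ServerCaCertificatePath { get; init; }
    public string? ClientCertificatePath { get; init; }
    public string? ClientKeyPath { get; init; }
}

public static class TlsValidationSketch
{
    public static void Validate(TlsConfigSketch? config)
    {
        // Nothing to validate when TLS is off or no configuration was supplied.
        if (config?.UseTls != true) return;

        RequireFileIfSet(config.ServerCaCertificatePath, "Server CA certificate");
        RequireFileIfSet(config.ClientCertificatePath, "Client certificate");
        RequireFileIfSet(config.ClientKeyPath, "Client key");
    }

    // A path that is not set is fine; a path that is set must point at an existing file.
    private static void RequireFileIfSet(string? path, string description)
    {
        if (!string.IsNullOrWhiteSpace(path) && !File.Exists(path))
            throw new FileNotFoundException($"{description} file not found.", path);
    }
}
```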
|
||||
|
||||
### 1.5 Polly v8 ResiliencePipeline setup (in LmxProxyClient.SetBuilderConfiguration)
|
||||
|
||||
This was defined in Step 4 of Phase 5. Verify it uses:
|
||||
|
||||
```csharp
|
||||
using Polly;
|
||||
using Polly.Retry;
|
||||
using Grpc.Core;
|
||||
|
||||
_resiliencePipeline = new ResiliencePipelineBuilder()
|
||||
.AddRetry(new RetryStrategyOptions
|
||||
{
|
||||
MaxRetryAttempts = config.MaxRetryAttempts,
|
||||
Delay = config.RetryDelay,
|
||||
BackoffType = DelayBackoffType.Exponential,
|
||||
ShouldHandle = new PredicateBuilder()
|
||||
.Handle<RpcException>(ex =>
|
||||
ex.StatusCode == StatusCode.Unavailable ||
|
||||
ex.StatusCode == StatusCode.DeadlineExceeded ||
|
||||
ex.StatusCode == StatusCode.ResourceExhausted ||
|
||||
ex.StatusCode == StatusCode.Aborted),
|
||||
OnRetry = args =>
|
||||
{
|
||||
_logger.LogWarning(
|
||||
"Retry {Attempt}/{Max} after {Delay}ms — {Error}",
|
||||
args.AttemptNumber + 1, config.MaxRetryAttempts, // AttemptNumber is zero-based in Polly v8
|
||||
args.RetryDelay.TotalMilliseconds,
|
||||
args.Outcome.Exception?.Message ?? "unknown");
|
||||
return ValueTask.CompletedTask;
|
||||
}
|
||||
})
|
||||
.Build();
|
||||
```
|
||||
|
||||
Backoff sequence: `retryDelay * 2^(attempt-1)` → 1s, 2s, 4s for defaults.
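
As a worked example of that formula, the delays can be computed directly. This sketch only reproduces the arithmetic; it is not how Polly computes delays internally, and it assumes jitter is not enabled. `BackoffSketch` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSketch
{
    // Delay before retry number `attempt` (1-based): baseDelay * 2^(attempt - 1).
    public static TimeSpan DelayFor(TimeSpan baseDelay, int attempt) =>
        TimeSpan.FromTicks(baseDelay.Ticks * (1L << (attempt - 1)));

    // Full sequence of delays for the configured number of retries.
    public static IReadOnlyList<TimeSpan> Sequence(TimeSpan baseDelay, int maxRetryAttempts)
    {
        var delays = new List<TimeSpan>(maxRetryAttempts);
        for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
            delays.Add(DelayFor(baseDelay, attempt));
        return delays;
    }
}
```

With the defaults (`retryDelay` of 1s, 3 attempts), `Sequence` yields 1s, 2s, 4s, so a call that fails every time spends about 7 seconds waiting between attempts before the final failure surfaces.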
|
||||
|
||||
### 1.6 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 2: ClientConfiguration
|
||||
|
||||
**File**: This is already defined in `LmxProxyClientBuilder.cs` (at the bottom of the file, as an `internal class`). Verify it contains:
|
||||
|
||||
```csharp
|
||||
internal class ClientConfiguration
|
||||
{
|
||||
public int MaxRetryAttempts { get; set; }
|
||||
public TimeSpan RetryDelay { get; set; }
|
||||
public bool EnableMetrics { get; set; }
|
||||
public string? CorrelationIdHeader { get; set; }
|
||||
}
|
||||
```
|
||||
|
||||
No changes needed if it matches.
|
||||
|
||||
## Step 3: ILmxProxyClientFactory + LmxProxyClientFactory
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ILmxProxyClientFactory.cs`
|
||||
|
||||
### 3.1 Interface
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client;
|
||||
|
||||
public interface ILmxProxyClientFactory
|
||||
{
|
||||
LmxProxyClient CreateClient();
|
||||
LmxProxyClient CreateClient(string configName);
|
||||
LmxProxyClient CreateClient(Action<LmxProxyClientBuilder> builderAction);
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Implementation
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientFactory : ILmxProxyClientFactory
|
||||
{
|
||||
private readonly IConfiguration _configuration;
|
||||
|
||||
public LmxProxyClientFactory(IConfiguration configuration)
|
||||
{
|
||||
_configuration = configuration ?? throw new ArgumentNullException(nameof(configuration));
|
||||
}
|
||||
|
||||
public LmxProxyClient CreateClient() => CreateClient("LmxProxy");
|
||||
|
||||
public LmxProxyClient CreateClient(string configName)
|
||||
{
|
||||
IConfigurationSection section = _configuration.GetSection(configName);
|
||||
var options = new LmxProxyClientOptions();
|
||||
section.Bind(options);
|
||||
return BuildFromOptions(options);
|
||||
}
|
||||
|
||||
public LmxProxyClient CreateClient(Action<LmxProxyClientBuilder> builderAction)
|
||||
{
|
||||
var builder = new LmxProxyClientBuilder();
|
||||
builderAction(builder);
|
||||
return builder.Build();
|
||||
}
|
||||
|
||||
private static LmxProxyClient BuildFromOptions(LmxProxyClientOptions options)
|
||||
{
|
||||
var builder = new LmxProxyClientBuilder()
|
||||
.WithHost(options.Host)
|
||||
.WithPort(options.Port)
|
||||
.WithTimeout(options.Timeout)
|
||||
.WithRetryPolicy(options.Retry.MaxAttempts, options.Retry.Delay);
|
||||
|
||||
if (!string.IsNullOrEmpty(options.ApiKey))
|
||||
builder.WithApiKey(options.ApiKey);
|
||||
|
||||
if (options.EnableMetrics)
|
||||
builder.WithMetrics();
|
||||
|
||||
if (!string.IsNullOrEmpty(options.CorrelationIdHeader))
|
||||
builder.WithCorrelationIdHeader(options.CorrelationIdHeader);
|
||||
|
||||
if (options.UseSsl)
|
||||
{
|
||||
builder.WithTlsConfiguration(new ClientTlsConfiguration
|
||||
{
|
||||
UseTls = true,
|
||||
ServerCaCertificatePath = options.CertificatePath
|
||||
});
|
||||
}
|
||||
|
||||
return builder.Build();
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3.3 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 4: ServiceCollectionExtensions
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/ServiceCollectionExtensions.cs`
|
||||
|
||||
### 4.1 Options classes
|
||||
|
||||
Define at the bottom of the file or in a separate `LmxProxyClientOptions.cs`:
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientOptions
|
||||
{
|
||||
public string Host { get; set; } = "localhost";
|
||||
public int Port { get; set; } = 50051; // CHANGED from 5050
|
||||
public string? ApiKey { get; set; }
|
||||
public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(30);
|
||||
public bool UseSsl { get; set; }
|
||||
public string? CertificatePath { get; set; }
|
||||
public bool EnableMetrics { get; set; }
|
||||
public string? CorrelationIdHeader { get; set; }
|
||||
public RetryOptions Retry { get; set; } = new();
|
||||
}
|
||||
|
||||
public class RetryOptions
|
||||
{
|
||||
public int MaxAttempts { get; set; } = 3;
|
||||
public TimeSpan Delay { get; set; } = TimeSpan.FromSeconds(1);
|
||||
}
|
||||
```
|
||||
|
||||
### 4.2 Extension methods
|
||||
|
||||
```csharp
|
||||
public static class ServiceCollectionExtensions
|
||||
{
|
||||
/// <summary>Registers a singleton ILmxProxyClient from the "LmxProxy" config section.</summary>
|
||||
public static IServiceCollection AddLmxProxyClient(
|
||||
this IServiceCollection services, IConfiguration configuration)
|
||||
{
|
||||
return services.AddLmxProxyClient(configuration, "LmxProxy");
|
||||
}
|
||||
|
||||
/// <summary>Registers a singleton ILmxProxyClient from a named config section.</summary>
|
||||
public static IServiceCollection AddLmxProxyClient(
|
||||
this IServiceCollection services, IConfiguration configuration, string sectionName)
|
||||
{
|
||||
services.AddSingleton<ILmxProxyClientFactory>(
|
||||
sp => new LmxProxyClientFactory(configuration));
|
||||
services.AddSingleton<ILmxProxyClient>(sp =>
|
||||
{
|
||||
var factory = sp.GetRequiredService<ILmxProxyClientFactory>();
|
||||
return factory.CreateClient(sectionName);
|
||||
});
|
||||
return services;
|
||||
}
|
||||
|
||||
/// <summary>Registers a singleton ILmxProxyClient via builder action.</summary>
|
||||
public static IServiceCollection AddLmxProxyClient(
|
||||
this IServiceCollection services, Action<LmxProxyClientBuilder> configure)
|
||||
{
|
||||
services.AddSingleton<ILmxProxyClient>(sp =>
|
||||
{
|
||||
var builder = new LmxProxyClientBuilder();
|
||||
configure(builder);
|
||||
return builder.Build();
|
||||
});
|
||||
return services;
|
||||
}
|
||||
|
||||
/// <summary>Registers a scoped ILmxProxyClient from the "LmxProxy" config section.</summary>
|
||||
public static IServiceCollection AddScopedLmxProxyClient(
|
||||
this IServiceCollection services, IConfiguration configuration)
|
||||
{
|
||||
services.AddSingleton<ILmxProxyClientFactory>(
|
||||
sp => new LmxProxyClientFactory(configuration));
|
||||
services.AddScoped<ILmxProxyClient>(sp =>
|
||||
{
|
||||
var factory = sp.GetRequiredService<ILmxProxyClientFactory>();
|
||||
return factory.CreateClient();
|
||||
});
|
||||
return services;
|
||||
}
|
||||
|
||||
/// <summary>Registers a keyed singleton ILmxProxyClient.</summary>
|
||||
public static IServiceCollection AddNamedLmxProxyClient(
|
||||
this IServiceCollection services, string name, Action<LmxProxyClientBuilder> configure)
|
||||
{
|
||||
services.AddKeyedSingleton<ILmxProxyClient>(name, (sp, key) =>
|
||||
{
|
||||
var builder = new LmxProxyClientBuilder();
|
||||
configure(builder);
|
||||
return builder.Build();
|
||||
});
|
||||
return services;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 5: StreamingExtensions
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/StreamingExtensions.cs`
|
||||
|
||||
### 5.1 ReadStreamAsync
|
||||
|
||||
```csharp
|
||||
public static class StreamingExtensions
|
||||
{
|
||||
/// <summary>
|
||||
/// Reads multiple tags as an async stream in batches.
|
||||
/// Retries up to 2 times per batch. Aborts after 3 consecutive batch errors.
|
||||
/// </summary>
|
||||
public static async IAsyncEnumerable<KeyValuePair<string, Vtq>> ReadStreamAsync(
|
||||
this ILmxProxyClient client,
|
||||
IEnumerable<string> addresses,
|
||||
int batchSize = 100,
|
||||
[EnumeratorCancellation] CancellationToken cancellationToken = default)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(client);
|
||||
ArgumentNullException.ThrowIfNull(addresses);
|
||||
if (batchSize <= 0)
|
||||
throw new ArgumentOutOfRangeException(nameof(batchSize));
|
||||
|
||||
var batch = new List<string>(batchSize);
|
||||
int consecutiveErrors = 0;
|
||||
const int maxConsecutiveErrors = 3;
|
||||
const int maxRetries = 2;
|
||||
|
||||
foreach (string address in addresses)
|
||||
{
|
||||
cancellationToken.ThrowIfCancellationRequested();
|
||||
batch.Add(address);
|
||||
|
||||
if (batch.Count >= batchSize)
|
||||
{
|
||||
await foreach (var kvp in ReadBatchWithRetry(
|
||||
client, batch, maxRetries, cancellationToken))
|
||||
{
|
||||
consecutiveErrors = 0;
|
||||
yield return kvp;
|
||||
}
|
||||
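// ReadBatchWithRetry rethrows once its retries are exhausted, so reaching
// this line means the batch was read. Note: consecutiveErrors is never
// incremented in this listing, so the 3-consecutive-error abort is not enforced here.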
|
||||
batch.Clear();
|
||||
}
|
||||
}
|
||||
|
||||
// Process remaining
|
||||
if (batch.Count > 0)
|
||||
{
|
||||
await foreach (var kvp in ReadBatchWithRetry(
|
||||
client, batch, maxRetries, cancellationToken))
|
||||
{
|
||||
yield return kvp;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private static async IAsyncEnumerable<KeyValuePair<string, Vtq>> ReadBatchWithRetry(
|
||||
ILmxProxyClient client,
|
||||
List<string> batch,
|
||||
int maxRetries,
|
||||
[EnumeratorCancellation] CancellationToken ct)
|
||||
{
|
||||
int retries = 0;
|
||||
while (retries <= maxRetries)
|
||||
{
|
||||
IDictionary<string, Vtq>? results = null;
|
||||
try
|
||||
{
|
||||
results = await client.ReadBatchAsync(batch, ct);
|
||||
}
|
||||
catch when (retries < maxRetries)
|
||||
{
|
||||
retries++;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (results is not null)
|
||||
{
|
||||
foreach (var kvp in results)
|
||||
yield return kvp;
|
||||
yield break;
|
||||
}
|
||||
retries++;
|
||||
}
|
||||
}
|
||||
```
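
The summary comment mentions aborting after 3 consecutive batch errors, while the listing above declares the counter without incrementing it. One way the counter could be tracked is sketched below. This is an assumption-laden illustration, not the planned implementation: `BatchReadSketch` is a hypothetical class, the `readBatch` delegate stands in for `client.ReadBatchAsync`, per-batch retries are left out for brevity, and `Enumerable.Chunk` assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class BatchReadSketch
{
    // readBatch is a hypothetical delegate standing in for client.ReadBatchAsync.
    public static async IAsyncEnumerable<KeyValuePair<string, T>> ReadInBatchesAsync<T>(
        IEnumerable<string> addresses,
        Func<IReadOnlyList<string>, CancellationToken, Task<IDictionary<string, T>>> readBatch,
        int batchSize = 100,
        int maxConsecutiveErrors = 3,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        int consecutiveErrors = 0;

        foreach (string[] batch in addresses.Chunk(batchSize))
        {
            cancellationToken.ThrowIfCancellationRequested();

            IDictionary<string, T>? results = null;
            try
            {
                results = await readBatch(batch, cancellationToken);
                consecutiveErrors = 0; // a successful batch resets the streak
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Skip the failed batch, but give up once too many fail in a row.
                if (++consecutiveErrors >= maxConsecutiveErrors)
                    throw;
            }

            // C# does not allow yield inside a try block that has a catch clause,
            // so results are yielded here, after the try/catch.
            if (results is not null)
            {
                foreach (KeyValuePair<string, T> kvp in results)
                    yield return kvp;
            }
        }
    }
}
```

Excluding `OperationCanceledException` from the filter lets cancellation propagate immediately instead of being counted as a batch error.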
|
||||
|
||||
### 5.2 WriteStreamAsync
|
||||
|
||||
```csharp
|
||||
/// <summary>
|
||||
/// Writes values from an async enumerable in batches. Returns total count written.
|
||||
/// </summary>
|
||||
public static async Task<int> WriteStreamAsync(
|
||||
this ILmxProxyClient client,
|
||||
IAsyncEnumerable<KeyValuePair<string, TypedValue>> values,
|
||||
int batchSize = 100,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(client);
|
||||
ArgumentNullException.ThrowIfNull(values);
|
||||
if (batchSize <= 0)
|
||||
throw new ArgumentOutOfRangeException(nameof(batchSize));
|
||||
|
||||
var batch = new Dictionary<string, TypedValue>(batchSize);
|
||||
int totalWritten = 0;
|
||||
|
||||
await foreach (var kvp in values.WithCancellation(cancellationToken))
|
||||
{
|
||||
batch[kvp.Key] = kvp.Value;
|
||||
|
||||
if (batch.Count >= batchSize)
|
||||
{
|
||||
await client.WriteBatchAsync(batch, cancellationToken);
|
||||
totalWritten += batch.Count;
|
||||
batch.Clear();
|
||||
}
|
||||
}
|
||||
|
||||
if (batch.Count > 0)
|
||||
{
|
||||
await client.WriteBatchAsync(batch, cancellationToken);
|
||||
totalWritten += batch.Count;
|
||||
}
|
||||
|
||||
return totalWritten;
|
||||
}
|
||||
```
|
||||
|
||||
### 5.3 ProcessInParallelAsync
|
||||
|
||||
```csharp
|
||||
/// <summary>
|
||||
/// Processes items in parallel with a configurable max concurrency (default 4).
|
||||
/// </summary>
|
||||
public static async Task ProcessInParallelAsync<T>(
|
||||
this IAsyncEnumerable<T> source,
|
||||
Func<T, CancellationToken, Task> processor,
|
||||
int maxConcurrency = 4,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(source);
|
||||
ArgumentNullException.ThrowIfNull(processor);
|
||||
if (maxConcurrency <= 0)
|
||||
throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
|
||||
|
||||
using var semaphore = new SemaphoreSlim(maxConcurrency);
|
||||
var tasks = new List<Task>();
|
||||
|
||||
await foreach (T item in source.WithCancellation(cancellationToken))
|
||||
{
|
||||
await semaphore.WaitAsync(cancellationToken);
|
||||
|
||||
tasks.Add(Task.Run(async () =>
|
||||
{
|
||||
try
|
||||
{
|
||||
await processor(item, cancellationToken);
|
||||
}
|
||||
finally
|
||||
{
|
||||
semaphore.Release();
|
||||
}
|
||||
}, cancellationToken));
|
||||
}
|
||||
|
||||
await Task.WhenAll(tasks);
|
||||
}
|
||||
```
|
||||
|
||||
### 5.4 SubscribeStreamAsync
|
||||
|
||||
```csharp
|
||||
/// <summary>
|
||||
/// Wraps a callback-based subscription into an IAsyncEnumerable via System.Threading.Channels.
|
||||
/// </summary>
|
||||
public static async IAsyncEnumerable<(string Tag, Vtq Vtq)> SubscribeStreamAsync(
|
||||
this ILmxProxyClient client,
|
||||
IEnumerable<string> addresses,
|
||||
[EnumeratorCancellation] CancellationToken cancellationToken = default)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(client);
|
||||
ArgumentNullException.ThrowIfNull(addresses);
|
||||
|
||||
var channel = Channel.CreateBounded<(string, Vtq)>(
|
||||
new BoundedChannelOptions(1000)
|
||||
{
|
||||
FullMode = BoundedChannelFullMode.DropOldest,
|
||||
SingleReader = true,
|
||||
SingleWriter = false
|
||||
});
|
||||
|
||||
LmxProxyClient.ISubscription? subscription = null; // ISubscription is nested in LmxProxyClient
|
||||
try
|
||||
{
|
||||
subscription = await client.SubscribeAsync(
|
||||
addresses,
|
||||
(tag, vtq) =>
|
||||
{
|
||||
channel.Writer.TryWrite((tag, vtq));
|
||||
},
|
||||
ex =>
|
||||
{
|
||||
channel.Writer.TryComplete(ex);
|
||||
},
|
||||
cancellationToken);
|
||||
|
||||
await foreach (var item in channel.Reader.ReadAllAsync(cancellationToken))
|
||||
{
|
||||
yield return item;
|
||||
}
|
||||
}
|
||||
finally
|
||||
{
|
||||
subscription?.Dispose();
|
||||
channel.Writer.TryComplete();
|
||||
}
|
||||
}
|
||||
}
|
||||
```
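
The `DropOldest` mode used by the channel above means a slow consumer loses the oldest buffered updates rather than blocking the subscription callback. The small sketch below demonstrates that behaviour in isolation; `DropOldestSketch` is a hypothetical helper and uses `int` items instead of `(string, Vtq)` tuples.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

public static class DropOldestSketch
{
    // Writes 1..itemCount into a bounded DropOldest channel and returns what remains buffered.
    public static List<int> WriteThenDrain(int capacity, int itemCount)
    {
        Channel<int> channel = Channel.CreateBounded<int>(
            new BoundedChannelOptions(capacity)
            {
                FullMode = BoundedChannelFullMode.DropOldest,
                SingleReader = true,
                SingleWriter = false
            });

        for (int i = 1; i <= itemCount; i++)
            channel.Writer.TryWrite(i); // never blocks; evicts the oldest item when full

        channel.Writer.TryComplete();

        // Items buffered before completion can still be read.
        var remaining = new List<int>();
        while (channel.Reader.TryRead(out int item))
            remaining.Add(item);
        return remaining;
    }
}
```

With a capacity of 3 and five writes, `WriteThenDrain(3, 5)` returns 3, 4, 5: the two oldest items were evicted. In the streaming wrapper this trades completeness for freshness, which suits tag values where only recent updates matter.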
|
||||
|
||||
### 5.5 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 6: Properties/AssemblyInfo.cs
|
||||
|
||||
**File**: `src/ZB.MOM.WW.LmxProxy.Client/Properties/AssemblyInfo.cs`
|
||||
|
||||
Create this file if it doesn't already exist:
|
||||
|
||||
```csharp
|
||||
using System.Runtime.CompilerServices;
|
||||
|
||||
[assembly: InternalsVisibleTo("ZB.MOM.WW.LmxProxy.Client.Tests")]
|
||||
```
|
||||
|
||||
This allows the test project to access `internal` types like `ClientMetrics` and `ClientConfiguration`.
|
||||
|
||||
### 6.1 Verify build
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build src/ZB.MOM.WW.LmxProxy.Client"
|
||||
```
|
||||
|
||||
## Step 7: Unit Tests
|
||||
|
||||
Add tests to the existing `tests/ZB.MOM.WW.LmxProxy.Client.Tests/` project (created in Phase 5).
|
||||
|
||||
### 7.1 Builder Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientBuilderTests.cs`
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientBuilderTests
|
||||
{
|
||||
[Fact]
|
||||
public void Build_ThrowsWhenHostNotSet()
|
||||
{
|
||||
var builder = new LmxProxyClientBuilder();
|
||||
Assert.Throws<InvalidOperationException>(() => builder.Build());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Build_DefaultPort_Is50051()
|
||||
{
|
||||
var client = new LmxProxyClientBuilder()
|
||||
.WithHost("localhost")
|
||||
.Build();
|
||||
// Verify via reflection or by checking connection attempt URI
|
||||
Assert.NotNull(client);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithPort_ThrowsOnZero()
|
||||
{
|
||||
Assert.Throws<ArgumentOutOfRangeException>(() =>
|
||||
new LmxProxyClientBuilder().WithPort(0));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithPort_ThrowsOn65536()
|
||||
{
|
||||
Assert.Throws<ArgumentOutOfRangeException>(() =>
|
||||
new LmxProxyClientBuilder().WithPort(65536));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithTimeout_ThrowsOnNegative()
|
||||
{
|
||||
Assert.Throws<ArgumentOutOfRangeException>(() =>
|
||||
new LmxProxyClientBuilder().WithTimeout(TimeSpan.FromSeconds(-1)));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithTimeout_ThrowsOver10Minutes()
|
||||
{
|
||||
Assert.Throws<ArgumentOutOfRangeException>(() =>
|
||||
new LmxProxyClientBuilder().WithTimeout(TimeSpan.FromMinutes(11)));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithRetryPolicy_ThrowsOnZeroAttempts()
|
||||
{
|
||||
Assert.Throws<ArgumentOutOfRangeException>(() =>
|
||||
new LmxProxyClientBuilder().WithRetryPolicy(0, TimeSpan.FromSeconds(1)));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithRetryPolicy_ThrowsOnZeroDelay()
|
||||
{
|
||||
Assert.Throws<ArgumentOutOfRangeException>(() =>
|
||||
new LmxProxyClientBuilder().WithRetryPolicy(3, TimeSpan.Zero));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Build_WithAllOptions_Succeeds()
|
||||
{
|
||||
var client = new LmxProxyClientBuilder()
|
||||
.WithHost("10.100.0.48")
|
||||
.WithPort(50051)
|
||||
.WithApiKey("test-key")
|
||||
.WithTimeout(TimeSpan.FromSeconds(15))
|
||||
.WithRetryPolicy(5, TimeSpan.FromSeconds(2))
|
||||
.WithMetrics()
|
||||
.WithCorrelationIdHeader("X-Correlation-ID")
|
||||
.Build();
|
||||
Assert.NotNull(client);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Build_WithTls_ValidatesCertificatePaths()
|
||||
{
|
||||
var builder = new LmxProxyClientBuilder()
|
||||
.WithHost("localhost")
|
||||
.WithTlsConfiguration(new ClientTlsConfiguration
|
||||
{
|
||||
UseTls = true,
|
||||
ServerCaCertificatePath = "/nonexistent/cert.pem"
|
||||
});
|
||||
Assert.Throws<FileNotFoundException>(() => builder.Build());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithHost_ThrowsOnNull()
|
||||
{
|
||||
Assert.Throws<ArgumentException>(() =>
|
||||
new LmxProxyClientBuilder().WithHost(null!));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WithHost_ThrowsOnEmpty()
|
||||
{
|
||||
Assert.Throws<ArgumentException>(() =>
|
||||
new LmxProxyClientBuilder().WithHost(""));
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7.2 Factory Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/LmxProxyClientFactoryTests.cs`
|
||||
|
||||
```csharp
|
||||
public class LmxProxyClientFactoryTests
|
||||
{
|
||||
[Fact]
|
||||
public void CreateClient_BindsFromConfiguration()
|
||||
{
|
||||
var config = new ConfigurationBuilder()
|
||||
.AddInMemoryCollection(new Dictionary<string, string?>
|
||||
{
|
||||
["LmxProxy:Host"] = "10.100.0.48",
|
||||
["LmxProxy:Port"] = "50052",
|
||||
["LmxProxy:ApiKey"] = "test-key",
|
||||
["LmxProxy:Retry:MaxAttempts"] = "5",
|
||||
["LmxProxy:Retry:Delay"] = "00:00:02",
|
||||
})
|
||||
.Build();
|
||||
|
||||
var factory = new LmxProxyClientFactory(config);
|
||||
var client = factory.CreateClient();
|
||||
Assert.NotNull(client);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void CreateClient_NamedSection()
|
||||
{
|
||||
var config = new ConfigurationBuilder()
|
||||
.AddInMemoryCollection(new Dictionary<string, string?>
|
||||
{
|
||||
["MyProxy:Host"] = "10.100.0.48",
|
||||
["MyProxy:Port"] = "50052",
|
||||
})
|
||||
.Build();
|
||||
|
||||
var factory = new LmxProxyClientFactory(config);
|
||||
var client = factory.CreateClient("MyProxy");
|
||||
Assert.NotNull(client);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void CreateClient_BuilderAction()
|
||||
{
|
||||
var config = new ConfigurationBuilder().Build();
|
||||
var factory = new LmxProxyClientFactory(config);
|
||||
var client = factory.CreateClient(b => b.WithHost("localhost").WithPort(50051));
|
||||
Assert.NotNull(client);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7.3 StreamingExtensions Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.Tests/StreamingExtensionsTests.cs`
|
||||
|
||||
```csharp
|
||||
public class StreamingExtensionsTests
|
||||
{
|
||||
[Fact]
public async Task ReadStreamAsync_BatchesCorrectly()
{
    // Create mock client, provide 250 addresses with batchSize=100
    // Verify ReadBatchAsync called 3 times (100, 100, 50)
}

[Fact]
public async Task ReadStreamAsync_RetriesOnError()
{
    // Mock first ReadBatchAsync to throw, second to succeed
    // Verify results returned from second attempt
}

[Fact]
public async Task WriteStreamAsync_BatchesAndReturnsCount()
{
    // Provide async enumerable of 250 items, batchSize=100
    // Verify WriteBatchAsync called 3 times, total returned = 250
}

[Fact]
public async Task ProcessInParallelAsync_RespectsMaxConcurrency()
{
    // Track concurrent count with SemaphoreSlim
    // maxConcurrency=2, verify never exceeds 2 concurrent calls
}

[Fact]
public async Task SubscribeStreamAsync_YieldsFromChannel()
{
    // Mock SubscribeAsync to invoke onUpdate callback with test values
    // Verify IAsyncEnumerable yields matching items
}
|
||||
}
|
||||
```
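
The batching expectation in `ReadStreamAsync_BatchesCorrectly` (250 addresses at `batchSize=100` giving batches of 100, 100, 50) can be checked with a few lines of arithmetic. This is only a sketch of the expected split; `batch_sizes` is a hypothetical helper and is not part of the client library.

```bash
# Print the sizes of the batches produced when TOTAL items are split
# into batches of at most SIZE items (the last batch may be smaller).
batch_sizes() {
  total=$1; size=$2; out=""
  [ "$size" -gt 0 ] || return 1          # guard against an endless loop
  while [ "$total" -gt 0 ]; do
    if [ "$total" -ge "$size" ]; then n=$size; else n=$total; fi
    out="${out:+$out }$n"                # append, space-separated
    total=$(( total - n ))
  done
  echo "$out"
}

batch_sizes 250 100   # prints: 100 100 50
```

The number of printed values is the number of `ReadBatchAsync` calls the mock should observe.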
|
||||
|
||||
### 7.4 Run all tests
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet test tests/ZB.MOM.WW.LmxProxy.Client.Tests --verbosity normal"
|
||||
```
|
||||
|
||||
## Step 8: Build Verification
|
||||
|
||||
Run full solution build and all tests:
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet build ZB.MOM.WW.LmxProxy.slnx && dotnet test --verbosity normal"
|
||||
```
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
- [ ] `LmxProxyClientBuilder` with default port 50051, Polly v8 wiring, all fluent methods, TLS validation
|
||||
- [ ] `ClientConfiguration` internal record with retry, metrics, correlation header fields
|
||||
- [ ] `ILmxProxyClientFactory` + `LmxProxyClientFactory` with 3 `CreateClient` overloads
|
||||
- [ ] `ServiceCollectionExtensions` with `AddLmxProxyClient` (3 overloads), `AddScopedLmxProxyClient`, `AddNamedLmxProxyClient`
|
||||
- [ ] `LmxProxyClientOptions` + `RetryOptions` configuration classes
|
||||
- [ ] `StreamingExtensions` with `ReadStreamAsync` (batched, 2 retries, 3 consecutive error abort), `WriteStreamAsync` (batched), `ProcessInParallelAsync` (SemaphoreSlim, max 4), `SubscribeStreamAsync` (Channel-based IAsyncEnumerable)
|
||||
- [ ] `Properties/AssemblyInfo.cs` with `InternalsVisibleTo` for test project
|
||||
- [ ] Builder tests: validation, defaults, Polly pipeline wiring, TLS cert validation
|
||||
- [ ] Factory tests: config binding from IConfiguration, named sections, builder action
|
||||
- [ ] StreamingExtensions tests: batching, error recovery, parallel throttling, subscription streaming
|
||||
- [ ] Solution builds cleanly
|
||||
- [ ] All tests pass
|
||||
837
deprecated/lmxproxy/docs/plans/phase-7-integration-deployment.md
Normal file
@@ -0,0 +1,837 @@
|
||||
# Phase 7: Integration Tests & Deployment — Implementation Plan
|
||||
|
||||
**Date**: 2026-03-21
|
||||
**Prerequisites**: Phase 4 (Host complete) and Phase 6 (Client complete) both passing. All unit tests green.
|
||||
**Working Directory (Mac)**: `/Users/dohertj2/Desktop/scadalink-design/lmxproxy`
|
||||
**Working Directory (windev)**: `C:\src\lmxproxy`
|
||||
**windev SSH**: `ssh windev` (alias configured in `~/.ssh/config`, passwordless ed25519, user `dohertj2`)
|
||||
|
||||
## Guardrails
|
||||
|
||||
1. **Never stop the v1 service until v2 is verified** — deploy v2 on alternate ports first.
|
||||
2. **Take a Veeam backup before cutover** — provides rollback point.
|
||||
3. **Integration tests run from Mac against windev** — they use `Grpc.Net.Client` which is cross-platform.
|
||||
4. **All integration tests must pass before cutover**.
|
||||
5. **API keys**: The existing `apikeys.json` on windev is the source of truth for valid keys. Read it to get test keys.
|
||||
6. **Real MxAccess tags**: Use the `TestChildObject` tags on windev's AVEVA System Platform instance. Available tags cover all TypedValue cases:
|
||||
- `TestChildObject.TestBool` (bool)
|
||||
- `TestChildObject.TestInt` (int)
|
||||
- `TestChildObject.TestFloat` (float)
|
||||
- `TestChildObject.TestDouble` (double)
|
||||
- `TestChildObject.TestString` (string)
|
||||
- `TestChildObject.TestDateTime` (datetime)
|
||||
- `TestChildObject.TestBoolArray[]` (bool array)
|
||||
- `TestChildObject.TestDateTimeArray[]` (datetime array)
|
||||
- `TestChildObject.TestDoubleArray[]` (double array)
|
||||
- `TestChildObject.TestFloatArray[]` (float array)
|
||||
- `TestChildObject.TestIntArray[]` (int array)
|
||||
- `TestChildObject.TestStringArray[]` (string array)
|
||||
|
||||
## Step 1: Build Host on windev
|
||||
|
||||
### 1.1 Pull latest code
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && git pull"
|
||||
```
|
||||
|
||||
If the repo doesn't exist on windev yet:
|
||||
|
||||
```bash
|
||||
ssh windev "git clone https://gitea.dohertylan.com/dohertj2/lmxproxy.git C:\src\lmxproxy"
|
||||
```
|
||||
|
||||
### 1.2 Publish Host binary
|
||||
|
||||
```bash
|
||||
ssh windev "cd C:\src\lmxproxy && dotnet publish src/ZB.MOM.WW.LmxProxy.Host -c Release -r win-x86 --self-contained false -o C:\publish-v2"
|
||||
```
|
||||
|
||||
**Expected output**: `C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe` plus dependencies.
|
||||
|
||||
### 1.3 Create v2 appsettings.json
|
||||
|
||||
Create `C:\publish-v2\appsettings.json` configured for testing on alternate ports:
|
||||
|
||||
```bash
|
||||
ssh windev "powershell -Command \"@'
|
||||
{
|
||||
\"GrpcPort\": 50052,
|
||||
\"ApiKeyConfigFile\": \"apikeys.json\",
|
||||
\"Connection\": {
|
||||
\"MonitorIntervalSeconds\": 5,
|
||||
\"ConnectionTimeoutSeconds\": 30,
|
||||
\"ReadTimeoutSeconds\": 5,
|
||||
\"WriteTimeoutSeconds\": 5,
|
||||
\"MaxConcurrentOperations\": 10,
|
||||
\"AutoReconnect\": true
|
||||
},
|
||||
\"Subscription\": {
|
||||
\"ChannelCapacity\": 1000,
|
||||
\"ChannelFullMode\": \"DropOldest\"
|
||||
},
|
||||
\"HealthCheck\": {
|
||||
\"Enabled\": true,
|
||||
\"TestTagAddress\": \"TestChildObject.TestBool\",
|
||||
\"MaxStaleDataMinutes\": 5
|
||||
},
|
||||
\"Tls\": {
|
||||
\"Enabled\": false
|
||||
},
|
||||
\"WebServer\": {
|
||||
\"Enabled\": true,
|
||||
\"Port\": 8081
|
||||
},
|
||||
\"Serilog\": {
|
||||
\"MinimumLevel\": {
|
||||
\"Default\": \"Information\",
|
||||
\"Override\": {
|
||||
\"Microsoft\": \"Warning\",
|
||||
\"System\": \"Warning\",
|
||||
\"Grpc\": \"Information\"
|
||||
}
|
||||
},
|
||||
\"WriteTo\": [
|
||||
{ \"Name\": \"Console\" },
|
||||
{
|
||||
\"Name\": \"File\",
|
||||
\"Args\": {
|
||||
\"path\": \"logs/lmxproxy-v2-.txt\",
|
||||
\"rollingInterval\": \"Day\",
|
||||
\"retainedFileCountLimit\": 30
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
'@ | Set-Content -Path 'C:\publish-v2\appsettings.json' -Encoding UTF8\""
|
||||
```
|
||||
|
||||
**Key differences from production config**: gRPC port is 50052 (not 50051), web port is 8081 (not 8080), log file prefix is `lmxproxy-v2-`.
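
Because the config above is passed through several layers of quoting, it may be safer to author the file locally, confirm that it parses, and only then copy it over. The following is a minimal sketch under that assumption: `validate_json` and `push_config` are hypothetical helper names, and the `scp` destination path format is an assumption about how windev's SSH server resolves Windows paths.

```bash
# Succeeds only if the file parses as JSON.
validate_json() {
  # python3 -m json.tool exits non-zero on a parse error
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Copy a locally authored appsettings.json to windev, refusing invalid files.
push_config() {
  validate_json "$1" || { echo "invalid JSON: $1" >&2; return 1; }
  # Assumed destination syntax; adjust if the server expects another form.
  scp "$1" "windev:C:/publish-v2/appsettings.json"
}

# Example (not run here):
# push_config ./appsettings.v2.json
```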
|
||||
|
||||
### 1.4 Copy apikeys.json
|
||||
|
||||
If v2 should use the same API keys as v1:
|
||||
|
||||
```bash
|
||||
ssh windev "copy C:\publish\apikeys.json C:\publish-v2\apikeys.json"
|
||||
```
|
||||
|
||||
If `C:\publish\apikeys.json` doesn't exist, no copy is needed; the v2 service will auto-generate one on first start. To check:
|
||||
|
||||
```bash
|
||||
ssh windev "if not exist C:\publish\apikeys.json echo No existing apikeys.json - v2 will auto-generate"
|
||||
```
|
||||
|
||||
### 1.5 Verify the publish directory
|
||||
|
||||
```bash
|
||||
ssh windev "dir C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe && dir C:\publish-v2\appsettings.json"
|
||||
```
|
||||
|
||||
## Step 2: Deploy v2 Host Service
|
||||
|
||||
### 2.1 Install as a separate Topshelf service
|
||||
|
||||
The v2 service runs alongside v1 on different ports. Install with a distinct service name:
|
||||
|
||||
```bash
|
||||
ssh windev "C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe install -servicename \"ZB.MOM.WW.LmxProxy.Host.V2\" -displayname \"SCADA Bridge LMX Proxy V2\" -description \"LmxProxy v2 gRPC service (test deployment)\" --autostart"
|
||||
```
|
||||
|
||||
### 2.2 Start the v2 service
|
||||
|
||||
```bash
|
||||
ssh windev "sc start ZB.MOM.WW.LmxProxy.Host.V2"
|
||||
```
|
||||
|
||||
### 2.3 Wait 10 seconds for startup, then verify
|
||||
|
||||
```bash
|
||||
sleep 10 && ssh windev "sc query ZB.MOM.WW.LmxProxy.Host.V2"
|
||||
```
|
||||
|
||||
Expected: `STATE: 4 RUNNING`.
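
A fixed 10-second wait can be too short or needlessly long. As an alternative, a small polling loop can retry the query until the expected state appears. This is a sketch only; `wait_for` is a hypothetical helper, and the attempt count and delay are illustrative values.

```bash
# Poll a command until its output contains the wanted text, or give up.
# usage: wait_for <text> <attempts> <delay-seconds> <command...>
wait_for() {
  want=$1; attempts=$2; delay=$3; shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" 2>/dev/null | grep -q "$want"; then
      return 0                      # wanted text seen
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1                          # never reached the wanted state
}

# Example (not run here): up to 10 attempts, 3 seconds apart
# wait_for RUNNING 10 3 ssh windev "sc query ZB.MOM.WW.LmxProxy.Host.V2"
```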
|
||||
|
||||
### 2.4 Verify status page
|
||||
|
||||
From Mac, use curl to check the v2 status page:
|
||||
|
||||
```bash
|
||||
curl -s http://10.100.0.48:8081/ | head -20
|
||||
```
|
||||
|
||||
Expected: HTML containing "LmxProxy Status Dashboard".
|
||||
|
||||
```bash
|
||||
curl -s http://10.100.0.48:8081/api/health
|
||||
```
|
||||
|
||||
Expected: `OK` with HTTP 200.
|
||||
|
||||
```bash
|
||||
curl -s http://10.100.0.48:8081/api/status | python3 -m json.tool | head -30
|
||||
```
|
||||
|
||||
Expected: JSON with `serviceName`, `connection.isConnected: true`, version info.
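
To turn the connection check into a pass/fail result instead of reading the JSON by eye, the response can be piped through a simple text match. This sketch assumes the field appears literally as `"isConnected": true` in the response; it matches text and does not parse JSON, and `status_connected` is a hypothetical helper name.

```bash
# Reads the /api/status response on stdin.
# Succeeds only if it contains "isConnected": true (any spacing).
status_connected() {
  grep -Eq '"isConnected"[[:space:]]*:[[:space:]]*true'
}

# Example (not run here):
# curl -s http://10.100.0.48:8081/api/status | status_connected && echo "MxAccess connected"
```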
|
||||
|
||||
### 2.5 Verify MxAccess connected
|
||||
|
||||
The status page should show `MxAccess Connection: Connected`. If it shows `Disconnected`, check the logs:
|
||||
|
||||
```bash
|
||||
ssh windev "type C:\publish-v2\logs\lmxproxy-v2-*.txt | findstr /i \"error\""
|
||||
```
|
||||
|
||||
### 2.6 Read the apikeys.json to get test keys
|
||||
|
||||
```bash
|
||||
ssh windev "type C:\publish-v2\apikeys.json"
|
||||
```
|
||||
|
||||
Record the ReadWrite and ReadOnly API keys for use in integration tests. Example structure:
|
||||
|
||||
```json
|
||||
{
|
||||
"Keys": [
|
||||
{ "Key": "abc123...", "Role": "ReadWrite", "Description": "Default ReadWrite key" },
|
||||
{ "Key": "def456...", "Role": "ReadOnly", "Description": "Default ReadOnly key" }
|
||||
]
|
||||
}
|
||||
```
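
Given a local copy of a file with the structure above, the key for a role can be pulled out without copying it by hand. This is a minimal sketch assuming the `Keys`, `Key`, and `Role` property names shown in the example; `key_for_role` is a hypothetical helper, and it uses `python3`, which the plan already relies on for `json.tool`.

```bash
# usage: key_for_role <apikeys.json> <role>
# Prints the first key with the given role; fails if none is found.
key_for_role() {
  python3 -c '
import json, sys
# utf-8-sig tolerates a byte-order mark if the file was written on Windows
with open(sys.argv[1], encoding="utf-8-sig") as f:
    doc = json.load(f)
for entry in doc["Keys"]:
    if entry["Role"] == sys.argv[2]:
        print(entry["Key"])
        sys.exit(0)
sys.exit(1)
' "$1" "$2"
}

# Example (not run here):
# ssh windev "type C:\publish-v2\apikeys.json" > apikeys.local.json
# key_for_role apikeys.local.json ReadOnly
```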
|
||||
|
||||
## Step 3: Create Integration Test Project
|
||||
|
||||
### 3.1 Create project
|
||||
|
||||
From the Mac (the test project targets .NET 10 and is cross-platform, so the same command also works on windev from `C:\src\lmxproxy`):
|
||||
|
||||
```bash
|
||||
cd /Users/dohertj2/Desktop/scadalink-design/lmxproxy
|
||||
dotnet new xunit -n ZB.MOM.WW.LmxProxy.Client.IntegrationTests -o tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests --framework net10.0
|
||||
```
|
||||
|
||||
### 3.2 Configure csproj
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests.csproj`
|
||||
|
||||
```xml
|
||||
<Project Sdk="Microsoft.NET.Sdk">
|
||||
|
||||
<PropertyGroup>
|
||||
<TargetFramework>net10.0</TargetFramework>
|
||||
<LangVersion>latest</LangVersion>
|
||||
<Nullable>enable</Nullable>
<!-- Assumption: implicit usings are not enabled elsewhere (e.g. Directory.Build.props); the test files below rely on them -->
<ImplicitUsings>enable</ImplicitUsings>
<IsPackable>false</IsPackable>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="xunit" Version="2.9.3" />
<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
<PackageReference Include="Microsoft.Extensions.Configuration" Version="10.0.0" />
<PackageReference Include="Microsoft.Extensions.Configuration.Json" Version="10.0.0" />
<!-- Global using so [Fact], Assert and IAsyncLifetime resolve without per-file usings -->
<Using Include="Xunit" />
|
||||
</ItemGroup>
|
||||
|
||||
<ItemGroup>
|
||||
<ProjectReference Include="..\..\src\ZB.MOM.WW.LmxProxy.Client\ZB.MOM.WW.LmxProxy.Client.csproj" />
|
||||
</ItemGroup>
|
||||
|
||||
<ItemGroup>
|
||||
<None Update="appsettings.test.json">
|
||||
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
|
||||
</None>
|
||||
</ItemGroup>
|
||||
|
||||
</Project>
|
||||
```
|
||||
|
||||
### 3.3 Add to solution
|
||||
|
||||
Edit `ZB.MOM.WW.LmxProxy.slnx`:
|
||||
|
||||
```xml
|
||||
<Solution>
|
||||
<Folder Name="/src/">
|
||||
<Project Path="src/ZB.MOM.WW.LmxProxy.Host/ZB.MOM.WW.LmxProxy.Host.csproj" />
|
||||
<Project Path="src/ZB.MOM.WW.LmxProxy.Client/ZB.MOM.WW.LmxProxy.Client.csproj" />
|
||||
</Folder>
|
||||
<Folder Name="/tests/">
|
||||
<Project Path="tests/ZB.MOM.WW.LmxProxy.Host.Tests/ZB.MOM.WW.LmxProxy.Host.Tests.csproj" />
|
||||
<Project Path="tests/ZB.MOM.WW.LmxProxy.Client.Tests/ZB.MOM.WW.LmxProxy.Client.Tests.csproj" />
|
||||
<Project Path="tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests.csproj" />
|
||||
</Folder>
|
||||
</Solution>
|
||||
```
|
||||
|
||||
### 3.4 Create test configuration
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/appsettings.test.json`
|
||||
|
||||
```json
|
||||
{
|
||||
"LmxProxy": {
|
||||
"Host": "10.100.0.48",
|
||||
"Port": 50052,
|
||||
"ReadWriteApiKey": "REPLACE_WITH_ACTUAL_KEY",
|
||||
"ReadOnlyApiKey": "REPLACE_WITH_ACTUAL_KEY",
|
||||
"InvalidApiKey": "invalid-key-that-does-not-exist"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**IMPORTANT**: After reading the actual `apikeys.json` from windev in Step 2.6, replace the placeholder values with the real keys.
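
A forgotten placeholder only surfaces later as authentication failures, so a quick pre-flight check before running the tests may help. The sketch below assumes the placeholder text `REPLACE_WITH_ACTUAL_KEY` from the file above; `config_ready` is a hypothetical helper name.

```bash
# Fail fast if the test config is missing or still contains placeholder keys.
config_ready() {
  [ -f "$1" ] || { echo "missing config: $1" >&2; return 1; }
  if grep -q 'REPLACE_WITH_ACTUAL_KEY' "$1"; then
    echo "placeholder keys still present in $1" >&2
    return 1
  fi
}

# Example (not run here):
# config_ready tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/appsettings.test.json \
#   && dotnet test tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
```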
|
||||
|
||||
### 3.5 Create test base class
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/IntegrationTestBase.cs`
|
||||
|
||||
```csharp
|
||||
using Microsoft.Extensions.Configuration;
|
||||
using ZB.MOM.WW.LmxProxy.Client;
|
||||
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public abstract class IntegrationTestBase : IAsyncLifetime
|
||||
{
|
||||
protected IConfiguration Configuration { get; }
|
||||
protected string Host { get; }
|
||||
protected int Port { get; }
|
||||
protected string ReadWriteApiKey { get; }
|
||||
protected string ReadOnlyApiKey { get; }
|
||||
protected string InvalidApiKey { get; }
|
||||
protected LmxProxyClient? Client { get; set; }
|
||||
|
||||
protected IntegrationTestBase()
|
||||
{
|
||||
Configuration = new ConfigurationBuilder()
|
||||
.AddJsonFile("appsettings.test.json")
|
||||
.Build();
|
||||
|
||||
var section = Configuration.GetSection("LmxProxy");
|
||||
Host = section["Host"] ?? "10.100.0.48";
|
||||
Port = int.Parse(section["Port"] ?? "50052");
|
||||
ReadWriteApiKey = section["ReadWriteApiKey"] ?? throw new Exception("ReadWriteApiKey not configured");
|
||||
ReadOnlyApiKey = section["ReadOnlyApiKey"] ?? throw new Exception("ReadOnlyApiKey not configured");
|
||||
InvalidApiKey = section["InvalidApiKey"] ?? "invalid-key";
|
||||
}
|
||||
|
||||
protected LmxProxyClient CreateClient(string? apiKey = null)
|
||||
{
|
||||
return new LmxProxyClientBuilder()
|
||||
.WithHost(Host)
|
||||
.WithPort(Port)
|
||||
.WithApiKey(apiKey ?? ReadWriteApiKey)
|
||||
.WithTimeout(TimeSpan.FromSeconds(10))
|
||||
.WithRetryPolicy(2, TimeSpan.FromSeconds(1))
|
||||
.WithMetrics()
|
||||
.Build();
|
||||
}
|
||||
|
||||
public virtual async Task InitializeAsync()
|
||||
{
|
||||
Client = CreateClient();
|
||||
await Client.ConnectAsync();
|
||||
}
|
||||
|
||||
public virtual async Task DisposeAsync()
|
||||
{
|
||||
if (Client is not null)
|
||||
{
|
||||
await Client.DisconnectAsync();
|
||||
Client.Dispose();
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Step 4: Integration Test Scenarios
|
||||
|
||||
### 4.1 Connection Lifecycle
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ConnectionTests.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public class ConnectionTests : IntegrationTestBase
|
||||
{
|
||||
[Fact]
|
||||
public async Task ConnectAndDisconnect_Succeeds()
|
||||
{
|
||||
// Client is connected in InitializeAsync
|
||||
Assert.True(await Client!.IsConnectedAsync());
|
||||
await Client.DisconnectAsync();
|
||||
Assert.False(await Client.IsConnectedAsync());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ConnectWithInvalidApiKey_Fails()
|
||||
{
|
||||
using var badClient = CreateClient(InvalidApiKey);
|
||||
// Expect RpcException with StatusCode.Unauthenticated
|
||||
var ex = await Assert.ThrowsAsync<Grpc.Core.RpcException>(
|
||||
() => badClient.ConnectAsync());
|
||||
Assert.Equal(Grpc.Core.StatusCode.Unauthenticated, ex.StatusCode);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task DoubleConnect_IsIdempotent()
|
||||
{
|
||||
await Client!.ConnectAsync(); // Already connected — should be no-op
|
||||
Assert.True(await Client.IsConnectedAsync());
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.2 Read Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/ReadTests.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public class ReadTests : IntegrationTestBase
|
||||
{
|
||||
[Fact]
|
||||
public async Task Read_BoolTag_ReturnsBoolValue()
|
||||
{
|
||||
var vtq = await Client!.ReadAsync("TestChildObject.TestBool");
|
||||
Assert.IsType<bool>(vtq.Value);
|
||||
Assert.True(vtq.Quality.IsGood());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Read_IntTag_ReturnsIntValue()
|
||||
{
|
||||
var vtq = await Client!.ReadAsync("TestChildObject.TestInt");
|
||||
Assert.True(vtq.Value is int or long);
|
||||
Assert.True(vtq.Quality.IsGood());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Read_FloatTag_ReturnsFloatValue()
|
||||
{
|
||||
var vtq = await Client!.ReadAsync("TestChildObject.TestFloat");
|
||||
Assert.True(vtq.Value is float or double);
|
||||
Assert.True(vtq.Quality.IsGood());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Read_DoubleTag_ReturnsDoubleValue()
|
||||
{
|
||||
var vtq = await Client!.ReadAsync("TestChildObject.TestDouble");
|
||||
Assert.IsType<double>(vtq.Value);
|
||||
Assert.True(vtq.Quality.IsGood());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Read_StringTag_ReturnsStringValue()
|
||||
{
|
||||
var vtq = await Client!.ReadAsync("TestChildObject.TestString");
|
||||
Assert.IsType<string>(vtq.Value);
|
||||
Assert.True(vtq.Quality.IsGood());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Read_DateTimeTag_ReturnsDateTimeValue()
|
||||
{
|
||||
var vtq = await Client!.ReadAsync("TestChildObject.TestDateTime");
|
||||
Assert.IsType<DateTime>(vtq.Value);
|
||||
Assert.True(vtq.Quality.IsGood());
|
||||
Assert.True(DateTime.UtcNow - vtq.Timestamp < TimeSpan.FromHours(1));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ReadBatch_MultipleTags_ReturnsDictionary()
{
var tags = new[] { "TestChildObject.TestInt", "TestChildObject.TestString" };
var results = await Client!.ReadBatchAsync(tags);
Assert.Equal(2, results.Count);
Assert.True(results.ContainsKey("TestChildObject.TestInt"));
Assert.True(results.ContainsKey("TestChildObject.TestString"));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Read_NonexistentTag_ReturnsBadQuality()
|
||||
{
|
||||
// The Host may either return success=true with Bad quality, or return
// success=false, in which case ReadAsync throws. Confirm the actual
// behavior against the v2 Host and add the matching assertion here.
var vtq = await Client!.ReadAsync("NonExistent.Tag.12345");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 Write Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteTests.cs`
|
||||
|
||||
```csharp
|
||||
using ZB.MOM.WW.LmxProxy.Client.Domain;
|
||||
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public class WriteTests : IntegrationTestBase
|
||||
{
|
||||
[Fact]
|
||||
public async Task WriteAndReadBack_StringValue()
|
||||
{
|
||||
string testValue = $"IntTest-{DateTime.UtcNow:HHmmss}";
|
||||
// Write to a writable string tag
|
||||
await Client!.WriteAsync("TestChildObject.TestString",
|
||||
new TypedValue { StringValue = testValue });
|
||||
|
||||
// Read back and verify
|
||||
await Task.Delay(500); // Allow time for write to propagate
|
||||
var vtq = await Client.ReadAsync("TestChildObject.TestString");
|
||||
Assert.Equal(testValue, vtq.Value);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task WriteWithReadOnlyKey_ThrowsPermissionDenied()
|
||||
{
|
||||
using var readOnlyClient = CreateClient(ReadOnlyApiKey);
|
||||
await readOnlyClient.ConnectAsync();
|
||||
|
||||
var ex = await Assert.ThrowsAsync<Grpc.Core.RpcException>(
|
||||
() => readOnlyClient.WriteAsync("TestChildObject.TestString",
|
||||
new TypedValue { StringValue = "should-fail" }));
|
||||
Assert.Equal(Grpc.Core.StatusCode.PermissionDenied, ex.StatusCode);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.4 Subscribe Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/SubscribeTests.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public class SubscribeTests : IntegrationTestBase
|
||||
{
|
||||
[Fact]
|
||||
public async Task Subscribe_ReceivesUpdates()
|
||||
{
|
||||
var received = new List<(string Tag, Vtq Vtq)>();
|
||||
var receivedEvent = new TaskCompletionSource<bool>();
|
||||
|
||||
var subscription = await Client!.SubscribeAsync(
|
||||
new[] { "TestChildObject.TestInt" },
|
||||
(tag, vtq) =>
|
||||
{
|
||||
received.Add((tag, vtq));
|
||||
if (received.Count >= 3)
|
||||
receivedEvent.TrySetResult(true);
|
||||
},
|
||||
ex => receivedEvent.TrySetException(ex));
|
||||
|
||||
// Wait up to 30 seconds; finishes early once 3 updates have arrived
await Task.WhenAny(receivedEvent.Task, Task.Delay(TimeSpan.FromSeconds(30)));
|
||||
subscription.Dispose();
|
||||
|
||||
Assert.True(received.Count >= 1, $"Expected at least 1 update, got {received.Count}");
|
||||
|
||||
// Verify the VTQ has correct structure
|
||||
var first = received[0];
|
||||
Assert.Equal("TestChildObject.TestInt", first.Tag);
|
||||
Assert.NotNull(first.Vtq.Value);
|
||||
// ScanTime should be a DateTime value
|
||||
Assert.True(first.Vtq.Timestamp > DateTime.MinValue);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.5 WriteBatchAndWait Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/WriteBatchAndWaitTests.cs`
|
||||
|
||||
```csharp
|
||||
using ZB.MOM.WW.LmxProxy.Client.Domain;
|
||||
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public class WriteBatchAndWaitTests : IntegrationTestBase
|
||||
{
|
||||
[Fact]
|
||||
public async Task WriteBatchAndWait_TypeAwareComparison()
|
||||
{
|
||||
// This test requires a writable tag and a flag tag.
|
||||
// Adjust tag names based on available tags in TestChildObject.
|
||||
// Example: write values and poll a flag.
|
||||
|
||||
var values = new Dictionary<string, TypedValue>
|
||||
{
|
||||
["TestChildObject.TestString"] = new TypedValue { StringValue = "BatchTest" }
|
||||
};
|
||||
|
||||
// Poll the same tag we wrote to (simple self-check)
|
||||
var response = await Client!.WriteBatchAndWaitAsync(
|
||||
values,
|
||||
flagTag: "TestChildObject.TestString",
|
||||
flagValue: new TypedValue { StringValue = "BatchTest" },
|
||||
timeoutMs: 5000,
|
||||
pollIntervalMs: 200);
|
||||
|
||||
Assert.True(response.Success);
|
||||
Assert.True(response.FlagReached);
|
||||
Assert.True(response.ElapsedMs < 5000);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.6 CheckApiKey Tests
|
||||
|
||||
**File**: `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/CheckApiKeyTests.cs`
|
||||
|
||||
```csharp
|
||||
namespace ZB.MOM.WW.LmxProxy.Client.IntegrationTests;
|
||||
|
||||
public class CheckApiKeyTests : IntegrationTestBase
|
||||
{
|
||||
[Fact]
|
||||
public async Task CheckApiKey_ValidReadWrite_ReturnsValid()
|
||||
{
|
||||
var info = await Client!.CheckApiKeyAsync(ReadWriteApiKey);
|
||||
Assert.True(info.IsValid);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task CheckApiKey_ValidReadOnly_ReturnsValid()
|
||||
{
|
||||
var info = await Client!.CheckApiKeyAsync(ReadOnlyApiKey);
|
||||
Assert.True(info.IsValid);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task CheckApiKey_Invalid_ReturnsInvalid()
|
||||
{
|
||||
var info = await Client!.CheckApiKeyAsync("totally-invalid-key-12345");
|
||||
Assert.False(info.IsValid);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Step 5: Run Integration Tests
|
||||
|
||||
### 5.1 Build the test project (from Mac)
|
||||
|
||||
```bash
|
||||
cd /Users/dohertj2/Desktop/scadalink-design/lmxproxy
|
||||
dotnet build tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests
|
||||
```
|
||||
|
||||
### 5.2 Run integration tests against v2 on alternate port
|
||||
|
||||
```bash
|
||||
dotnet test tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests --verbosity normal
|
||||
```
|
||||
|
||||
All tests should pass against `10.100.0.48:50052`.
|
||||
|
||||
### 5.3 Debug failures
|
||||
|
||||
If tests fail, check:
|
||||
1. v2 service is running: `ssh windev "sc query ZB.MOM.WW.LmxProxy.Host.V2"`
|
||||
2. v2 service logs: `ssh windev "type C:\publish-v2\logs\lmxproxy-v2-*.txt | findstr /i error"`
|
||||
3. Network connectivity: `curl -s http://10.100.0.48:8081/api/health`
|
||||
4. API keys match: `ssh windev "type C:\publish-v2\apikeys.json"`
|
||||
|
||||
### 5.4 Verify metrics after test run
|
||||
|
||||
```bash
|
||||
curl -s http://10.100.0.48:8081/api/status | python3 -m json.tool
|
||||
```
|
||||
|
||||
Should show non-zero operation counts for Read, ReadBatch, Write, etc.
|
||||
|
||||
## Step 6: Cutover
|
||||
|
||||
**Only proceed if ALL integration tests pass.**
|
||||
|
||||
### 6.1 Stop v1 service
|
||||
|
||||
```bash
|
||||
ssh windev "sc stop ZB.MOM.WW.LmxProxy.Host"
|
||||
```
|
||||
|
||||
Verify stopped:
|
||||
|
||||
```bash
|
||||
ssh windev "sc query ZB.MOM.WW.LmxProxy.Host"
|
||||
```
|
||||
|
||||
Expected: `STATE: 1 STOPPED`.
|
||||
|
||||
### 6.2 Stop v2 service
|
||||
|
||||
```bash
|
||||
ssh windev "sc stop ZB.MOM.WW.LmxProxy.Host.V2"
|
||||
```
|
||||
|
||||
### 6.3 Reconfigure v2 to production ports
|
||||
|
||||
Update `C:\publish-v2\appsettings.json`:
|
||||
- Change `GrpcPort` from `50052` to `50051`
|
||||
- Change `WebServer.Port` from `8081` to `8080`
|
||||
- Change log file prefix from `lmxproxy-v2-` to `lmxproxy-`
|
||||
|
||||
```bash
|
||||
ssh windev "powershell -Command \"(Get-Content 'C:\publish-v2\appsettings.json') -replace '50052','50051' -replace '8081','8080' -replace 'lmxproxy-v2-','lmxproxy-' | Set-Content 'C:\publish-v2\appsettings.json'\""
|
||||
```
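
The three substitutions above are plain text replacements, so they would also rewrite any other value that happens to contain `50052` or `8081`. One way to check is to apply the same substitutions to a local copy and count the lines that change. This is a sketch only; `cutover_preview` is a hypothetical helper and expects a local copy of the file.

```bash
# usage: cutover_preview <local copy of appsettings.json>
# Prints how many lines the three substitutions would change.
cutover_preview() {
  sed -e 's/50052/50051/g' \
      -e 's/8081/8080/g' \
      -e 's/lmxproxy-v2-/lmxproxy-/g' "$1" \
    | diff "$1" - | grep -c '^>'
}

# Example (not run here):
# ssh windev "type C:\publish-v2\appsettings.json" > appsettings.local.json
# cutover_preview appsettings.local.json
```

For the config created in step 1.3 the expected count is 3 (gRPC port, web port, log path); a larger number suggests another value matched and should be reviewed before cutover.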
|
||||
|
||||
### 6.4 Uninstall v1 service
|
||||
|
||||
```bash
|
||||
ssh windev "C:\publish\ZB.MOM.WW.LmxProxy.Host.exe uninstall -servicename \"ZB.MOM.WW.LmxProxy.Host\""
|
||||
```
|
||||
|
||||
### 6.5 Uninstall v2 test service and reinstall as production service
|
||||
|
||||
```bash
|
||||
ssh windev "C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe uninstall -servicename \"ZB.MOM.WW.LmxProxy.Host.V2\""
|
||||
```
|
||||
|
||||
```bash
|
||||
ssh windev "C:\publish-v2\ZB.MOM.WW.LmxProxy.Host.exe install -servicename \"ZB.MOM.WW.LmxProxy.Host\" -displayname \"SCADA Bridge LMX Proxy\" -description \"LmxProxy v2 gRPC service\" --autostart"
|
||||
```
|
||||
|
||||
### 6.6 Start the production service
|
||||
|
||||
```bash
|
||||
ssh windev "sc start ZB.MOM.WW.LmxProxy.Host"
|
||||
```
|
||||
|
||||
### 6.7 Verify on production ports
|
||||
|
||||
```bash
|
||||
sleep 10 && ssh windev "sc query ZB.MOM.WW.LmxProxy.Host"
|
||||
```
|
||||
|
||||
Expected: `STATE: 4 RUNNING`.
|
||||
|
||||
```bash
|
||||
curl -s http://10.100.0.48:8080/api/health
|
||||
```
|
||||
|
||||
Expected: `OK`.
|
||||
|
||||
```bash
|
||||
curl -s http://10.100.0.48:8080/api/status | python3 -m json.tool | head -15
|
||||
```
|
||||
|
||||
Expected: Connected, version shows v2.
|
||||
|
||||
### 6.8 Update test configuration and re-run integration tests
|
||||
|
||||
Update `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/appsettings.test.json`:
|
||||
- Change `Port` from `50052` to `50051`
|
||||
|
||||
```bash
|
||||
dotnet test tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests --verbosity normal
|
||||
```
|
||||
|
||||
All tests should pass on the production port.
|
||||
|
||||
### 6.9 Configure service recovery
|
||||
|
||||
```bash
|
||||
ssh windev "sc failure ZB.MOM.WW.LmxProxy.Host reset= 86400 actions= restart/60000/restart/300000/restart/600000"
|
||||
```
|
||||
|
||||
This configures: restart after 1 min on first failure, 5 min on second, 10 min on subsequent. Reset counter after 1 day (86400 seconds).
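
The delays in the `actions=` string are given in milliseconds, which is easy to misread. The sketch below decodes the same string into minutes as a sanity check; `describe_actions` is a hypothetical helper and assumes each delay is a whole number of minutes.

```bash
# usage: describe_actions "restart/60000/restart/300000/restart/600000"
# Prints one line per action with its delay converted from ms to minutes.
describe_actions() {
  echo "$1" | tr '/' '\n' | paste -d' ' - - | while read -r action ms; do
    echo "$action after $(( ms / 60000 )) min"
  done
}

describe_actions "restart/60000/restart/300000/restart/600000"
# prints:
# restart after 1 min
# restart after 5 min
# restart after 10 min
```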

## Step 7: Documentation Updates

### 7.1 Update windev.md

Add a section about the LmxProxy v2 service to `/Users/dohertj2/Desktop/scadalink-design/windev.md`:

```markdown
## LmxProxy v2

| Field | Value |
|---|---|
| Service Name | ZB.MOM.WW.LmxProxy.Host |
| Display Name | SCADA Bridge LMX Proxy |
| gRPC Port | 50051 |
| Status Page | http://10.100.0.48:8080/ |
| Health Endpoint | http://10.100.0.48:8080/api/health |
| Publish Directory | C:\publish-v2\ |
| API Keys | C:\publish-v2\apikeys.json |
| Logs | C:\publish-v2\logs\ |
| Protocol | v2 (TypedValue + QualityCode) |
```

### 7.2 Update lmxproxy CLAUDE.md

If `lmxproxy/CLAUDE.md` references v1 behavior, update:
- Change "currently v1 protocol" references to "v2 protocol"
- Update publish directory references from `C:\publish\` to `C:\publish-v2\`
- Update any value conversion notes (no more string heuristics)

### 7.3 Clean up v1 publish directory (optional)

```bash
ssh windev "if exist C:\publish\ ren C:\publish publish-v1-backup"
```

## Step 8: Veeam Backup

### 8.1 Take incremental backup

```bash
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Start-VBRJob -Job (Get-VBRJob -Name 'Backup WW_DEV_VM')\""
```

### 8.2 Wait for backup to complete (check status)

```bash
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; (Get-VBRJob -Name 'Backup WW_DEV_VM').FindLastSession() | Select-Object State, Result, CreationTime, EndTime\""
```

Expected: `State: Stopped, Result: Success`.

### 8.3 Get the restore point ID

```bash
ssh dohertj2@10.100.0.30 "powershell -Command \"Add-PSSnapin VeeamPSSnapin; Connect-VBRServer -Server localhost; Get-VBRRestorePoint -Backup (Get-VBRBackup -Name 'Backup WW_DEV_VM') | Select-Object Id, CreationTime, Type, @{N='SizeGB';E={[math]::Round(\`$_.ApproxSize/1GB,2)}} | Format-Table -AutoSize\""
```

### 8.4 Record in windev.md

Add a new row to the Restore Points table in `windev.md`:

```markdown
| `XXXXXXXX` | 2026-XX-XX XX:XX | Increment | **Post-v2 deployment** — LmxProxy v2 live on port 50051 |
```

Replace placeholders with actual restore point ID and timestamp.

## Completion Criteria

- [ ] v2 Host binary published to `C:\publish-v2\` on windev
- [ ] v2 service installed and running on alternate ports (50052/8081) — verified via status page
- [ ] Integration test project created at `tests/ZB.MOM.WW.LmxProxy.Client.IntegrationTests/`
- [ ] All integration tests pass against v2 on alternate ports:
  - [ ] Connect/disconnect lifecycle
  - [ ] Read string tag `TestChildObject.TestString` — value "JoeDev", Good quality
  - [ ] Read writable tag `TestChildObject.TestString`
  - [ ] Write string then read-back verification
  - [ ] ReadBatch multiple tags
  - [ ] Subscribe to `TestChildObject.TestInt` — verify updates received with TypedValue + QualityCode
  - [ ] WriteBatchAndWait with type-aware flag comparison
  - [ ] CheckApiKey — valid ReadWrite, valid ReadOnly, invalid
  - [ ] Write with ReadOnly key — PermissionDenied
  - [ ] Connect with invalid API key — Unauthenticated
- [ ] v1 service stopped and uninstalled
- [ ] v2 service reconfigured to production ports (50051/8080) and reinstalled
- [ ] All integration tests pass on production ports
- [ ] Service recovery configured (restart on failure)
- [ ] `windev.md` updated with v2 service details
- [ ] `lmxproxy/CLAUDE.md` updated for v2
- [ ] Veeam backup taken and restore point ID recorded in `windev.md`
- [ ] v1 publish directory backed up or removed
200
deprecated/lmxproxy/docs/requirements/Component-Client.md
Normal file
@@ -0,0 +1,200 @@
# Component: Client

## Purpose

A .NET 10 class library providing a typed gRPC client for consuming the LmxProxy service. Used by ScadaLink's Data Connection Layer to connect to AVEVA System Platform via the LmxProxy Host.

## Location

`src/ZB.MOM.WW.LmxProxy.Client/` — all files in this project.

Key files:
- `ILmxProxyClient.cs` — public interface.
- `LmxProxyClient.cs` — main implementation (partial class across multiple files).
- `LmxProxyClientBuilder.cs` — fluent builder for client construction.
- `ServiceCollectionExtensions.cs` — DI integration and options classes.
- `ILmxProxyClientFactory.cs` — factory interface and implementation.
- `StreamingExtensions.cs` — batch and parallel streaming helpers.
- `Domain/ScadaContracts.cs` — code-first gRPC contracts.
- `Security/GrpcChannelFactory.cs` — TLS channel creation.

## Responsibilities

- Connect to and communicate with the LmxProxy Host gRPC service.
- Manage session lifecycle (connect, keep-alive, disconnect).
- Execute read, write, and subscribe operations with retry and concurrency control.
- Provide a fluent builder and DI integration for configuration.
- Track client-side performance metrics.
- Support TLS and mutual TLS connections.

## 1. Public Interface (ILmxProxyClient)

| Method | Description |
|--------|-------------|
| `ConnectAsync(ct)` | Establish gRPC channel and session |
| `DisconnectAsync()` | Graceful disconnect |
| `IsConnectedAsync()` | Thread-safe connection state check |
| `ReadAsync(address, ct)` | Read single tag, returns Vtq |
| `ReadBatchAsync(addresses, ct)` | Read multiple tags, returns dictionary |
| `WriteAsync(address, value, ct)` | Write single tag value |
| `WriteBatchAsync(values, ct)` | Write multiple tag values |
| `SubscribeAsync(addresses, onUpdate, onStreamError, ct)` | Subscribe to tag updates with value and error callbacks |
| `GetMetrics()` | Return operation counts, errors, latency stats |
| `DefaultTimeout` | Configurable timeout (default 30s, range 1s–10min) |

Implements `IDisposable` and `IAsyncDisposable`.

## 2. Connection Management

### 2.1 Connect

`ConnectAsync()`:
1. Creates a gRPC channel via `GrpcChannelFactory` (HTTP or HTTPS based on TLS config).
2. Creates a `protobuf-net.Grpc` client for `IScadaService`.
3. Calls the `Connect` RPC with a client ID (format: `ScadaBridge-{guid}`) and optional API key.
4. Stores the returned session ID.
5. Starts the keep-alive timer.

### 2.2 Keep-Alive

- Timer-based ping every **30 seconds** (hardcoded).
- Sends a lightweight `GetConnectionState` RPC.
- On failure: stops the timer, marks disconnected, triggers subscription cleanup.

### 2.3 Disconnect

`DisconnectAsync()`:
1. Stops keep-alive timer.
2. Calls `Disconnect` RPC.
3. Clears session ID.
4. Disposes gRPC channel.

### 2.4 Connection State

`IsConnected` property: `!_disposed && _isConnected && !string.IsNullOrEmpty(_sessionId)`.

## 3. Builder Pattern (LmxProxyClientBuilder)

| Method | Default | Constraint |
|--------|---------|-----------|
| `WithHost(string)` | Required | Non-null/non-empty |
| `WithPort(int)` | 5050 | 1–65535 |
| `WithApiKey(string?)` | null | Optional |
| `WithTimeout(TimeSpan)` | 30 seconds | > 0 and ≤ 10 minutes |
| `WithLogger(ILogger)` | NullLogger | Optional |
| `WithSslCredentials(string?)` | Disabled | Optional cert path |
| `WithTlsConfiguration(ClientTlsConfiguration)` | null | Full TLS config |
| `WithRetryPolicy(int, TimeSpan)` | 3 attempts, 1s delay | maxAttempts > 0, delay > 0 |
| `WithMetrics()` | Disabled | Enables metric collection |
| `WithCorrelationIdHeader(string)` | null | Custom header name |

## 4. Retry Policy

Polly-based exponential backoff:
- Default: **3 attempts** with **1-second** initial delay.
- Backoff sequence: `delay * 2^(retryAttempt - 1)` → 1s, 2s, 4s.
- Transient errors retried: `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, `Aborted`.
- Each retry is logged with correlation ID at Warning level.
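
The backoff formula can be checked with a few lines of Python. This is a sketch of the arithmetic only, not the Polly configuration used by the client; the function name `backoff_delays` and the `TRANSIENT_STATUS_CODES` set are illustrative names, the latter simply restating the list above.

```python
# Status codes the section lists as transient (names only, no gRPC library needed).
TRANSIENT_STATUS_CODES = {"Unavailable", "DeadlineExceeded", "ResourceExhausted", "Aborted"}


def backoff_delays(initial_delay_s, attempts):
    """Delay before each retry: initial_delay * 2^(retryAttempt - 1), retryAttempt starting at 1."""
    return [initial_delay_s * 2 ** (attempt - 1) for attempt in range(1, attempts + 1)]


def should_retry(status_code):
    """True when the status code name is one of the transient errors."""
    return status_code in TRANSIENT_STATUS_CODES


print(backoff_delays(1.0, 3))
```

With the defaults (1-second initial delay, 3 attempts) the computed delays are 1, 2 and 4 seconds, matching the sequence stated in the section.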

## 5. Subscription

### 5.1 Subscribe API

`SubscribeAsync(addresses, onUpdate, onStreamError, ct)` returns an `ISubscription`:
- Calls the `Subscribe` RPC (server streaming) with the tag list and default sampling interval (**1000ms**).
- Processes streamed `VtqMessage` items asynchronously, invoking the `onUpdate(tag, vtq)` callback for each.
- On stream termination (server disconnect, gRPC error, or connection drop), invokes the `onStreamError` callback exactly once.
- On stream error, the client immediately nullifies its session ID, causing `IsConnected` to return `false`. This triggers the DCL adapter's `Disconnected` event and reconnection cycle.
- Errors are logged per-subscription.

### 5.2 ISubscription

- `Dispose()` — synchronous disposal with **5-second** timeout.
- Automatic callback on disposal for cleanup.

## 6. DI Integration

### 6.1 Service Collection Extensions

| Method | Lifetime | Description |
|--------|----------|-------------|
| `AddLmxProxyClient(IConfiguration)` | Singleton | Bind `LmxProxy` config section |
| `AddLmxProxyClient(IConfiguration, string)` | Singleton | Bind named config section |
| `AddLmxProxyClient(Action<Builder>)` | Singleton | Builder action |
| `AddScopedLmxProxyClient(IConfiguration)` | Scoped | Per-scope lifetime |
| `AddNamedLmxProxyClient(string, Action<Builder>)` | Keyed singleton | Named/keyed registration |

### 6.2 Configuration Options (LmxProxyClientOptions)

Bound from `appsettings.json`:

| Setting | Default | Description |
|---------|---------|-------------|
| Host | `localhost` | Server hostname |
| Port | 5050 | Server port |
| ApiKey | null | API key |
| Timeout | 30 seconds | Operation timeout |
| UseSsl | false | Enable TLS |
| CertificatePath | null | SSL certificate path |
| EnableMetrics | false | Enable client metrics |
| CorrelationIdHeader | null | Custom correlation header |
| Retry:MaxAttempts | 3 | Retry attempts |
| Retry:Delay | 1 second | Initial retry delay |

### 6.3 Factory Pattern

`ILmxProxyClientFactory` creates configured clients:
- `CreateClient()` — uses default `LmxProxy` config section.
- `CreateClient(string)` — uses named config section.
- `CreateClient(Action<Builder>)` — uses builder action.

Registered as singleton in DI.

## 7. Streaming Extensions

Helper methods for large-scale batch operations:

| Method | Default Batch Size | Description |
|--------|--------------------|-------------|
| `ReadStreamAsync` | 100 | Batched reads, 2 retries per batch, stops after 3 consecutive errors. Returns `IAsyncEnumerable<KeyValuePair<string, Vtq>>`. |
| `WriteStreamAsync` | 100 | Batched writes from async enumerable input. Returns total count written. |
| `ProcessInParallelAsync` | — | Parallel processing with max concurrency of **4** (configurable). Semaphore-based rate limiting. |
| `SubscribeStreamAsync` | — | Wraps callback-based subscription into `IAsyncEnumerable<Vtq>` via `System.Threading.Channels`. |
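
The batching step behind `ReadStreamAsync` and `WriteStreamAsync` amounts to slicing the input into groups of at most 100 items. The sketch below shows only that slicing in Python; the generator name `batches` is hypothetical, and the retry and consecutive-error handling described in the table are deliberately left out.

```python
from itertools import islice


def batches(addresses, batch_size=100):
    """Yield consecutive lists of at most `batch_size` items from any iterable."""
    iterator = iter(addresses)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# 250 tag addresses are split into batches of 100, 100 and 50.
tags = [f"Tag{i}" for i in range(250)]
print([len(batch) for batch in batches(tags)])
```

Working from an iterator rather than a list means the input never has to be fully materialized, which is the same reason the real helpers accept and return async enumerables.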

## 8. Client Metrics

When metrics are enabled (`WithMetrics()`):
- Per-operation tracking: counts, error counts, latency.
- Rolling buffer of **1000** latency samples per operation (prevents memory growth).
- Snapshot via `GetMetrics()` returns: `{op}_count`, `{op}_errors`, `{op}_avg_latency_ms`, `{op}_p95_latency_ms`, `{op}_p99_latency_ms`.
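
A bounded rolling buffer with percentile lookup can be sketched as follows. The class name `LatencyWindow` is hypothetical, and the nearest-rank percentile method is an assumption: the section does not say which percentile definition the client uses.

```python
from collections import deque


class LatencyWindow:
    """Keeps the most recent `capacity` latency samples (illustrative, not the client's class)."""

    def __init__(self, capacity=1000):
        # deque with maxlen evicts the oldest sample once the buffer is full
        self._samples = deque(maxlen=capacity)

    def __len__(self):
        return len(self._samples)

    def record(self, latency_ms):
        self._samples.append(latency_ms)

    def percentile(self, pct):
        """Nearest-rank percentile for an integer pct in 1..100; 0.0 when empty."""
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
        return ordered[rank - 1]


window = LatencyWindow()
for ms in range(1, 2001):  # 2000 samples; only the last 1000 are retained
    window.record(ms)
print(len(window), window.percentile(95), window.percentile(99))
```

Because the buffer is capped, memory use stays constant regardless of how many operations are recorded, at the cost of percentiles reflecting only recent samples.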

## 9. Value and Quality Handling

### 9.1 Values (TypedValue)

Read responses and subscription updates return values as `TypedValue` (protobuf oneof). The client extracts the value directly from the appropriate oneof field (e.g., `vtq.Value.DoubleValue`, `vtq.Value.BoolValue`). Write operations construct `TypedValue` with the correct oneof case for the value's native type. No string serialization or parsing is needed.

### 9.2 Quality (QualityCode)

Quality is received as a `QualityCode` message. Category checks use bitmask: `IsGood = (statusCode & 0xC0000000) == 0x00000000`, `IsBad = (statusCode & 0xC0000000) == 0x80000000`. The `symbolic_name` field provides human-readable quality for logging and display.
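
The bitmask checks translate directly into Python. `is_good` and `is_bad` mirror the expressions above; `is_uncertain` is an addition that follows the usual OPC UA convention for the top two severity bits (`01`), which this section does not spell out.

```python
SEVERITY_MASK = 0xC0000000  # top two bits of the 32-bit status code


def is_good(status_code):
    return (status_code & SEVERITY_MASK) == 0x00000000


def is_bad(status_code):
    return (status_code & SEVERITY_MASK) == 0x80000000


def is_uncertain(status_code):
    # Assumption: OPC UA severity bits 01 mean Uncertain.
    return (status_code & SEVERITY_MASK) == 0x40000000


for code in (0x00000000, 0x40000000, 0x80000000):
    print(hex(code), is_good(code), is_uncertain(code), is_bad(code))
```

Only the two high bits decide the category, so sub-codes in the lower bits never change whether a value counts as Good, Uncertain or Bad.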

### 9.3 Current Implementation (V1 Legacy)

The current codebase still uses v1 string-based encoding. During v2 migration, the following will be removed:
- `ConvertToVtq()` — parses string values via heuristic (double → bool → null → raw string).
- `ConvertToString()` — serializes values via `.ToString()`.

## Dependencies

- **protobuf-net.Grpc** — code-first gRPC client.
- **Grpc.Net.Client** — HTTP/2 gRPC transport.
- **Polly** — retry policies.
- **Microsoft.Extensions.DependencyInjection** — DI integration.
- **Microsoft.Extensions.Configuration** — options binding.
- **Microsoft.Extensions.Logging** — logging abstraction.

## Interactions

- **ScadaLink Data Connection Layer** consumes the client library via `ILmxProxyClient`.
- **Protocol** — the client uses code-first contracts (`IScadaService`) that are wire-compatible with the Host's proto-generated service.
- **Security** — `GrpcChannelFactory` creates TLS-configured channels matching the Host's TLS configuration.
122
deprecated/lmxproxy/docs/requirements/Component-Configuration.md
Normal file
@@ -0,0 +1,122 @@
# Component: Configuration

## Purpose

Defines the `appsettings.json` structure, configuration binding, and startup validation for the LmxProxy Host service.

## Location

- `src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs` — root configuration class.
- `src/ZB.MOM.WW.LmxProxy.Host/Configuration/ConfigurationValidator.cs` — validation logic.
- `src/ZB.MOM.WW.LmxProxy.Host/appsettings.json` — default configuration file.

## Responsibilities

- Define all configurable settings as strongly-typed classes.
- Bind `appsettings.json` sections to configuration objects via `Microsoft.Extensions.Configuration`.
- Validate all settings at startup, failing fast on invalid values.
- Support environment variable overrides.

## 1. Configuration Structure

### 1.1 Root: LmxProxyConfiguration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| GrpcPort | int | 50051 | gRPC server listen port |
| ApiKeyConfigFile | string | `apikeys.json` | Path to API key configuration file |
| Subscription | SubscriptionConfiguration | — | Subscription channel settings |
| ServiceRecovery | ServiceRecoveryConfiguration | — | Windows SCM recovery settings |
| Connection | ConnectionConfiguration | — | MxAccess connection settings |
| Tls | TlsConfiguration | — | TLS/SSL settings |
| WebServer | WebServerConfiguration | — | Status web server settings |

### 1.2 ConnectionConfiguration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| MonitorIntervalSeconds | int | 5 | Auto-reconnect check interval |
| ConnectionTimeoutSeconds | int | 30 | Initial connection timeout |
| ReadTimeoutSeconds | int | 5 | Per-read operation timeout |
| WriteTimeoutSeconds | int | 5 | Per-write operation timeout |
| MaxConcurrentOperations | int | 10 | Semaphore limit for concurrent MxAccess operations |
| AutoReconnect | bool | true | Enable auto-reconnect loop |
| NodeName | string? | null | MxAccess node name (optional) |
| GalaxyName | string? | null | MxAccess galaxy name (optional) |

### 1.3 SubscriptionConfiguration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| ChannelCapacity | int | 1000 | Per-client subscription buffer size |
| ChannelFullMode | string | `DropOldest` | Backpressure strategy: `DropOldest`, `DropNewest`, `Wait` |

### 1.4 TlsConfiguration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| Enabled | bool | false | Enable TLS on gRPC server |
| ServerCertificatePath | string | `certs/server.crt` | PEM server certificate |
| ServerKeyPath | string | `certs/server.key` | PEM server private key |
| ClientCaCertificatePath | string | `certs/ca.crt` | CA certificate for mTLS |
| RequireClientCertificate | bool | false | Require client certificates |
| CheckCertificateRevocation | bool | false | Enable CRL checking |

### 1.5 WebServerConfiguration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| Enabled | bool | true | Enable status web server |
| Port | int | 8080 | HTTP listen port |
| Prefix | string? | null | Custom URL prefix (defaults to `http://+:{Port}/`) |

### 1.6 ServiceRecoveryConfiguration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| FirstFailureDelayMinutes | int | 1 | Restart delay after first failure |
| SecondFailureDelayMinutes | int | 5 | Restart delay after second failure |
| SubsequentFailureDelayMinutes | int | 10 | Restart delay after subsequent failures |
| ResetPeriodDays | int | 1 | Days before failure count resets |

## 2. Validation

`ConfigurationValidator.ValidateAndLog()` runs at startup and checks:

- **GrpcPort**: Must be 1–65535.
- **Connection**: All timeout values > 0. NodeName and GalaxyName ≤ 255 characters.
- **Subscription**: ChannelCapacity 0–100000. ChannelFullMode must be one of `DropOldest`, `DropNewest`, `Wait`.
- **ServiceRecovery**: All failure delay values ≥ 0. ResetPeriodDays > 0.
- **TLS**: If enabled, validates certificate file paths exist.

Validation errors are logged and cause the service to throw `InvalidOperationException`, preventing startup.
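
The validation rules can be sketched in Python against a plain dictionary that mirrors the setting names in the tables above. This is illustrative only and not the `ConfigurationValidator` source: `validate` and `validate_or_raise` are hypothetical names, `ValueError` stands in for `InvalidOperationException`, the three `*TimeoutSeconds` settings are assumed to be the "timeout values" the rule refers to, and the TLS file-existence check is omitted.

```python
VALID_FULL_MODES = {"DropOldest", "DropNewest", "Wait"}

DEFAULTS = {
    "GrpcPort": 50051,
    "Connection": {"ConnectionTimeoutSeconds": 30, "ReadTimeoutSeconds": 5,
                   "WriteTimeoutSeconds": 5, "NodeName": None, "GalaxyName": None},
    "Subscription": {"ChannelCapacity": 1000, "ChannelFullMode": "DropOldest"},
    "ServiceRecovery": {"FirstFailureDelayMinutes": 1, "SecondFailureDelayMinutes": 5,
                        "SubsequentFailureDelayMinutes": 10, "ResetPeriodDays": 1},
}


def validate(config):
    """Return a list of error strings; an empty list means every rule passed."""
    errors = []
    if not 1 <= config["GrpcPort"] <= 65535:
        errors.append("GrpcPort must be 1-65535")

    conn = config["Connection"]
    for key in ("ConnectionTimeoutSeconds", "ReadTimeoutSeconds", "WriteTimeoutSeconds"):
        if conn[key] <= 0:
            errors.append(f"Connection.{key} must be > 0")
    for key in ("NodeName", "GalaxyName"):
        value = conn.get(key)
        if value is not None and len(value) > 255:
            errors.append(f"Connection.{key} must be <= 255 characters")

    sub = config["Subscription"]
    if not 0 <= sub["ChannelCapacity"] <= 100000:
        errors.append("Subscription.ChannelCapacity must be 0-100000")
    if sub["ChannelFullMode"] not in VALID_FULL_MODES:
        errors.append("Subscription.ChannelFullMode is not a known mode")

    rec = config["ServiceRecovery"]
    for key in ("FirstFailureDelayMinutes", "SecondFailureDelayMinutes",
                "SubsequentFailureDelayMinutes"):
        if rec[key] < 0:
            errors.append(f"ServiceRecovery.{key} must be >= 0")
    if rec["ResetPeriodDays"] <= 0:
        errors.append("ServiceRecovery.ResetPeriodDays must be > 0")
    return errors


def validate_or_raise(config):
    """Fail fast: raise if any rule is violated, as the service does at startup."""
    errors = validate(config)
    if errors:
        raise ValueError("; ".join(errors))


validate_or_raise(DEFAULTS)  # the documented defaults pass every rule
```

Collecting all errors before raising lets a single startup attempt report every invalid setting at once instead of one per restart.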

## 3. Configuration Sources

Configuration is loaded via `Microsoft.Extensions.Configuration.ConfigurationBuilder`:
1. `appsettings.json` (required).
2. Environment variables (override any JSON setting).

## 4. Serilog Configuration

Logging is configured in the `Serilog` section of `appsettings.json`:

| Setting | Value |
|---------|-------|
| Console sink | ANSI theme, custom template with HH:mm:ss timestamp |
| File sink | `logs/lmxproxy-.txt`, daily rolling, 30 files retained |
| Default level | Information |
| Override: Microsoft | Warning |
| Override: System | Warning |
| Override: Grpc | Information |
| Enrichment | FromLogContext, WithMachineName, WithThreadId |

## Dependencies

- **Microsoft.Extensions.Configuration** — configuration binding.
- **Serilog.Settings.Configuration** — Serilog configuration from appsettings.

## Interactions

- **ServiceHost** (Program.cs) loads and validates configuration at startup.
- All other components receive their settings from the bound configuration objects.
@@ -0,0 +1,86 @@
# Component: GrpcServer

## Purpose

The gRPC service implementation that receives client RPCs, validates sessions, and delegates operations to the MxAccessClient. It is the network-facing entry point for all SCADA operations.

## Location

`src/ZB.MOM.WW.LmxProxy.Host/Grpc/ScadaGrpcService.cs` — inherits proto-generated `ScadaService.ScadaServiceBase`.

## Responsibilities

- Implement all 10 gRPC RPCs defined in `scada.proto`.
- Validate session IDs on all data operations before processing.
- Delegate read/write/subscribe operations to the MxAccessClient.
- Convert between gRPC message types and internal domain types (Vtq, Quality).
- Track operation timing and success/failure via PerformanceMetrics.
- Handle errors gracefully, returning structured error responses rather than throwing.

## 1. RPC Implementations

### 1.1 Connection Management

- **Connect**: Creates a new session via SessionManager if MxAccess is connected. Returns the session ID (32-character hex GUID). Rejects if MxAccess is disconnected.
- **Disconnect**: Terminates the session via SessionManager.
- **GetConnectionState**: Returns `IsConnected`, `ClientId`, and `ConnectedSinceUtcTicks` from the MxAccessClient.

### 1.2 Read Operations

- **Read**: Validates session, applies Polly retry policy, calls MxAccessClient.ReadAsync(), returns VtqMessage. On invalid session, returns a VtqMessage with `Quality.Bad`.
- **ReadBatch**: Validates session, reads all tags via MxAccessClient.ReadBatchAsync() with semaphore-controlled concurrency (max 10 concurrent). Returns results in request order. Batch reads are partially successful — individual tags may have Bad quality (with current UTC timestamp) while the overall response succeeds. If a tag read throws an exception, its VTQ is returned with Bad quality.

### 1.3 Write Operations

- **Write**: Validates session and calls MxAccessClient.WriteAsync(). In v2 the `TypedValue` is applied by its native type (section 2.1); the v1 implementation parses the string value using the type heuristic (section 2.3).
- **WriteBatch**: Validates session, writes all items in parallel via MxAccessClient with semaphore concurrency control. Returns per-item success/failure results. Overall `success` is `false` if any item fails (all-or-nothing at the reporting level).
- **WriteBatchAndWait**: Validates session, writes all items first. If any write fails, returns immediately with `success=false`. If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals using type-aware `TypedValueEquals()` comparison (same oneof case required, native type equality, case-sensitive strings, null equals null only). Default timeout: 5000ms, default poll interval: 100ms. If flag matches before timeout: `success=true`, `flag_reached=true`. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error). Returns `flag_reached` boolean and `elapsed_ms`.
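
The polling phase of WriteBatchAndWait can be sketched in Python. This is not the service code: `typed_value_equals` approximates the "same oneof case" rule by requiring identical Python types, and `wait_for_flag`, `read_flag` and the injectable `sleep` parameter are hypothetical names introduced for illustration.

```python
import time


def typed_value_equals(a, b):
    """Type-aware comparison: null equals null only; otherwise same type and equal value."""
    if a is None or b is None:
        return a is None and b is None
    return type(a) is type(b) and a == b  # 1 != 1.0, True != 1, "A" != "a"


def wait_for_flag(read_flag, expected, timeout_ms=5000, poll_interval_ms=100, sleep=time.sleep):
    """Poll read_flag() until it matches `expected` or the timeout expires.

    Returns (flag_reached, elapsed_ms). A timeout is reported as
    flag_reached=False rather than raised as an error.
    """
    start = time.monotonic()
    while True:
        if typed_value_equals(read_flag(), expected):
            return True, int((time.monotonic() - start) * 1000)
        elapsed_ms = (time.monotonic() - start) * 1000
        if elapsed_ms >= timeout_ms:
            return False, int(elapsed_ms)
        sleep(poll_interval_ms / 1000)


# The flag reads 0 twice and then 1; sleep is stubbed out so the example runs instantly.
readings = iter([0, 0, 1])
reached, elapsed = wait_for_flag(lambda: next(readings), 1, sleep=lambda seconds: None)
print(reached)
```

Treating a timeout as a normal outcome keeps `success` tied to the writes themselves, so callers can tell "writes failed" apart from "writes landed but the flag never changed".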

### 1.4 Subscription

- **Subscribe**: Validates session (throws `RpcException(Unauthenticated)` on invalid). Creates a subscription handle via SubscriptionManager. Streams VtqMessage items from the subscription channel to the client. Cleans up the subscription on stream cancellation or error.

### 1.5 API Key Check

- **CheckApiKey**: Returns validity and role information from the interceptor context.

## 2. Value and Quality Handling

### 2.1 Values (TypedValue)

Read responses and subscription updates return values as `TypedValue` (protobuf oneof carrying native types). Write requests receive `TypedValue` and apply the value directly to MxAccess by its native type. If the `oneof` case doesn't match the tag's expected data type, the write returns `WriteResult` with `success=false` indicating type mismatch. No string serialization or parsing heuristics are used.

### 2.2 Quality (QualityCode)

Quality is returned as a `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. The server maps MxAccess quality codes to OPC UA status codes per the quality table in Component-Protocol. Specific error scenarios return specific quality codes (e.g., tag not found → `BadConfigurationError`, comms loss → `BadCommunicationFailure`).

### 2.3 Current Implementation (V1 Legacy)

The current codebase still uses v1 string-based encoding. During v2 migration, the following v1 behavior will be removed:
- `ConvertValueToString()` — serializes values to strings (bool → lowercase, DateTime → ISO-8601, arrays → JSON, others → `.ToString()`).
- `ParseValue()` — parses string values in order: bool → int → long → double → DateTime → raw string.
- Three-state string quality mapping: ≥192 → `"Good"`, 64–191 → `"Uncertain"`, <64 → `"Bad"`.
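
The v1 three-state quality mapping in the last bullet is a pair of threshold checks. A Python sketch, with the hypothetical function name `legacy_quality_name`:

```python
def legacy_quality_name(quality):
    """Map a numeric v1 quality value to its string form using the 192 / 64 thresholds."""
    if quality >= 192:
        return "Good"
    if quality >= 64:
        return "Uncertain"
    return "Bad"


for value in (192, 191, 64, 63, 0):
    print(value, legacy_quality_name(value))
```

The boundary values are the interesting cases: 192 is the lowest Good value, 64 the lowest Uncertain value, and everything below 64 is Bad.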

## 3. Error Handling

- All RPC methods catch exceptions and return error responses with `success=false` and a descriptive message. Exceptions do not propagate as gRPC status codes (except Subscribe, which throws `RpcException` for invalid sessions).
- Each operation is wrapped in a PerformanceMetrics timing scope that records duration and success/failure.

## 4. Session Validation

- All data operations (Read, ReadBatch, Write, WriteBatch, WriteBatchAndWait, Subscribe) validate the session ID before processing.
- Invalid session on read/write operations returns a response with Bad quality VTQ.
- Invalid session on Subscribe throws `RpcException` with `StatusCode.Unauthenticated`.

## Dependencies

- **MxAccessClient** (IScadaClient) — all SCADA operations are delegated here.
- **SessionManager** — session creation, validation, and termination.
- **SubscriptionManager** — subscription lifecycle for the Subscribe RPC.
- **PerformanceMetrics** — operation timing and success/failure tracking.

## Interactions

- **ApiKeyInterceptor** intercepts all RPCs before they reach ScadaGrpcService, enforcing API key authentication and role-based write authorization.
- **SubscriptionManager** provides the channel that Subscribe streams from.
- **StatusReportService** reads PerformanceMetrics data that ScadaGrpcService populates.
@@ -0,0 +1,121 @@
# Component: HealthAndMetrics

## Purpose

Provides health checking, performance metrics collection, and an HTTP status dashboard for monitoring the LmxProxy service.

## Location

- `src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs` — basic health check.
- `src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs` — detailed health check with test tag read.
- `src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs` — operation metrics collection.
- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs` — status report generation.
- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs` — HTTP status endpoint.

## Responsibilities

- Evaluate service health based on connection state, operation success rates, and test tag reads.
- Track per-operation performance metrics (counts, latencies, percentiles).
- Serve an HTML status dashboard and JSON/health HTTP endpoints.
- Report metrics to logs on a periodic interval.

## 1. Health Checks

### 1.1 Basic Health Check (HealthCheckService)

`CheckHealthAsync()` evaluates:

| Check | Healthy | Degraded |
|-------|---------|----------|
| MxAccess connected | Yes | — |
| Success rate (if > 100 total ops) | ≥ 50% | < 50% |
| Client count | ≤ 100 | > 100 |

Returns health data dictionary: `scada_connected`, `scada_connection_state`, `total_clients`, `total_tags`, `total_operations`, `average_success_rate`.

### 1.2 Detailed Health Check (DetailedHealthCheckService)

`CheckHealthAsync()` performs an active probe:

1. Checks `IsConnected` — returns **Unhealthy** if not connected.
2. Reads a test tag (default `System.Heartbeat`).
3. If test tag quality is not Good — returns **Degraded**.
4. If test tag timestamp is older than **5 minutes** — returns **Degraded** (stale data detection).
5. Otherwise returns **Healthy**.
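
The five steps of the detailed probe can be sketched as a single Python function. The names `detailed_health` and `read_test_tag` are hypothetical, and the test tag read is assumed to return a `(quality_is_good, timestamp)` pair; the real service reads the tag through MxAccessClient.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)


def detailed_health(is_connected, read_test_tag, now=None):
    """Return "Unhealthy", "Degraded" or "Healthy" following the probe steps in order."""
    if not is_connected:
        return "Unhealthy"          # step 1: no connection, no point reading the tag
    quality_is_good, timestamp = read_test_tag()  # step 2
    if not quality_is_good:
        return "Degraded"           # step 3: tag readable but quality is not Good
    now = now or datetime.now(timezone.utc)
    if now - timestamp > STALE_AFTER:
        return "Degraded"           # step 4: value has not updated in over 5 minutes
    return "Healthy"                # step 5


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(detailed_health(True, lambda: (True, now - timedelta(minutes=1)), now))
print(detailed_health(True, lambda: (True, now - timedelta(minutes=6)), now))
```

Passing `now` explicitly makes the staleness rule easy to exercise without waiting for real time to pass.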

## 2. Performance Metrics

### 2.1 Tracking

`PerformanceMetrics` uses a `ConcurrentDictionary<string, OperationMetrics>` to track operations by name.

Operations tracked: `Read`, `ReadBatch`, `Write`, `WriteBatch` (recorded by ScadaGrpcService).

### 2.2 Recording

Two recording patterns:
- `RecordOperation(name, duration, success)` — explicit recording.
- `BeginOperation(name)` — returns an `ITimingScope` (disposable). On dispose, automatically records duration (via `Stopwatch`) and success flag (set via `SetSuccess(bool)`).
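
The disposable timing scope maps naturally onto a Python context manager. The sketch below is an analogy, not the C# implementation: `Metrics` and `TimingScope` are illustrative classes, and defaulting the success flag to `True` is an assumption the section does not confirm.

```python
import time


class TimingScope:
    """Records duration and success on exit, like disposing an ITimingScope."""

    def __init__(self, metrics, name):
        self._metrics = metrics
        self._name = name
        self._success = True  # assumption: success unless set_success(False) is called
        self._start = time.perf_counter()

    def set_success(self, success):
        self._success = success

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        duration_ms = (time.perf_counter() - self._start) * 1000
        self._metrics.record_operation(self._name, duration_ms, self._success)
        return False  # never swallow exceptions


class Metrics:
    def __init__(self):
        self.records = []  # (name, duration_ms, success) tuples

    def record_operation(self, name, duration_ms, success):
        self.records.append((name, duration_ms, success))

    def begin_operation(self, name):
        return TimingScope(self, name)


metrics = Metrics()
with metrics.begin_operation("Read") as scope:
    scope.set_success(False)  # e.g. the read returned an error response
print(metrics.records[0][0], metrics.records[0][2])
```

Recording on scope exit means the duration is captured on every path out of the block, including early returns and exceptions, without the caller having to remember to stop a timer.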

### 2.3 Per-Operation Statistics

`OperationMetrics` maintains:
- `_totalCount`, `_successCount` — running counters.
- `_totalMilliseconds`, `_minMilliseconds`, `_maxMilliseconds` — latency range.
- `_durations` — rolling buffer of up to **1000 latency samples** for percentile calculation.

`MetricsStatistics` snapshot:
- `TotalCount`, `SuccessCount`, `SuccessRate` (percentage).
- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`.
- `Percentile95Milliseconds` — calculated from sorted samples at the 95th percentile index.

### 2.4 Periodic Reporting

A timer fires every **60 seconds**, logging a summary of all operation metrics to Serilog.

## 3. Status Web Server

### 3.1 Server

`StatusWebServer` uses `HttpListener` on `http://+:{Port}/` (default port 8080).

- Starts an async request-handling loop, spawning a task per request.
- Graceful shutdown: cancels the listener, waits **5 seconds** for the listener task to exit.
- Returns HTTP 405 for non-GET methods, HTTP 500 on errors.

### 3.2 Endpoints

| Endpoint | Method | Response |
|----------|--------|----------|
| `/` | GET | HTML dashboard (auto-refresh every 30 seconds) |
| `/api/status` | GET | JSON status report (camelCase) |
| `/api/health` | GET | Plain text `OK` (200) or `UNHEALTHY` (503) |

### 3.3 HTML Dashboard

Generated by `StatusReportService`:
- Bootstrap-like CSS grid layout with status cards.
- Color-coded status: green = Healthy, yellow = Degraded, red = Unhealthy/Error.
- Operations table with columns: Count, SuccessRate, Avg/Min/Max/P95 milliseconds.
- Service metadata: ServiceName, Version (assembly version), connection state.
- Subscription stats: TotalClients, TotalTags, ActiveSubscriptions.
- Auto-refresh via `<meta http-equiv="refresh" content="30">`.
- Last updated timestamp.

### 3.4 JSON Status Report

Fully nested structure with camelCase property names:
- Service metadata, connection status, subscription stats, performance data, health check results.

## Dependencies

- **MxAccessClient** — `IsConnected`, `ConnectionState` for health checks; test tag read for detailed check.
- **SubscriptionManager** — subscription statistics.
- **PerformanceMetrics** — operation statistics for status report and health evaluation.
- **Configuration** — `WebServerConfiguration` for port and prefix.

## Interactions

- **GrpcServer** populates PerformanceMetrics via timing scopes on every RPC.
- **ServiceHost** creates all health/metrics/status components at startup and disposes them at shutdown.
- External monitoring systems can poll `/api/health` for availability checks.
@@ -0,0 +1,108 @@
# Component: MxAccessClient

## Purpose

The core component that wraps the ArchestrA MXAccess COM API, providing connection management, tag read/write operations, and subscription-based value change notifications. This is the bridge between the gRPC service layer and AVEVA System Platform.

## Location

`src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs` — partial class split across 6 files:
- `MxAccessClient.cs` — Main class, properties, disposal, factory.
- `MxAccessClient.Connection.cs` — Connection lifecycle (connect, disconnect, reconnect, cleanup).
- `MxAccessClient.ReadWrite.cs` — Read and write operations with retry and concurrency control.
- `MxAccessClient.Subscription.cs` — Subscription management and stored subscription state.
- `MxAccessClient.EventHandlers.cs` — COM event handlers (OnDataChange, OnWriteComplete, OperationComplete).
- `MxAccessClient.NestedTypes.cs` — Internal types and enums.

## Responsibilities

- Manage the MXAccess COM object lifecycle (create, register, unregister, release).
- Maintain connection state (Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting) and fire state change events.
- Execute read and write operations against MXAccess with concurrency control via semaphores.
- Manage tag subscriptions via MXAccess advise callbacks and store subscription state for reconnection.
- Handle COM threading constraints (STA thread context via `Task.Run`).

## 1. Connection Lifecycle

### 1.1 Connect

`ConnectAsync()` wraps `ConnectInternal()` in `Task.Run` for STA thread context:

1. Validates not disposed.
2. Returns early if already connected.
3. Sets state to `Connecting`.
4. `InitializeMxAccessConnection()` — creates new `LMXProxyServer` COM object, wires event handlers (OnDataChange, OnWriteComplete, OperationComplete).
5. `RegisterWithMxAccess()` — calls `_lmxProxy.Register("ZB.MOM.WW.LmxProxy.Host")`, stores the returned connection handle.
6. Sets state to `Connected`.
7. On error, calls `Cleanup()` and re-throws.

After successful connection, calls `RecreateStoredSubscriptionsAsync()` to restore any previously active subscriptions.

### 1.2 Disconnect

`DisconnectAsync()` wraps `DisconnectInternal()` in `Task.Run`:

1. Checks `IsConnected`.
2. Sets state to `Disconnecting`.
3. `RemoveAllSubscriptions()` — unsubscribes all tags from MXAccess but retains subscription state in `_storedSubscriptions` for reconnection.
4. `UnregisterFromMxAccess()` — calls `_lmxProxy.Unregister(_connectionHandle)`.
5. `Cleanup()` — removes event handlers, calls `Marshal.ReleaseComObject(_lmxProxy)` to force-release all COM references, nulls the proxy and resets the connection handle.
6. Sets state to `Disconnected`.

### 1.3 Connection State

- `IsConnected` property: `_lmxProxy != null && _connectionState == Connected && _connectionHandle > 0`.
- `ConnectionState` enum: Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting.
- `ConnectionStateChanged` event fires on all state transitions with previous state, current state, and optional message.

### 1.4 Auto-Reconnect

When `AutoReconnect` is enabled (default), the `MonitorConnectionAsync` loop runs continuously:
- Checks `IsConnected` every `MonitorIntervalSeconds` (default 5 seconds).
- On disconnect, attempts reconnect via semaphore-protected `ConnectAsync()`.
- On failure, logs warning and retries at the next interval.
- Reconnection restores stored subscriptions automatically.
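
The monitor loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual `MonitorConnectionAsync` implementation: `ReconnectMonitor`, its delegate parameters, and `connectGate` are hypothetical stand-ins for `IsConnected`, `ConnectAsync()`, and the semaphore that guards connection attempts.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ReconnectMonitor
{
    // Hypothetical stand-ins: isConnected mirrors IsConnected, connectAsync mirrors
    // ConnectAsync(), and connectGate is the semaphore shared with other connect callers.
    public static async Task RunAsync(
        Func<bool> isConnected,
        Func<Task> connectAsync,
        SemaphoreSlim connectGate,
        TimeSpan interval,
        Action<string> logWarning,
        CancellationToken ct)
    {
        try
        {
            while (true)
            {
                ct.ThrowIfCancellationRequested();

                if (!isConnected())
                {
                    await connectGate.WaitAsync(ct).ConfigureAwait(false);
                    try
                    {
                        await connectAsync().ConfigureAwait(false);
                    }
                    catch (Exception ex)
                    {
                        // A failed attempt is not fatal; try again at the next interval.
                        logWarning("Reconnect failed: " + ex.Message);
                    }
                    finally
                    {
                        connectGate.Release();
                    }
                }

                await Task.Delay(interval, ct).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop the monitor.
        }
    }
}
```

In this sketch the caller would pass `TimeSpan.FromSeconds(MonitorIntervalSeconds)` as the interval and cancel the token during shutdown.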

## 2. Thread Safety & COM Constraints

- State mutations protected by `lock (_lock)`.
- COM operations wrapped in `Task.Run` (MXAccess is 32-bit COM). Note that `Task.Run` schedules work on thread-pool threads, which are MTA, so it moves COM calls off the caller's thread but does not by itself provide an STA message pump.
- Concurrency control: `_readSemaphore` and `_writeSemaphore` limit concurrent MXAccess operations to `MaxConcurrentOperations` (default 10, configurable).
- Default max concurrency constant: `DefaultMaxConcurrency = 10`.

## 3. Read Operations

- `ReadAsync(address, ct)` — Applies Polly retry policy, calls `ReadSingleValueAsync()`, returns `Vtq`.
- `ReadBatchAsync(addresses, ct)` — Creates parallel tasks per address via `ReadAddressWithSemaphoreAsync()`. Each task acquires `_readSemaphore` before reading. Returns `IReadOnlyDictionary<address, Vtq>`.
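
A semaphore-bounded batch read along these lines might look like the sketch below. It is an assumption-laden illustration: `BoundedBatch` and `readOne` are hypothetical (`readOne` stands in for the single-value read, `TValue` for `Vtq`), and de-duplicating addresses with `Distinct()` is a choice made here, not documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedBatch
{
    // readOne is a hypothetical stand-in for the single-value read; TValue stands in for Vtq.
    public static async Task<IReadOnlyDictionary<string, TValue>> ReadBatchAsync<TValue>(
        IEnumerable<string> addresses,
        Func<string, CancellationToken, Task<TValue>> readOne,
        SemaphoreSlim readSemaphore,
        CancellationToken ct)
    {
        // One task per address; each waits for a semaphore slot before reading.
        var tasks = addresses.Distinct().Select(async address =>
        {
            await readSemaphore.WaitAsync(ct).ConfigureAwait(false);
            try
            {
                var value = await readOne(address, ct).ConfigureAwait(false);
                return new KeyValuePair<string, TValue>(address, value);
            }
            finally
            {
                readSemaphore.Release();
            }
        }).ToList();

        var pairs = await Task.WhenAll(tasks).ConfigureAwait(false);
        return pairs.ToDictionary(p => p.Key, p => p.Value);
    }
}
```

Creating the semaphore as `new SemaphoreSlim(10, 10)` would correspond to the documented default of 10 concurrent operations.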

## 4. Write Operations

- `WriteAsync(address, value, ct)` — Applies Polly retry policy, calls `WriteInternalAsync(address, value, ct)`.
- `WriteBatchAsync(values, ct)` — Parallel tasks via `WriteAddressWithSemaphoreAsync()`. Each task acquires `_writeSemaphore` before writing.
- `WriteBatchAndWaitAsync(values, flagAddress, flagValue, responseAddress, responseValue, ct)` — Writes batch, writes flag, polls response tag until match.

## 5. Subscription Management

- Subscriptions stored in `_storedSubscriptions` for reconnection persistence.
- `SubscribeInternalAsync(addresses, callback, storeSubscription)` — registers tags with MXAccess and stores subscription state.
- `RecreateStoredSubscriptionsAsync()` — called after reconnect to re-subscribe all previously active tags without re-storing.
- `RemoveAllSubscriptions()` — unsubscribes from MXAccess but retains `_storedSubscriptions`.

## 6. Event Handlers

- **OnDataChange** — Fired by MXAccess when a subscribed tag value changes. Routes the update to the SubscriptionManager.
- **OnWriteComplete** — Fired when an async write operation completes.
- **OperationComplete** — General operation completion callback.

## Dependencies

- **ArchestrA.MXAccess** COM interop assembly (`lib/ArchestrA.MXAccess.dll`).
- **Polly** — retry policies for read/write operations.
- **Configuration** — `ConnectionConfiguration` for timeouts, concurrency limits, and auto-reconnect settings.

## Interactions

- **GrpcServer** (ScadaGrpcService) delegates all SCADA operations to MxAccessClient via the `IScadaClient` interface.
- **SubscriptionManager** receives value change callbacks originating from MxAccessClient's COM event handlers.
- **HealthAndMetrics** queries `IsConnected` and `ConnectionState` for health checks.
- **ServiceHost** manages the MxAccessClient lifecycle (create at startup, dispose at shutdown).
301
deprecated/lmxproxy/docs/requirements/Component-Protocol.md
Normal file
@@ -0,0 +1,301 @@
# Component: Protocol

## Purpose

Defines the gRPC protocol specification for communication between the LmxProxy Client and Host, including the proto file definition, code-first contracts, message schemas, value type system, and quality codes. The authoritative specification is `docs/lmxproxy_updates.md`.

## Location

- `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Protos/scada.proto` — proto file (Host, proto-generated).
- `src/ZB.MOM.WW.LmxProxy.Client/Domain/ScadaContracts.cs` — code-first contracts (Client, protobuf-net.Grpc).
- `docs/lmxproxy_updates.md` — authoritative protocol specification.
- `docs/lmxproxy_protocol.md` — legacy v1 protocol documentation (superseded).

## Responsibilities

- Define the gRPC service interface (`scada.ScadaService`) and all message types.
- Ensure wire compatibility between the Host's proto-generated code and the Client's code-first contracts.
- Specify the VTQ data model: `TypedValue` for values, `QualityCode` for quality.
- Document OPC UA-aligned quality codes filtered to AVEVA System Platform usage.

## 1. Service Definition

Service: `scada.ScadaService` (gRPC package: `scada`)

| RPC | Request | Response | Type |
|-----|---------|----------|------|
| Connect | ConnectRequest | ConnectResponse | Unary |
| Disconnect | DisconnectRequest | DisconnectResponse | Unary |
| GetConnectionState | GetConnectionStateRequest | GetConnectionStateResponse | Unary |
| Read | ReadRequest | ReadResponse | Unary |
| ReadBatch | ReadBatchRequest | ReadBatchResponse | Unary |
| Write | WriteRequest | WriteResponse | Unary |
| WriteBatch | WriteBatchRequest | WriteBatchResponse | Unary |
| WriteBatchAndWait | WriteBatchAndWaitRequest | WriteBatchAndWaitResponse | Unary |
| Subscribe | SubscribeRequest | stream VtqMessage | Server streaming |
| CheckApiKey | CheckApiKeyRequest | CheckApiKeyResponse | Unary |

## 2. Value Type System (TypedValue)

Values are transmitted in their native protobuf types via a `TypedValue` oneof. No string serialization or parsing heuristics are used.

```protobuf
message TypedValue {
  oneof value {
    bool bool_value = 1;
    int32 int32_value = 2;
    int64 int64_value = 3;
    float float_value = 4;
    double double_value = 5;
    string string_value = 6;
    bytes bytes_value = 7;
    int64 datetime_value = 8;   // UTC DateTime.Ticks (100ns intervals since 0001-01-01)
    ArrayValue array_value = 9; // typed arrays
  }
}
```

`ArrayValue` contains typed repeated fields via oneof: `BoolArray`, `Int32Array`, `Int64Array`, `FloatArray`, `DoubleArray`, `StringArray`. Each contains a `repeated` field of the corresponding primitive.

### 2.1 Null Handling

- Null is represented by an unset `oneof` (no field selected in `TypedValue`).
- A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp.

### 2.2 Type Mapping from Internal Tag Model

| Tag Data Type | TypedValue Field |
|---------------|-----------------|
| `bool` | `bool_value` |
| `int32` | `int32_value` |
| `int64` | `int64_value` |
| `float` | `float_value` |
| `double` | `double_value` |
| `string` | `string_value` |
| `byte[]` | `bytes_value` |
| `DateTime` | `datetime_value` (UTC Ticks as int64) |
| `float[]` | `array_value.float_values` |
| `int32[]` | `array_value.int32_values` |
| Other arrays | Corresponding `ArrayValue` field |
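
As a rough illustration of the mapping table, the sketch below picks the `TypedValue` field for a CLR value and converts a `DateTime` to and from the tick encoding. `TypedValueMapping` and its method names are hypothetical, and treating `DateTimeKind.Unspecified` as already UTC is an assumption made for this example only.

```csharp
using System;

public static class TypedValueMapping
{
    // Returns the name of the TypedValue oneof field that would carry the value,
    // or null when the value is null (unset oneof).
    public static string FieldFor(object value)
    {
        if (value == null) return null;
        if (value is bool) return "bool_value";
        if (value is int) return "int32_value";
        if (value is long) return "int64_value";
        if (value is float) return "float_value";
        if (value is double) return "double_value";
        if (value is string) return "string_value";
        if (value is byte[]) return "bytes_value";      // checked before the general array case
        if (value is DateTime) return "datetime_value";
        if (value is Array) return "array_value";
        throw new NotSupportedException("No TypedValue mapping for " + value.GetType());
    }

    // datetime_value carries UTC DateTime.Ticks as int64.
    // Assumption: Unspecified kinds are treated as already UTC.
    public static long ToDatetimeValue(DateTime value)
    {
        return value.Kind == DateTimeKind.Local ? value.ToUniversalTime().Ticks : value.Ticks;
    }

    public static DateTime FromDatetimeValue(long ticks)
    {
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```

For example, `TypedValueMapping.FieldFor(1.5f)` yields `float_value`, while a `float[]` maps to `array_value`.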

## 3. Quality System (QualityCode)

Quality is a structured message with an OPC UA-compatible numeric status code and a human-readable symbolic name:

```protobuf
message QualityCode {
  uint32 status_code = 1;   // OPC UA-compatible numeric status code
  string symbolic_name = 2; // Human-readable name (e.g., "Good", "BadSensorFailure")
}
```

### 3.1 Category Extraction

Category derived from high bits via `(statusCode & 0xC0000000)`:
- `0x00000000` = Good
- `0x40000000` = Uncertain
- `0x80000000` = Bad

```csharp
public static bool IsGood(uint statusCode) => (statusCode & 0xC0000000) == 0x00000000;
public static bool IsUncertain(uint statusCode) => (statusCode & 0xC0000000) == 0x40000000;
public static bool IsBad(uint statusCode) => (statusCode & 0xC0000000) == 0x80000000;
```

### 3.2 Supported Quality Codes

Filtered to codes actively used by AVEVA System Platform, InTouch, and OI Server/DAServer (per AVEVA Tech Note TN1305):

**Good Quality:**

| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description |
|--------------|-------------------|------------------|-------------|
| `Good` | `0x00000000` | `0x00C0` | Value is reliable, non-specific |
| `GoodLocalOverride` | `0x00D80000` | `0x00D8` | Manually overridden; input disconnected |

**Uncertain Quality:**

| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description |
|--------------|-------------------|------------------|-------------|
| `UncertainLastUsableValue` | `0x40900000` | `0x0044` | External source stopped writing; value is stale |
| `UncertainSensorNotAccurate` | `0x42390000` | `0x0050` | Sensor out of calibration or clamped |
| `UncertainEngineeringUnitsExceeded` | `0x40540000` | `0x0054` | Outside defined engineering limits |
| `UncertainSubNormal` | `0x40580000` | `0x0058` | Derived from insufficient good sources |

**Bad Quality:**

| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description |
|--------------|-------------------|------------------|-------------|
| `Bad` | `0x80000000` | `0x0000` | Non-specific bad; value not useful |
| `BadConfigurationError` | `0x80040000` | `0x0004` | Server config problem (e.g., item deleted) |
| `BadNotConnected` | `0x808A0000` | `0x0008` | Input not logically connected to source |
| `BadDeviceFailure` | `0x806B0000` | `0x000C` | Device failure detected |
| `BadSensorFailure` | `0x806D0000` | `0x0010` | Sensor failure detected |
| `BadLastKnownValue` | `0x80050000` | `0x0014` | Comm failed; last known value available |
| `BadCommunicationFailure` | `0x80050000` | `0x0018` | Comm failed; no last known value |
| `BadOutOfService` | `0x808F0000` | `0x001C` | Block off-scan/locked; item inactive |
| `BadWaitingForInitialData` | `0x80320000` | — | Initializing; OI Server establishing communication |

**Notes:**
- AVEVA OPC DA quality codes are carried in a 16-bit word whose low byte is structured as 2 bits major (Good/Bad/Uncertain), 4 bits minor (sub-status), 2 bits limit (Not Limited, Low, High, Constant). The OPC UA status codes above are the standard UA equivalents.
- The limit bits can be combined with any quality code. For example, `Good + High Limited` = `0x00C2` in OPC DA. In OPC UA, limits are conveyed via separate status code bits but the base code remains the same.
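
The OPC DA bit layout described in the notes can be decoded with a few shifts and masks. The sketch below is illustrative only; `OpcDaQuality`, `DaMajor`, and `DaLimit` are hypothetical names that do not come from the LmxProxy code base.

```csharp
public enum DaMajor { Bad = 0, Uncertain = 1, Good = 3 }   // value 2 is not a named category here
public enum DaLimit { NotLimited = 0, Low = 1, High = 2, Constant = 3 }

public static class OpcDaQuality
{
    // Bits 7-6: major quality.
    public static DaMajor Major(ushort quality)
    {
        return (DaMajor)((quality >> 6) & 0x3);
    }

    // Bits 5-2: minor quality (sub-status).
    public static int Minor(ushort quality)
    {
        return (quality >> 2) & 0xF;
    }

    // Bits 1-0: limit status.
    public static DaLimit Limit(ushort quality)
    {
        return (DaLimit)(quality & 0x3);
    }
}
```

With this layout, `0x00C2` decodes to major `Good` with limit `High`, matching the example in the notes.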

### 3.3 Error Condition Mapping

| Scenario | Quality |
|----------|---------|
| Normal read | `Good` (`0x00000000`) |
| Tag not found | `BadConfigurationError` (`0x80040000`) |
| Tag read exception / comms loss | `BadCommunicationFailure` (`0x80050000`) |
| Sensor failure | `BadSensorFailure` (`0x806D0000`) |
| Device failure | `BadDeviceFailure` (`0x806B0000`) |
| Stale value | `UncertainLastUsableValue` (`0x40900000`) |
| Block off-scan / disabled | `BadOutOfService` (`0x808F0000`) |
| Local override active | `GoodLocalOverride` (`0x00D80000`) |
| Initializing / waiting for first value | `BadWaitingForInitialData` (`0x80320000`) |
| Write to read-only tag | `WriteResult.success=false`, message indicates read-only |
| Type mismatch on write | `WriteResult.success=false`, message indicates type mismatch |

## 4. Message Schemas

### 4.1 VtqMessage

The core data type for tag value transport:

| Field | Proto Type | Order | Description |
|-------|-----------|-------|-------------|
| tag | string | 1 | Tag address |
| value | TypedValue | 2 | Typed value (native protobuf types) |
| timestamp_utc_ticks | int64 | 3 | UTC DateTime.Ticks (100ns intervals since 0001-01-01) |
| quality | QualityCode | 4 | Structured quality with status code and symbolic name |

A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp.

### 4.2 Connection Messages

**ConnectRequest**: `client_id` (string), `api_key` (string)
**ConnectResponse**: `success` (bool), `message` (string), `session_id` (string — 32-char hex GUID)

**DisconnectRequest**: `session_id` (string)
**DisconnectResponse**: `success` (bool), `message` (string)

**GetConnectionStateRequest**: `session_id` (string)
**GetConnectionStateResponse**: `is_connected` (bool), `client_id` (string), `connected_since_utc_ticks` (int64)

### 4.3 Read Messages

**ReadRequest**: `session_id` (string), `tag` (string)
**ReadResponse**: `success` (bool), `message` (string), `vtq` (VtqMessage)

**ReadBatchRequest**: `session_id` (string), `tags` (repeated string)
**ReadBatchResponse**: `success` (bool), `message` (string), `vtqs` (repeated VtqMessage)

### 4.4 Write Messages

**WriteRequest**: `session_id` (string), `tag` (string), `value` (TypedValue)
**WriteResponse**: `success` (bool), `message` (string)

**WriteItem**: `tag` (string), `value` (TypedValue)
**WriteResult**: `tag` (string), `success` (bool), `message` (string)

**WriteBatchRequest**: `session_id` (string), `items` (repeated WriteItem)
**WriteBatchResponse**: `success` (bool), `message` (string), `results` (repeated WriteResult)

### 4.5 WriteBatchAndWait Messages

**WriteBatchAndWaitRequest**:
- `session_id` (string)
- `items` (repeated WriteItem) — values to write
- `flag_tag` (string) — tag to poll after writes
- `flag_value` (TypedValue) — expected value (type-aware comparison)
- `timeout_ms` (int32) — max wait time (default 5000ms if ≤ 0)
- `poll_interval_ms` (int32) — polling interval (default 100ms if ≤ 0)

**WriteBatchAndWaitResponse**:
- `success` (bool)
- `message` (string)
- `write_results` (repeated WriteResult)
- `flag_reached` (bool) — whether the flag value was matched
- `elapsed_ms` (int32) — total elapsed time

**Behavior:**
1. All writes execute first. If any write fails, returns immediately with `success=false`.
2. If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals.
3. Uses type-aware `TypedValueEquals()` comparison (see Section 4.5.1).
4. If flag matches before timeout: `success=true`, `flag_reached=true`.
5. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error).
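
The polling part of this behavior (steps 2 to 5) could be sketched as below. This is not the Host's implementation: `FlagPoller`, `WaitOutcome`, and the `readFlagMatches` delegate are hypothetical, with the delegate standing in for "read `flag_tag` and compare it against `flag_value`".

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public sealed class WaitOutcome
{
    public bool FlagReached;
    public int ElapsedMs;
}

public static class FlagPoller
{
    // readFlagMatches is hypothetical: it reads flag_tag and applies the
    // type-aware comparison against flag_value.
    public static async Task<WaitOutcome> WaitForFlagAsync(
        Func<CancellationToken, Task<bool>> readFlagMatches,
        int timeoutMs,
        int pollIntervalMs,
        CancellationToken ct)
    {
        if (timeoutMs <= 0) timeoutMs = 5000;          // documented default
        if (pollIntervalMs <= 0) pollIntervalMs = 100; // documented default

        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (await readFlagMatches(ct).ConfigureAwait(false))
                return new WaitOutcome { FlagReached = true, ElapsedMs = (int)clock.ElapsedMilliseconds };

            // Timing out is reported as "flag not reached", not as an error.
            if (clock.ElapsedMilliseconds >= timeoutMs)
                return new WaitOutcome { FlagReached = false, ElapsedMs = (int)clock.ElapsedMilliseconds };

            await Task.Delay(pollIntervalMs, ct).ConfigureAwait(false);
        }
    }
}
```

A caller would map `FlagReached` and `ElapsedMs` onto `flag_reached` and `elapsed_ms` in the response, leaving `success=true` in both outcomes.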

#### 4.5.1 Flag Comparison Rules

Type-aware comparison via `TypedValueEquals()`:
- Both values must have the same `oneof` case (same type). Mismatched types are never equal.
- Numeric comparison uses the native type's equality (no floating-point string round-trip issues).
- String comparison is case-sensitive.
- Bool comparison is direct equality.
- Null (unset `oneof`) equals null. Null does not equal any set value.
- Array comparison: element-by-element equality, same length required.
- `datetime_value` compared as `int64` equality (tick-level precision).
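
One way to express these rules is sketched below over a simplified in-memory model. The `TypedValue` and `ValueCase` types here are hypothetical stand-ins for the generated protobuf classes, and comparing `bytes_value` element by element is an assumption, since the rules above do not mention bytes.

```csharp
using System;
using System.Collections;

public enum ValueCase { None, Bool, Int32, Int64, Float, Double, String, Bytes, Datetime, Array }

// Simplified stand-in for the generated TypedValue message.
public sealed class TypedValue
{
    public readonly ValueCase Case;
    public readonly object Value;

    public TypedValue(ValueCase valueCase, object value)
    {
        Case = valueCase;
        Value = value;
    }

    public static readonly TypedValue Null = new TypedValue(ValueCase.None, null);
}

public static class TypedValueComparer
{
    public static bool TypedValueEquals(TypedValue a, TypedValue b)
    {
        if (a == null) a = TypedValue.Null;
        if (b == null) b = TypedValue.Null;

        // Mismatched oneof cases are never equal; this also covers null vs. set.
        if (a.Case != b.Case) return false;

        switch (a.Case)
        {
            case ValueCase.None:
                return true; // null equals null
            case ValueCase.String:
                return string.Equals((string)a.Value, (string)b.Value, StringComparison.Ordinal);
            case ValueCase.Bytes: // assumption: bytes compare like arrays
            case ValueCase.Array:
                return ElementsEqual((IList)a.Value, (IList)b.Value);
            default:
                // Bool, numeric, and datetime (ticks) use the native type's equality.
                return object.Equals(a.Value, b.Value);
        }
    }

    private static bool ElementsEqual(IList x, IList y)
    {
        if (x.Count != y.Count) return false;
        for (int i = 0; i < x.Count; i++)
        {
            if (!object.Equals(x[i], y[i])) return false;
        }
        return true;
    }
}
```

Because the case check comes first, an `int32_value` of 5 and an `int64_value` of 5 compare as different, which is the behavior the first rule asks for.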

### 4.6 Subscription Messages

**SubscribeRequest**: `session_id` (string), `tags` (repeated string), `sampling_ms` (int32)
Response: streamed `VtqMessage` items.

### 4.7 API Key Messages

**CheckApiKeyRequest**: `api_key` (string)
**CheckApiKeyResponse**: `is_valid` (bool), `message` (string)

## 5. Dual gRPC Stack Compatibility

The Host and Client use different gRPC implementations:

| Aspect | Host | Client |
|--------|------|--------|
| Stack | Grpc.Core (C-core) | Grpc.Net.Client |
| Contract | Proto file (`scada.proto`) + Grpc.Tools codegen | Code-first (`[ServiceContract]`, `[DataContract]`) via protobuf-net.Grpc |
| Runtime | .NET Framework 4.8 | .NET 10 |

Both target `scada.ScadaService` and produce identical wire format. Field ordering in `[DataMember(Order = N)]` matches proto field numbers.
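
To make the field-number correspondence concrete, here is a minimal sketch of a code-first contract for `QualityCode`, whose proto field numbers (1 and 2) are given in Section 3. The C# property names are assumptions; the actual `ScadaContracts.cs` may name or shape them differently.

```csharp
using System.Runtime.Serialization;

// Code-first counterpart of the QualityCode proto message.
// Order values mirror the proto field numbers so both stacks agree on the wire.
[DataContract]
public class QualityCode
{
    // Matches: uint32 status_code = 1
    [DataMember(Order = 1)]
    public uint StatusCode { get; set; }

    // Matches: string symbolic_name = 2
    [DataMember(Order = 2)]
    public string SymbolicName { get; set; }
}
```

If an `Order` value drifted from the proto field number, both sides would still compile but would silently read each other's fields incorrectly, which is why the two definitions need to be kept in step.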

## 6. V1 Legacy Protocol

The current codebase implements the v1 protocol. The following describes v1 behavior that will be replaced during migration to v2.

### 6.1 V1 Value Encoding

All values transmitted as strings:
- Write direction: server parses string values in order: bool → int → long → double → DateTime → raw string.
- Read direction: server serializes via `.ToString()` (bool → lowercase, DateTime → ISO-8601, arrays → JSON).
- Client parses: double → bool → null (empty string) → raw string.
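
The write-direction heuristic can be pictured as a chain of `TryParse` calls in the documented order. The sketch below is only an approximation: `V1ValueParser` is a hypothetical name, and the use of the invariant culture and round-trip date handling are assumptions rather than confirmed v1 behavior.

```csharp
using System;
using System.Globalization;

public static class V1ValueParser
{
    // Tries each type in the documented order and falls back to the raw string.
    public static object Parse(string text)
    {
        bool b;
        int i;
        long l;
        double d;
        DateTime dt;

        if (bool.TryParse(text, out b)) return b;
        if (int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out i)) return i;
        if (long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out l)) return l;
        if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out d)) return d;
        if (DateTime.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind, out dt)) return dt;
        return text;
    }
}
```

The ambiguity is visible here: a string tag that legitimately holds the text `42` or `true` is indistinguishable from a number or a bool, which is the class of problem the v2 `TypedValue` encoding avoids.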

### 6.2 V1 Quality

Three-state string quality (`"Good"`, `"Uncertain"`, `"Bad"`, case-insensitive). OPC DA numeric ranges: ≥192 = Good, 64–191 = Uncertain, <64 = Bad.

### 6.3 V1 → V2 Field Changes

| Message | Field | V1 Type | V2 Type |
|---------|-------|---------|---------|
| VtqMessage | value | string | TypedValue |
| VtqMessage | quality | string | QualityCode |
| WriteRequest | value | string | TypedValue |
| WriteItem | value | string | TypedValue |
| WriteBatchAndWaitRequest | flag_value | string | TypedValue |

All RPC signatures remain unchanged. Only value and quality fields change type.

### 6.4 Migration Strategy

Clean break — no backward compatibility layer. All clients and servers updated simultaneously. This is appropriate because LmxProxy is an internal protocol with a small, controlled client count. Dual-format support adds complexity with no long-term benefit.

## Dependencies

- **Grpc.Core** + **Grpc.Tools** — proto compilation and server hosting (Host).
- **protobuf-net.Grpc** — code-first contracts (Client).
- **Grpc.Net.Client** — HTTP/2 transport (Client).

## Interactions

- **GrpcServer** implements the service defined by this protocol.
- **Client** consumes the service defined by this protocol.
- **MxAccessClient** is the backend that executes the operations requested via the protocol.
119
deprecated/lmxproxy/docs/requirements/Component-Security.md
Normal file
@@ -0,0 +1,119 @@
# Component: Security

## Purpose

Provides API key-based authentication and role-based authorization for the gRPC service, along with TLS certificate management for transport security.

## Location

- `src/ZB.MOM.WW.LmxProxy.Host/Security/ApiKeyService.cs` — API key storage and validation.
- `src/ZB.MOM.WW.LmxProxy.Host/Security/ApiKeyInterceptor.cs` — gRPC server interceptor for authentication/authorization.
- `src/ZB.MOM.WW.LmxProxy.Client/Security/GrpcChannelFactory.cs` — Client-side TLS channel factory.

## Responsibilities

- Load and hot-reload API keys from a JSON configuration file.
- Validate API keys on every gRPC request via a server interceptor.
- Enforce role-based access control (ReadOnly vs ReadWrite).
- Manage TLS certificates for server and optional mutual TLS.

## 1. API Key Service

### 1.1 Key Storage

- Keys are stored in a JSON file (default `apikeys.json`).
- File format: `{ "ApiKeys": [{ "Key": "...", "Description": "...", "Role": "ReadOnly|ReadWrite", "Enabled": true|false }] }`.
- If the file does not exist at startup, the service auto-generates a default file with two random keys: one ReadOnly and one ReadWrite.

### 1.2 Hot Reload

- A `FileSystemWatcher` monitors the API key file for changes.
- Rapid changes are debounced (1-second minimum between reloads).
- `ReloadConfigurationAsync` uses a `SemaphoreSlim` to serialize reload operations.
- New and modified keys take effect on the next request. Removed or disabled keys reject future requests immediately.
- Active sessions are not affected by key changes — sessions are tracked independently by SessionManager.

### 1.3 Validation

- `ValidateApiKey(apiKey)` — Returns the `ApiKey` object if the key exists and `Enabled` is true, otherwise null.
- `HasRole(apiKey, requiredRole)` — Returns true if the key has the required role. Role hierarchy: ReadWrite implies ReadOnly.

## 2. API Key Interceptor

### 2.1 Authentication Flow

The `ApiKeyInterceptor` intercepts every unary and server-streaming RPC:

1. Extracts the `x-api-key` header from gRPC request metadata.
2. Calls `ApiKeyService.ValidateApiKey()`.
3. If the key is invalid or missing, returns `StatusCode.Unauthenticated`.
4. For write-protected methods (`Write`, `WriteBatch`, `WriteBatchAndWait`), checks that the key has the `ReadWrite` role. Returns `StatusCode.PermissionDenied` if the key is `ReadOnly`.
5. Adds the validated `ApiKey` to `context.UserState["ApiKey"]` for downstream use.
6. Continues to the service method.

### 2.2 Write-Protected Methods

These RPCs require the `ReadWrite` role:
- `Write`
- `WriteBatch`
- `WriteBatchAndWait`

All other RPCs (`Connect`, `Disconnect`, `GetConnectionState`, `Read`, `ReadBatch`, `Subscribe`, `CheckApiKey`) are allowed for `ReadOnly` keys.
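
The decision logic of the interceptor, stripped of the gRPC plumbing, can be sketched as a pure function. The types below are simplified, hypothetical stand-ins: `AuthResult` plays the role of the gRPC status codes, and `ApiKeyAuthorization` is not a class from the code base.

```csharp
using System;
using System.Collections.Generic;

public enum ApiKeyRole { ReadOnly, ReadWrite }
public enum AuthResult { Ok, Unauthenticated, PermissionDenied }

public sealed class ApiKey
{
    public string Key;
    public ApiKeyRole Role;
    public bool Enabled;
}

public static class ApiKeyAuthorization
{
    private static readonly HashSet<string> WriteProtected =
        new HashSet<string>(StringComparer.Ordinal) { "Write", "WriteBatch", "WriteBatchAndWait" };

    // Role hierarchy: ReadWrite implies ReadOnly.
    public static bool HasRole(ApiKey key, ApiKeyRole required)
    {
        return key.Role == ApiKeyRole.ReadWrite || required == ApiKeyRole.ReadOnly;
    }

    // key is the result of key validation (null when the key is missing or unknown).
    public static AuthResult Authorize(ApiKey key, string method)
    {
        if (key == null || !key.Enabled) return AuthResult.Unauthenticated;
        if (WriteProtected.Contains(method) && !HasRole(key, ApiKeyRole.ReadWrite))
            return AuthResult.PermissionDenied;
        return AuthResult.Ok;
    }
}
```

Keeping the decision separate from the interceptor in this way makes the role rules easy to unit test without a running gRPC server.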

## 3. API Key Model

| Field | Type | Description |
|-------|------|-------------|
| Key | string | The secret API key value |
| Description | string | Human-readable name for the key |
| Role | ApiKeyRole | `ReadOnly` or `ReadWrite` |
| Enabled | bool | Whether the key is active |

`ApiKeyRole` enum: `ReadOnly` (read and subscribe only), `ReadWrite` (full access including writes).

## 4. TLS Configuration

### 4.1 Server-Side (Host)

Configured via `TlsConfiguration` in `appsettings.json`:

| Setting | Default | Description |
|---------|---------|-------------|
| Enabled | false | Enable TLS on the gRPC server |
| ServerCertificatePath | `certs/server.crt` | PEM server certificate |
| ServerKeyPath | `certs/server.key` | PEM server private key |
| ClientCaCertificatePath | `certs/ca.crt` | CA certificate for mTLS client validation |
| RequireClientCertificate | false | Require client certificates (mutual TLS) |
| CheckCertificateRevocation | false | Check certificate revocation lists |

If TLS is enabled but certificates are missing, the service generates self-signed certificates at startup.

### 4.2 Client-Side

`ClientTlsConfiguration` in the client library:

| Setting | Default | Description |
|---------|---------|-------------|
| UseTls | false | Enable TLS on the client connection |
| ClientCertificatePath | null | Client certificate for mTLS |
| ClientKeyPath | null | Client private key for mTLS |
| ServerCaCertificatePath | null | Custom CA for server validation |
| ServerNameOverride | null | SNI/hostname override |
| ValidateServerCertificate | true | Validate the server certificate chain |
| AllowSelfSignedCertificates | false | Accept self-signed server certificates |
| IgnoreAllCertificateErrors | false | Skip all certificate validation (dangerous) |

- SSL protocols: TLS 1.2 and TLS 1.3.
- Client certificates loaded from PEM files and converted to PKCS12.
- Custom CA trust store support via chain building.

## Dependencies

- **Configuration** — TLS settings and API key file path from `appsettings.json`.
- **System.IO.FileSystemWatcher** — API key file change detection.

## Interactions

- **GrpcServer** — the ApiKeyInterceptor runs before every RPC in ScadaGrpcService.
- **ServiceHost** — creates ApiKeyService and ApiKeyInterceptor at startup, configures gRPC server credentials.
- **Client** — GrpcChannelFactory creates TLS-configured gRPC channels in LmxProxyClient.
108
deprecated/lmxproxy/docs/requirements/Component-ServiceHost.md
Normal file
@@ -0,0 +1,108 @@
# Component: ServiceHost

## Purpose

The entry point and lifecycle manager for the LmxProxy Windows service. Handles Topshelf service hosting, Serilog logging setup, component initialization/teardown ordering, and Windows SCM service recovery configuration.

## Location

- `src/ZB.MOM.WW.LmxProxy.Host/Program.cs` — entry point, Serilog setup, Topshelf configuration.
- `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — service lifecycle (Start, Stop, Pause, Continue, Shutdown).

## Responsibilities

- Configure and launch the Topshelf Windows service.
- Load and validate configuration from `appsettings.json`.
- Initialize Serilog logging.
- Orchestrate service startup: create all components in dependency order, connect to MxAccess, start servers.
- Orchestrate service shutdown: stop servers, dispose all components in reverse order.
- Configure Windows SCM service recovery policies.

## 1. Entry Point (Program.cs)

1. Builds configuration from `appsettings.json` + environment variables via `ConfigurationBuilder`.
2. Configures Serilog from the `Serilog` section of appsettings (console + file sinks).
3. Validates configuration using `ConfigurationValidator.ValidateAndLog()`.
4. Configures Topshelf `HostFactory`:
   - Service name: `ZB.MOM.WW.LmxProxy.Host`
   - Display name: `SCADA Bridge LMX Proxy`
   - Start automatically on boot.
   - Service recovery: first failure 1 min, second 5 min, subsequent 10 min, reset period 1 day.
5. Runs the Topshelf host (blocks until service stops).

## 2. Service Lifecycle (LmxProxyService)

### 2.1 Startup Sequence (Start)

Components are created and started in dependency order:

1. Validate configuration.
2. Check/generate TLS certificates (if TLS enabled).
3. Create `PerformanceMetrics`.
4. Create `ApiKeyService` — loads API keys from file.
5. Create `MxAccessClient` via factory.
6. Subscribe to connection state changes.
7. Connect to MxAccess synchronously — times out at `ConnectionTimeoutSeconds` (default 30s).
8. Start `MonitorConnectionAsync` (if `AutoReconnect` enabled).
9. Create `SubscriptionManager`.
10. Create `SessionManager`.
11. Create `HealthCheckService` + `DetailedHealthCheckService`.
12. Create `StatusReportService` + `StatusWebServer`.
13. Create `ScadaGrpcService`.
14. Create `ApiKeyInterceptor`.
15. Configure gRPC `Server` with TLS or insecure credentials.
16. Start gRPC server on `0.0.0.0:{GrpcPort}`.
17. Start `StatusWebServer`.

### 2.2 Shutdown Sequence (Stop)

Components are stopped and disposed in reverse order:

1. Cancel reconnect monitor — wait **5 seconds** for exit.
2. Graceful gRPC server shutdown — **10-second** timeout, then kill.
3. Stop StatusWebServer — **5-second** wait.
4. Dispose all components in reverse creation order.
5. Disconnect from MxAccess — **10-second** timeout.
|
||||
|
||||
### 2.3 Other Lifecycle Events
|
||||
|
||||
- **Pause**: Supported by Topshelf but behavior is a no-op beyond logging.
|
||||
- **Continue**: Resume from pause, no-op beyond logging.
|
||||
- **Shutdown**: System shutdown signal, triggers the same shutdown sequence as Stop.
|
||||
|
||||
## 3. Service Recovery (Windows SCM)

Configured via Topshelf's `EnableServiceRecovery`:

| Failure | Action | Delay |
|---------|--------|-------|
| First | Restart service | 1 minute |
| Second | Restart service | 5 minutes |
| Subsequent | Restart service | 10 minutes |
| Reset period | — | 1 day |

All values are configurable via `ServiceRecoveryConfiguration`.

## 4. Service Identity

| Property | Value |
|----------|-------|
| Service name | `ZB.MOM.WW.LmxProxy.Host` |
| Display name | `SCADA Bridge LMX Proxy` |
| Start mode | Automatic |
| Platform | x86 (.NET Framework 4.8) |
| Framework | Topshelf |

## Dependencies

- **Topshelf** — Windows service framework.
- **Serilog** — structured logging (console + file sinks).
- **Microsoft.Extensions.Configuration** — configuration loading.
- **Configuration** — validated configuration objects.
- All other components are created and managed by LmxProxyService.

## Interactions

- **Configuration** is loaded and validated first; all other components receive their settings from it.
- **MxAccessClient** is connected synchronously during startup. If connection fails within the timeout, the service fails to start.
- **GrpcServer** and **StatusWebServer** are started last, after all dependencies are ready.

@@ -0,0 +1,76 @@
# Component: SessionManager

## Purpose

Tracks active client sessions, mapping session IDs to client metadata. Provides session creation, validation, and termination for the gRPC service layer.

## Location

`src/ZB.MOM.WW.LmxProxy.Host/Sessions/SessionManager.cs`

## Responsibilities

- Create new sessions with unique identifiers when clients connect.
- Validate session IDs on every data operation.
- Track session metadata (client ID, API key, connection time, last activity).
- Terminate sessions on client disconnect.
- Provide session listing for monitoring and status reporting.

## 1. Session Storage

- Sessions are stored in a `ConcurrentDictionary<string, SessionInfo>` (thread-safe; reads are lock-free, writes take fine-grained internal locks).
- Session state is in-memory only — all sessions are lost on service restart.
- `ActiveSessionCount` property returns the current count of tracked sessions.

## 2. Session Lifecycle

### 2.1 Creation

`CreateSession(clientId, apiKey)`:

- Generates a unique session ID: `Guid.NewGuid().ToString("N")` (32-character lowercase hex string, no hyphens).
- Creates a `SessionInfo` record with `ConnectedAt` and `LastActivity` set to `DateTime.UtcNow`.
- Stores the session in the dictionary.
- Returns the session ID to the client.

### 2.2 Validation

`ValidateSession(sessionId)`:

- Looks up the session ID in the dictionary.
- If found, updates `LastActivity` to `DateTime.UtcNow` and returns `true`.
- If not found, returns `false`.

### 2.3 Termination

`TerminateSession(sessionId)`:

- Removes the session from the dictionary.
- Returns `true` if the session existed, `false` otherwise.

### 2.4 Query

- `GetSession(sessionId)` — Returns `SessionInfo` or `null` if not found.
- `GetAllSessions()` — Returns `IReadOnlyList<SessionInfo>` snapshot of all active sessions.

## 3. SessionInfo

| Field | Type | Description |
|-------|------|-------------|
| SessionId | string | 32-character hex GUID |
| ClientId | string | Client-provided identifier |
| ApiKey | string | API key used for authentication |
| ConnectedAt | DateTime | UTC time of session creation |
| LastActivity | DateTime | UTC time of last operation (updated on each validation) |
| ConnectedSinceUtcTicks | long | `ConnectedAt.Ticks` for gRPC response serialization |

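
The session lifecycle and the `SessionInfo` fields described above can be sketched roughly as follows. This is a minimal illustration, not the actual `SessionManager.cs`: the class name `SessionManagerSketch`, the `Touch()` helper, and the choice to store `LastActivity` as ticks updated through `Interlocked` (to avoid torn 64-bit writes in an x86 process) are assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the SessionInfo record; field names follow the table above.
public sealed class SessionInfo
{
    private long _lastActivityTicks;

    public string SessionId { get; set; }
    public string ClientId { get; set; }
    public string ApiKey { get; set; }
    public DateTime ConnectedAt { get; set; }
    public long ConnectedSinceUtcTicks { get { return ConnectedAt.Ticks; } }

    public DateTime LastActivity
    {
        get { return new DateTime(Interlocked.Read(ref _lastActivityTicks), DateTimeKind.Utc); }
    }

    // Assumed helper: records "now" as the last activity time atomically.
    public void Touch()
    {
        Interlocked.Exchange(ref _lastActivityTicks, DateTime.UtcNow.Ticks);
    }
}

public sealed class SessionManagerSketch : IDisposable
{
    private readonly ConcurrentDictionary<string, SessionInfo> _sessions =
        new ConcurrentDictionary<string, SessionInfo>();

    public int ActiveSessionCount { get { return _sessions.Count; } }

    public string CreateSession(string clientId, string apiKey)
    {
        string id = Guid.NewGuid().ToString("N"); // 32 lowercase hex characters, no hyphens
        var info = new SessionInfo
        {
            SessionId = id,
            ClientId = clientId,
            ApiKey = apiKey,
            ConnectedAt = DateTime.UtcNow
        };
        info.Touch();
        _sessions[id] = info;
        return id;
    }

    public bool ValidateSession(string sessionId)
    {
        SessionInfo info;
        if (sessionId == null || !_sessions.TryGetValue(sessionId, out info)) return false;
        info.Touch(); // every successful validation refreshes LastActivity
        return true;
    }

    public bool TerminateSession(string sessionId)
    {
        SessionInfo removed;
        return sessionId != null && _sessions.TryRemove(sessionId, out removed);
    }

    public IReadOnlyList<SessionInfo> GetAllSessions()
    {
        return _sessions.Values.ToList(); // snapshot copy
    }

    public void Dispose()
    {
        _sessions.Clear(); // no client notifications, as described in the Disposal section
    }
}
```

A caller would typically pair `CreateSession` on Connect with `TerminateSession` on Disconnect and call `ValidateSession` before each data operation.
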
## 4. Disposal

`Dispose()` clears all sessions from the dictionary. No notifications are sent to connected clients.

## Dependencies

None. SessionManager is a standalone in-memory store with no external dependencies.

## Interactions

- **GrpcServer** calls `CreateSession` on Connect, `ValidateSession` on every data operation, and `TerminateSession` on Disconnect.
- **HealthAndMetrics** reads `ActiveSessionCount` for health check data.
- **StatusReportService** reads session information for the status dashboard.

@@ -0,0 +1,116 @@
# Component: SubscriptionManager

## Purpose

Manages the lifecycle of tag value subscriptions, multiplexing multiple client subscriptions onto shared MXAccess tag subscriptions and delivering updates via per-client bounded channels with configurable backpressure.

## Location

`src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs`

## Responsibilities

- Create per-client subscription channels with bounded capacity.
- Share underlying MXAccess tag subscriptions across multiple clients subscribing to the same tags.
- Deliver tag value updates from MXAccess callbacks to all subscribed clients.
- Handle backpressure when client channels are full (DropOldest, DropNewest, or Wait).
- Clean up subscriptions on client disconnect.
- Notify all subscribed clients with bad quality when MXAccess disconnects.

## 1. Architecture

### 1.1 Per-Client Channels

Each subscribing client gets a bounded `System.Threading.Channels.Channel<(string address, Vtq vtq)>`:

- Capacity: configurable (default 1000 messages).
- Full mode: configurable (default `DropOldest`).
- `SingleReader = true`, `SingleWriter = false`.

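
As a rough illustration of the channel settings listed above, the sketch below creates a bounded channel in `DropOldest` mode and shows which items survive when more messages are written than the capacity allows. The `int` payload stands in for `Vtq`, and the tag name `Tank1.Level` and the class name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

public static class ClientChannelSketch
{
    // Mirrors the documented options; capacity is a parameter so the demo can use a small value.
    public static Channel<(string address, int value)> Create(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true,   // one gRPC stream reads the channel
            SingleWriter = false   // several callback threads may write
        };
        return Channel.CreateBounded<(string address, int value)>(options);
    }

    // Writes five updates into a channel of capacity three and returns what is left to read.
    public static List<int> Demo()
    {
        var channel = Create(3);
        for (int i = 1; i <= 5; i++)
        {
            // In DropOldest mode TryWrite succeeds even when full; the oldest item is evicted.
            channel.Writer.TryWrite(("Tank1.Level", i));
        }
        channel.Writer.Complete();

        var remaining = new List<int>();
        (string address, int value) item;
        while (channel.Reader.TryRead(out item))
        {
            remaining.Add(item.value);
        }
        return remaining; // the three newest values: 3, 4, 5
    }
}
```

Note that on .NET Framework 4.8 `System.Threading.Channels` comes from a NuGet package rather than the base class library.
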
### 1.2 Shared Tag Subscriptions

Tag subscriptions to MXAccess are shared across clients:

- When the first client subscribes to a tag, a new MXAccess subscription is created.
- When additional clients subscribe to the same tag, they are added to the existing tag subscription's client set.
- When the last client unsubscribes from a tag, the MXAccess subscription is disposed.

### 1.3 Thread Safety

- `ReaderWriterLockSlim` protects tag subscription updates.
- `ConcurrentDictionary` for client subscription tracking.

## 2. Subscription Flow

### 2.1 Subscribe

`SubscribeAsync(clientId, addresses, ct)`:

1. Creates a bounded channel with configured capacity and full mode.
2. Creates a `ClientSubscription` record (clientId, channel, address set, CancellationTokenSource, counters).
3. For each tag address:
   - If the tag already has a subscription, adds the client to the existing `TagSubscription.clientIds` set.
   - Otherwise, creates a new `TagSubscription` and calls `_scadaClient.SubscribeAsync()` to register with MXAccess (outside the lock to avoid blocking).
4. Registers a cancellation token callback to automatically call `UnsubscribeClient` on disconnect.
5. Returns the channel reader for the GrpcServer to stream from.

### 2.2 Value Updates

`OnTagValueChanged(address, Vtq)` — called from MxAccessClient's COM event handler:

1. Looks up the tag subscription to find all subscribed clients.
2. For each client, calls `channel.Writer.TryWrite((address, vtq))`.
3. If the channel is full:
   - **DropOldest**: Logs a warning, increments `DroppedMessageCount`. The oldest message is automatically discarded by the channel.
   - **DropNewest**: Drops the incoming message.
   - **Wait**: Blocks the writer until space is available (not recommended for gRPC streaming).
4. On channel closed (client disconnected), schedules `UnsubscribeClient` cleanup.

### 2.3 Unsubscribe

`UnsubscribeClient(clientId)`:

1. Removes the client from the client dictionary.
2. For each tag the client was subscribed to, removes the client from the tag's subscriber set.
3. If a tag has no remaining subscribers, disposes the MXAccess subscription handle.
4. Completes the client's channel writer (signals end of stream).

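
The subscribe and unsubscribe bookkeeping above amounts to reference counting per tag. The sketch below shows one way to detect "first subscriber" and "last subscriber" transitions. It is a simplified illustration under stated assumptions: it uses a plain `lock` instead of the `ReaderWriterLockSlim` mentioned in 1.3, the class and method names are hypothetical, and the actual MXAccess subscribe and dispose calls are left to the caller.

```csharp
using System;
using System.Collections.Generic;

public sealed class TagSubscriptionIndexSketch
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, HashSet<string>> _clientsByTag =
        new Dictionary<string, HashSet<string>>();

    // Adds the client to each tag and returns the tags that had no subscriber before,
    // i.e. the ones that need a new upstream subscription (created outside the lock).
    public List<string> AddClient(string clientId, IEnumerable<string> addresses)
    {
        var newTags = new List<string>();
        lock (_gate)
        {
            foreach (string address in addresses)
            {
                HashSet<string> clients;
                if (!_clientsByTag.TryGetValue(address, out clients))
                {
                    clients = new HashSet<string>();
                    _clientsByTag[address] = clients;
                    newTags.Add(address);
                }
                clients.Add(clientId);
            }
        }
        return newTags;
    }

    // Removes the client everywhere and returns the tags left without subscribers,
    // i.e. the ones whose upstream subscription can be disposed.
    public List<string> RemoveClient(string clientId)
    {
        var orphaned = new List<string>();
        lock (_gate)
        {
            foreach (KeyValuePair<string, HashSet<string>> pair in _clientsByTag)
            {
                if (pair.Value.Remove(clientId) && pair.Value.Count == 0)
                {
                    orphaned.Add(pair.Key);
                }
            }
            foreach (string tag in orphaned)
            {
                _clientsByTag.Remove(tag);
            }
        }
        return orphaned;
    }
}
```

Returning the affected tags rather than calling the SCADA client inside the lock keeps the slow upstream calls out of the critical section, which matches the "outside the lock" remark in step 3 of 2.1.
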
## 3. Backpressure

| Mode | Behavior | Use Case |
|------|----------|----------|
| DropOldest | Discards the oldest message when the channel is full; the drop is logged and counted in `DroppedMessageCount` (see 2.2) | Default. Fire-and-forget semantics. No client blocking. |
| DropNewest | Drops the incoming message when the channel is full | Preserves history, drops latest updates. |
| Wait | Blocks the writer until space is available | Not recommended for gRPC streaming (blocks callback thread). |

Per-client statistics track `DeliveredMessageCount` and `DroppedMessageCount` for monitoring via the status dashboard.

## 4. Disconnection Handling

### 4.1 Client Disconnect

When a client's gRPC stream ends (cancellation or error), the cancellation token callback triggers `UnsubscribeClient`, which cleans up all tag subscriptions for that client.

### 4.2 MxAccess Disconnect

`OnConnectionStateChanged` — when the MxAccess connection drops:

- Sends a bad-quality Vtq to all subscribed clients via their channels.
- Each client receives an async notification of the connection loss.
- Tag subscriptions are retained in memory for reconnection (via MxAccessClient's `_storedSubscriptions`).

## 5. Statistics

`GetSubscriptionStats()` returns:

- `TotalClients` — number of active client subscriptions.
- `TotalTags` — number of unique tags with active MXAccess subscriptions.
- `ActiveSubscriptions` — total client-tag subscription count.

## Dependencies

- **MxAccessClient** (IScadaClient) — creates and disposes MXAccess tag subscriptions.
- **Configuration** — `SubscriptionConfiguration` for channel capacity and full mode.

## Interactions

- **GrpcServer** calls `SubscribeAsync` on Subscribe RPC and reads from the returned channel.
- **MxAccessClient** delivers value updates via the `OnTagValueChanged` callback.
- **HealthAndMetrics** reads subscription statistics for health checks and status reports.
- **ServiceHost** disposes the SubscriptionManager at shutdown.

274
deprecated/lmxproxy/docs/requirements/HighLevelReqs.md
Normal file
@@ -0,0 +1,274 @@
# LmxProxy - High Level Requirements

## 1. System Purpose

LmxProxy is a gRPC proxy service that bridges SCADA clients to AVEVA System Platform (Wonderware) via the ArchestrA MXAccess COM API. It exists because MXAccess is a 32-bit COM component that requires co-location with System Platform on a Windows machine running .NET Framework 4.8. LmxProxy isolates this constraint behind a gRPC interface, allowing modern .NET clients to access System Platform data remotely over HTTP/2.

## 2. Architecture

### 2.1 Two-Project Structure

- **ZB.MOM.WW.LmxProxy.Host** — .NET Framework 4.8, x86-only Windows service. Hosts a gRPC server (Grpc.Core) fronting the MXAccess COM API. Runs on the same machine as AVEVA System Platform.
- **ZB.MOM.WW.LmxProxy.Client** — .NET 10, AnyCPU class library. Code-first gRPC client (protobuf-net.Grpc) consumed by ScadaLink's Data Connection Layer. Packaged as a NuGet library.

### 2.2 Dual gRPC Stacks

The two projects use different gRPC implementations:

- **Host**: Proto-file-generated code via `Grpc.Core` + `Grpc.Tools`. Uses the deprecated C-core gRPC library because .NET Framework 4.8 does not support the ASP.NET Core gRPC server (`Grpc.AspNetCore.Server`).
- **Client**: Code-first contracts via `protobuf-net.Grpc` with `[DataContract]`/`[ServiceContract]` attributes over `Grpc.Net.Client`.

Both target the same `scada.ScadaService` gRPC service definition and are wire-compatible.

### 2.3 Deployment Model

- The Host service runs on the AVEVA System Platform machine (or any machine with MXAccess access).
- Clients connect remotely over gRPC (HTTP/2) on a configurable port (default 50051).
- The Host runs as a Windows service managed by Topshelf.

## 3. Communication Protocol

### 3.1 Transport

- gRPC over HTTP/2.
- Default server port: 50051.
- Optional TLS with mutual TLS (mTLS) support.

### 3.2 RPCs

The service exposes 10 RPCs:

| RPC | Type | Description |
|-----|------|-------------|
| Connect | Unary | Establish session, returns session ID |
| Disconnect | Unary | Terminate session |
| GetConnectionState | Unary | Query MxAccess connection status |
| Read | Unary | Read single tag value |
| ReadBatch | Unary | Read multiple tag values |
| Write | Unary | Write single tag value |
| WriteBatch | Unary | Write multiple tag values |
| WriteBatchAndWait | Unary | Write values, poll flag tag until match or timeout |
| Subscribe | Server streaming | Stream tag value updates to client |
| CheckApiKey | Unary | Validate API key and return role |

### 3.3 Data Model (VTQ)

All tag values are represented as VTQ (Value, Timestamp, Quality) tuples:

- **Value**: `TypedValue` — a protobuf `oneof` carrying the value in its native type (bool, int32, int64, float, double, string, bytes, datetime, typed arrays). An unset `oneof` represents null.
- **Timestamp**: UTC `DateTime.Ticks` as `int64` (100-nanosecond intervals since 0001-01-01 00:00:00 UTC).
- **Quality**: `QualityCode` — a structured message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. Category derived from high bits: `0x00xxxxxx` = Good, `0x40xxxxxx` = Uncertain, `0x80xxxxxx` = Bad.

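
To make the timestamp and quality encodings concrete, here is a small sketch that converts a tick count back into a UTC `DateTime` and derives the quality category from a status code. It assumes the category lives in the top two bits of the code, as in OPC UA status codes; the enum and class names are hypothetical, and treating the unused bit pattern `11` as Bad is a choice made for the example.

```csharp
using System;

public enum QualityCategory { Good, Uncertain, Bad }

public static class VtqSketch
{
    // Top two bits: 00 = Good, 01 = Uncertain, 10 = Bad.
    public static QualityCategory CategoryOf(uint statusCode)
    {
        switch (statusCode >> 30)
        {
            case 0: return QualityCategory.Good;
            case 1: return QualityCategory.Uncertain;
            default: return QualityCategory.Bad; // 10, and 11 by assumption
        }
    }

    // Ticks are 100-nanosecond intervals since 0001-01-01 00:00:00 UTC.
    public static DateTime FromUtcTicks(long utcTicks)
    {
        return new DateTime(utcTicks, DateTimeKind.Utc);
    }
}
```

For instance, `VtqSketch.CategoryOf(0x806D0000)` evaluates to `Bad` and `VtqSketch.CategoryOf(0x00D80000)` to `Good`, which matches the sub-codes quoted in section 13.2.
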
## 4. Session Lifecycle

- Clients call `Connect` with a client ID and optional API key to establish a session.
- The server returns a 32-character hex GUID as the session ID.
- All subsequent operations require the session ID for validation.
- Sessions persist until explicit `Disconnect` or server restart. There is no idle timeout.
- Session state is tracked in memory (not persisted). All sessions are lost on service restart.

## 5. Authentication & Authorization

### 5.1 API Key Authentication

- API keys are validated via the `x-api-key` gRPC metadata header.
- Keys are stored in a JSON file (`apikeys.json` by default) with hot-reload via FileSystemWatcher (1-second debounce).
- If no API key file exists, the service auto-generates a default file with two random keys (one ReadOnly, one ReadWrite).
- Authentication is enforced at the gRPC interceptor level before any service method executes.

### 5.2 Role-Based Authorization

Two roles with hierarchical permissions:

| Role | Read | Subscribe | Write |
|------|------|-----------|-------|
| ReadOnly | Yes | Yes | No |
| ReadWrite | Yes | Yes | Yes |

Write-protected methods: `Write`, `WriteBatch`, `WriteBatchAndWait`. A ReadOnly key attempting a write receives `StatusCode.PermissionDenied`.

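
The role table above reduces to a single check per call. The sketch below is an illustration only: the enum and class names are hypothetical, RPC names are matched by their short name (a real interceptor would more likely see a full method path), and mapping a `false` result to `StatusCode.PermissionDenied` is left to the caller.

```csharp
using System;
using System.Collections.Generic;

public enum ApiKeyRole { ReadOnly, ReadWrite }

public static class RoleCheckSketch
{
    // The three write-protected RPCs named in this section.
    private static readonly HashSet<string> WriteMethods =
        new HashSet<string>(StringComparer.Ordinal) { "Write", "WriteBatch", "WriteBatchAndWait" };

    // Returns true when the role may call the RPC.
    public static bool IsAllowed(ApiKeyRole role, string rpcName)
    {
        if (!WriteMethods.Contains(rpcName))
        {
            return true; // reads and subscriptions are open to both roles
        }
        return role == ApiKeyRole.ReadWrite;
    }
}
```
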
### 5.3 TLS/Security

- TLS is optional. It is disabled in the default configuration, even though the `Tls.Enabled` property defaults to `true` in the config class.
- Supports server TLS and mutual TLS (client certificate validation).
- Client CA certificate path configurable for mTLS.
- Certificate revocation checking is optional.
- Client library supports TLS 1.2 and TLS 1.3, custom CA trust stores, self-signed certificate allowance, and server name override.

## 6. Operations

### 6.1 Read

- Single tag read with configurable retry policy.
- Batch read with semaphore-controlled concurrency (default max 10 concurrent operations).
- Read timeout: 5 seconds (configurable).

### 6.2 Write

- Single tag write with retry policy. Values are sent as `TypedValue` (native protobuf types). Type mismatches between the value and the tag's expected type return a write failure.
- Batch write with semaphore-controlled concurrency.
- Write timeout: 5 seconds (configurable).
- WriteBatchAndWait: writes a batch, then polls the flag tag at a configurable interval until its value matches the expected flag value (type-aware comparison via `TypedValueEquals`) or a timeout expires. Default timeout: 5000ms, default poll interval: 100ms. Timeout is not an error — returns `flag_reached=false`.

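
The polling half of WriteBatchAndWait can be sketched as a loop that reads the flag, compares, and sleeps until the timeout elapses. This is an assumption-laden illustration: `readFlag` is a hypothetical delegate standing in for the real tag read, `object.Equals` stands in for `TypedValueEquals`, and the batch write that precedes the polling is omitted.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class FlagPollingSketch
{
    // Returns true when the flag matched, false when the timeout expired (not an error).
    public static async Task<bool> WaitForFlagAsync(
        Func<Task<object>> readFlag,
        object expected,
        int timeoutMs = 5000,
        int pollIntervalMs = 100,
        CancellationToken ct = default(CancellationToken))
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            object current = await readFlag().ConfigureAwait(false);

            // Boxed values of different types never compare equal, so an int 1
            // does not match a long 1; a loose stand-in for a type-aware comparison.
            if (object.Equals(current, expected))
            {
                return true;
            }
            if (elapsed.ElapsedMilliseconds >= timeoutMs)
            {
                return false;
            }
            await Task.Delay(pollIntervalMs, ct).ConfigureAwait(false);
        }
    }
}
```

The flag is always read at least once, so a flag that is already set returns `true` without any delay.
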
### 6.3 Subscribe

- Server-streaming RPC. Client sends a list of tags and a sampling interval (in milliseconds).
- Server maintains a per-client bounded channel (default capacity 1000 messages).
- Updates are pushed as `VtqMessage` items on the stream.
- When the MxAccess connection drops, all subscribed clients receive a bad-quality notification.
- Subscriptions are cleaned up on client disconnect. When the last client unsubscribes from a tag, the underlying MxAccess subscription is disposed.

## 7. Connection Resilience

### 7.1 Host Auto-Reconnect

- If the MxAccess connection is lost, the Host automatically attempts reconnection at a fixed interval (default 5 seconds).
- Stored subscriptions are recreated after a successful reconnect.
- Auto-reconnect is configurable (`Connection.AutoReconnect`, default true).

### 7.2 Client Keep-Alive

- The client sends a lightweight `GetConnectionState` ping every 30 seconds.
- On keep-alive failure, the client marks the connection as disconnected and cleans up subscriptions.

### 7.3 Client Retry Policy

- Polly-based exponential backoff retry.
- Default: 3 attempts with 1-second initial delay (1s → 2s → 4s).
- Transient errors retried: Unavailable, DeadlineExceeded, ResourceExhausted, Aborted.

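
The 1s, 2s, 4s progression in 7.3 is plain doubling of the initial delay. The sketch below computes that schedule directly; it deliberately does not use Polly, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSketch
{
    // Delay before retry number n (1-based): initialDelay * 2^(n-1).
    public static TimeSpan DelayForRetry(int retryNumber, TimeSpan initialDelay)
    {
        if (retryNumber < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(retryNumber));
        }
        return TimeSpan.FromTicks(initialDelay.Ticks * (1L << (retryNumber - 1)));
    }

    // Full schedule for a given number of retries.
    public static List<TimeSpan> Schedule(int retries, TimeSpan initialDelay)
    {
        var delays = new List<TimeSpan>();
        for (int n = 1; n <= retries; n++)
        {
            delays.Add(DelayForRetry(n, initialDelay));
        }
        return delays;
    }
}
```

With the documented defaults, `BackoffSketch.Schedule(3, TimeSpan.FromSeconds(1))` yields delays of 1, 2, and 4 seconds.
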
## 8. Health Monitoring & Metrics

### 8.1 Health Checks

Two health check implementations:

- **Basic** (`HealthCheckService`): Checks MxAccess connection state, subscription stats, and operation success rate. Returns Degraded if success rate < 50% (with > 100 operations) or client count > 100.
- **Detailed** (`DetailedHealthCheckService`): Reads a test tag (`System.Heartbeat`). Returns Unhealthy if not connected, Degraded if test tag quality is not Good or timestamp is older than 5 minutes.

### 8.2 Performance Metrics

- Per-operation tracking: Read, ReadBatch, Write, WriteBatch.
- Metrics: total count, success count, success rate, average/min/max latency, 95th percentile latency.
- Rolling buffer of 1000 latency samples per operation for percentile calculation.
- Metrics reported to logs every 60 seconds.

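
A rolling latency buffer with a percentile query, as described in 8.2, might look like the sketch below. The percentile definition used here (nearest rank over the samples currently held) is an assumption, since the source does not say which method the Host uses, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

public sealed class LatencyWindowSketch
{
    private readonly object _gate = new object();
    private readonly double[] _samples;
    private int _next;   // index that the next sample overwrites
    private int _count;  // number of valid samples, capped at capacity

    public LatencyWindowSketch(int capacity = 1000)
    {
        _samples = new double[capacity];
    }

    public void Record(double latencyMs)
    {
        lock (_gate)
        {
            _samples[_next] = latencyMs;
            _next = (_next + 1) % _samples.Length;
            if (_count < _samples.Length) _count++;
        }
    }

    // Nearest-rank percentile over the retained samples; returns 0 when empty.
    public double Percentile(double p)
    {
        double[] sorted;
        lock (_gate)
        {
            sorted = _samples.Take(_count).OrderBy(x => x).ToArray();
        }
        if (sorted.Length == 0) return 0;
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }
}
```

Once more than 1000 samples have been recorded, the oldest ones are overwritten, so the percentile reflects only the most recent window.
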
### 8.3 Status Web Server

- HTTP status server on port 8080 (configurable).
- Endpoints:
  - `GET /` — HTML dashboard with auto-refresh (30 seconds), color-coded status cards, operations table.
  - `GET /api/status` — JSON status report.
  - `GET /api/health` — Plain text `OK` (200) or `UNHEALTHY` (503).

### 8.4 Client Metrics

- Per-operation counts, error counts, and latency tracking (average, p95, p99).
- Rolling buffer of 1000 latency samples.
- Exposed via `ILmxProxyClient.GetMetrics()`.

## 9. Service Hosting

### 9.1 Topshelf Windows Service

- Service name: `ZB.MOM.WW.LmxProxy.Host`
- Display name: `SCADA Bridge LMX Proxy`
- Starts automatically on boot.

### 9.2 Service Recovery (Windows SCM)

| Failure | Restart Delay |
|---------|--------------|
| First | 1 minute |
| Second | 5 minutes |
| Subsequent | 10 minutes |
| Reset period | 1 day |

### 9.3 Startup Sequence

1. Load configuration from `appsettings.json` + environment variables.
2. Configure Serilog (console + file sinks).
3. Validate configuration.
4. Check/generate TLS certificates (if TLS enabled).
5. Initialize services: PerformanceMetrics, ApiKeyService, MxAccessClient, SubscriptionManager, SessionManager, HealthCheckService, StatusReportService.
6. Connect to MxAccess synchronously (timeout: 30 seconds).
7. Start auto-reconnect monitor loop (if enabled).
8. Start gRPC server on configured port.
9. Start HTTP status web server.

### 9.4 Shutdown Sequence

1. Cancel reconnect monitor (5-second wait).
2. Graceful gRPC server shutdown (10-second timeout, then kill).
3. Stop status web server (5-second wait).
4. Dispose all components in reverse order.
5. Disconnect from MxAccess (10-second timeout).

## 10. Configuration

All configuration is via `appsettings.json` bound to `LmxProxyConfiguration`. Key settings:

| Section | Setting | Default |
|---------|---------|---------|
| Root | GrpcPort | 50051 |
| Root | ApiKeyConfigFile | `apikeys.json` |
| Connection | MonitorIntervalSeconds | 5 |
| Connection | ConnectionTimeoutSeconds | 30 |
| Connection | ReadTimeoutSeconds | 5 |
| Connection | WriteTimeoutSeconds | 5 |
| Connection | MaxConcurrentOperations | 10 |
| Connection | AutoReconnect | true |
| Subscription | ChannelCapacity | 1000 |
| Subscription | ChannelFullMode | DropOldest |
| Tls | Enabled | false |
| Tls | RequireClientCertificate | false |
| WebServer | Enabled | true |
| WebServer | Port | 8080 |

Configuration is validated at startup. Invalid values cause the service to fail to start.

## 11. Logging

- Serilog with console and file sinks.
- File sink: `logs/lmxproxy-.txt`, daily rolling, 30 files retained.
- Default level: Information. Overrides: Microsoft=Warning, System=Warning, Grpc=Information.
- Enrichment: FromLogContext, WithMachineName, WithThreadId.

## 12. Constraints

- Host **must** target x86 and .NET Framework 4.8 (MXAccess is 32-bit COM).
- Host uses `Grpc.Core` (deprecated C-core library), required because .NET Framework 4.8 does not support the ASP.NET Core gRPC server (`Grpc.AspNetCore.Server`).
- Client targets .NET 10 and runs in ScadaLink central/site clusters.
- MxAccess COM operations require an STA thread context. The Host wraps them in `Task.Run`, which executes on MTA thread-pool threads rather than on an STA thread (see `sta_gap.md` for the resulting callback gap).
- The solution file uses `.slnx` format.

## 13. Protocol

The protocol specification is defined in `lmxproxy_updates.md`, which is the authoritative source of truth. All RPC signatures, message schemas, and behavioral specifications are per that document.

### 13.1 Value System (TypedValue)

Values are transmitted in their native protobuf types via a `TypedValue` oneof: bool, int32, int64, float, double, string, bytes, datetime (int64 UTC Ticks), and typed arrays. An unset oneof represents null. No string serialization or parsing heuristics are used.

### 13.2 Quality System (QualityCode)

Quality is a structured `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. Supports AVEVA-aligned quality sub-codes (e.g., `BadSensorFailure` = `0x806D0000`, `GoodLocalOverride` = `0x00D80000`, `BadWaitingForInitialData` = `0x80320000`). See Component-Protocol for the full quality code table.

### 13.3 Migration from V1

The current codebase implements the v1 protocol (string-encoded values, three-state string quality). The v2 protocol is a clean break — all clients and servers will be updated simultaneously. No backward compatibility layer. This is appropriate because LmxProxy is an internal protocol with a small, controlled client count.

## 14. Component List (10 Components)

| # | Component | Description |
|---|-----------|-------------|
| 1 | GrpcServer | gRPC service implementation, session validation, request routing |
| 2 | MxAccessClient | MXAccess COM interop wrapper, connection lifecycle, read/write/subscribe |
| 3 | SessionManager | Client session tracking and lifecycle |
| 4 | Security | API key authentication, role-based authorization, TLS management |
| 5 | SubscriptionManager | Tag subscription lifecycle, channel-based update delivery, backpressure |
| 6 | Configuration | appsettings.json structure, validation, options binding |
| 7 | HealthAndMetrics | Health checks, performance metrics, status web server |
| 8 | ServiceHost | Topshelf hosting, startup/shutdown, logging setup, service recovery |
| 9 | Client | LmxProxyClient library, builder, retry, streaming, DI integration |
| 10 | Protocol | gRPC protocol specification, proto definition, code-first contracts |

167
deprecated/lmxproxy/docs/sta_gap.md
Normal file
@@ -0,0 +1,167 @@
# STA Message Pump Gap — OnWriteComplete COM Callback

**Status**: Documented gap. Fire-and-forget workaround in place (deviation #7). Full fix deferred until secured/verified writes are needed.

## When This Matters

The current fire-and-forget write approach works for **supervisory writes** where:

- Security is handled at the LmxProxy API key level, not MxAccess attribute level
- Writes succeed synchronously (no secured/verified write requirements)
- Write confirmation is handled at the application level (read-back in `WriteBatchAndWait`)

This gap becomes a **blocking issue** if any of these scenarios arise:

- **Secured writes (MxAccess error 1012)**: Attribute requires ArchestrA user authentication. `OnWriteComplete` returns the error, and the caller must retry with `WriteSecured()`.
- **Verified writes (MxAccess error 1013)**: Attribute requires two-user verification. Same retry pattern.
- **Write failure detection**: MxAccess accepts the `Write()` call but can't complete it (e.g., downstream device failure). `OnWriteComplete` is the only notification of this — without it, the caller assumes success.

## Root Cause

The MxAccess documentation (Write() Method) states: *"Upon completion of the write, your program receives notification of the success/failure status through the OnWriteComplete() event"* and *"that item should not be taken off advise or removed from the internal tables until the OnWriteComplete() event is received."*

`OnWriteComplete` **should** fire after every `Write()` call. It doesn't in our service because:

- MxAccess is a COM component designed for Windows Forms apps with a UI message loop
- COM event callbacks are delivered via the Windows message pump
- Our Topshelf Windows service has no message pump — `Write()` is called from thread pool threads (`Task.Run`) with no message loop
- `OnDataChange` works because MxAccess fires it proactively on its own internal threads; `OnWriteComplete` is a response callback that needs message-pump-based marshaling

## Correct Solution: Dedicated STA Thread + `Application.Run()`

Based on research (Stephen Toub, MSDN Magazine 2007; Microsoft Learn COM interop docs; community patterns), the correct approach is a dedicated STA thread running a Windows Forms message pump via `Application.Run()`.

### Architecture

```
Service main thread (MTA)
  │
  ├── gRPC server threads (handle client RPCs)
  │     │
  │     └── Marshal COM calls via Form.BeginInvoke() ──┐
  │                                                    │
  └── Dedicated STA thread                             │
        │                                              │
        ├── Creates LMXProxyServerClass COM object     │
        ├── Wires event handlers (OnDataChange,        │
        │   OnWriteComplete, OperationComplete)        │
        ├── Runs Application.Run() ← continuous        │
        │   message pump                               │
        │                                              │
        └── Hidden Form receives BeginInvoke calls ◄───┘
              │
              ├── Executes COM operations (Read, Write,
              │   AddItem, AdviseSupervisory, etc.)
              │
              └── COM callbacks delivered via message pump
                  (OnWriteComplete, OnDataChange, etc.)
```

### Implementation Pattern

```csharp
// In MxAccessClient constructor or Start():
var initDone = new ManualResetEventSlim(false);

_staThread = new Thread(() =>
{
    // 1. Create hidden form for marshaling
    _marshalForm = new Form();
    // Reading Handle forces HWND creation without showing the form
    // (Control.CreateHandle() is protected, so it cannot be called from here).
    var unusedHandle = _marshalForm.Handle;

    // 2. Create COM objects ON THIS THREAD
    _lmxProxy = new LMXProxyServerClass();
    _lmxProxy.OnDataChange += OnDataChange;
    _lmxProxy.OnWriteComplete += OnWriteComplete;

    // 3. Signal that init is complete
    initDone.Set();

    // 4. Run message pump (blocks until Application.ExitThread(), pumps COM callbacks)
    Application.Run();
});
_staThread.Name = "MxAccess-STA";
_staThread.IsBackground = true;
_staThread.SetApartmentState(ApartmentState.STA);
_staThread.Start();

initDone.Wait(); // wait for COM objects to be ready
```

### Dispatching Work to the STA Thread

```csharp
// All COM calls must go through the hidden form's invoke:
public Task<Vtq> ReadAsync(string address, CancellationToken ct)
{
    if (ct.IsCancellationRequested)
        return Task.FromCanceled<Vtq>(ct);

    // Run continuations off the STA thread so awaiting callers never block the message pump.
    var tcs = new TaskCompletionSource<Vtq>(TaskCreationOptions.RunContinuationsAsynchronously);
    _marshalForm.BeginInvoke((Action)(() =>
    {
        try
        {
            // COM call executes on STA thread
            int handle = _lmxProxy.AddItem(_connectionHandle, address);
            _lmxProxy.AdviseSupervisory(_connectionHandle, handle);
            // ... etc (the elided steps produce vtq)
            tcs.SetResult(vtq);
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
        }
    }));
    return tcs.Task;
}
```

### Shutdown

```csharp
// To stop the message pump:
_marshalForm.BeginInvoke((Action)(() =>
{
    // Clean up COM objects on STA thread
    // ... UnAdvise, RemoveItem, Unregister ...
    Marshal.ReleaseComObject(_lmxProxy);
    Application.ExitThread(); // stops Application.Run()
}));
_staThread.Join(TimeSpan.FromSeconds(10));
```

### Why Our First Attempt Failed
|
||||
|
||||
Our original `StaDispatchThread` (Phase 2) used `BlockingCollection.Take()` to wait for work items, with `Application.DoEvents()` between items. This failed because:
|
||||
|
||||
| Our failed approach | Correct approach |
|
||||
|---|---|
|
||||
| `BlockingCollection.Take()` blocks the STA thread, preventing the message pump from running | `Application.Run()` runs continuously, pumping messages at all times |
|
||||
| `Application.DoEvents()` only pumps messages already in the queue at that instant | Message pump runs an infinite loop, processing messages as they arrive |
|
||||
| Work dispatched by enqueueing to `BlockingCollection` | Work dispatched via `Form.BeginInvoke()` which posts a Windows message to the STA thread's queue |
|
||||
|
||||
The key difference: `BeginInvoke` posts a `WM_` message that the message pump processes alongside COM callbacks. `BlockingCollection` bypasses the message pump entirely.
|
||||
|
||||
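
For contrast, the failed loop can be sketched roughly as follows. This is a reconstruction from the description above only: the class name, the `Enqueue` method, and the cancellation token are assumptions added for illustration, not the original `StaDispatchThread` source.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Windows.Forms;

// Hypothetical reconstruction of the failed pattern; all names are illustrative.
public sealed class BlockingStaLoopSketch
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public void Enqueue(Action work)
    {
        _queue.Add(work);
    }

    // Intended to run on a thread created with ApartmentState.STA.
    public void Run(CancellationToken ct)
    {
        try
        {
            while (true)
            {
                // Per the analysis above, the thread sits here while the queue
                // is empty and the message pump does not run in the meantime.
                Action work = _queue.Take(ct);
                work();

                // Processes only the messages queued at this instant, then
                // the loop returns to Take() and blocks again.
                Application.DoEvents();
            }
        }
        catch (OperationCanceledException)
        {
            // Shutdown requested through the token.
        }
    }
}
```

The pump only runs right after a work item, so a callback that arrives while the queue is idle waits until the next unrelated work item is enqueued.
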
## Drawbacks of the STA Approach

### Performance
- **All COM calls serialize onto one thread.** Under load (batch reads of 100+ tags), operations queue up single-file. The current `Task.Run` approach lets MxAccess's internal marshaling handle some concurrency.
- **Double context switch per operation.** Caller → STA thread (invoke) → wait → back to caller. This adds roughly 0.1 to 1 ms per call: negligible for single reads, noticeable for large batch operations.

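
One possible way to reduce the per-call hop cost for batch reads is to post a single work item that handles the whole batch. The sketch below is an assumption-laden illustration, not the project's code: `StaBatchSketch`, `ReadBatchAsync`, and the `readOnSta` delegate are hypothetical names, and it assumes each per-tag step can be expressed as a synchronous call on the STA thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows.Forms;

// Illustrative only; none of these names come from the original code base.
public static class StaBatchSketch
{
    // marshalControl: a control whose handle was created on the STA thread
    //                 (the hidden form in this document).
    // readOnSta:      hypothetical delegate that performs the COM calls for
    //                 one address and returns its value.
    public static Task<IReadOnlyList<T>> ReadBatchAsync<T>(
        Control marshalControl,
        IReadOnlyList<string> addresses,
        Func<string, T> readOnSta)
    {
        var tcs = new TaskCompletionSource<IReadOnlyList<T>>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // One posted message for the whole batch instead of one per tag.
        marshalControl.BeginInvoke((Action)(() =>
        {
            try
            {
                var results = new List<T>(addresses.Count);
                foreach (string address in addresses)
                {
                    results.Add(readOnSta(address));
                }
                tcs.SetResult(results);
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
            }
        }));

        return tcs.Task;
    }
}
```

The trade-off is that a long batch occupies the STA thread until it returns, so COM callbacks and other work items are delayed for that duration.
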
### Safety
- **Single point of failure.** If the STA thread dies, all MxAccess operations stop. Recovery requires tearing down and recreating the thread + all COM objects.
- **Deadlock risk.** If STA thread code synchronously waits on something that needs the STA thread (circular dependency), the message pump freezes. All waits must be async/non-blocking.
- **Reentrancy.** While pumping messages, inbound COM callbacks can reenter your code during another COM call. Event handlers must be reentrant-safe.

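
The deadlock case can be illustrated with a small wrapper around `BeginInvoke`. This is a minimal sketch under stated assumptions: `StaInvokerSketch` and its method names are hypothetical, and `marshalControl` stands in for the hidden form created on the STA thread.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

// Illustrative helper; the class and method names are not from the original code.
public sealed class StaInvokerSketch
{
    private readonly Control _marshalControl; // handle must be created on the STA thread

    public StaInvokerSketch(Control marshalControl)
    {
        _marshalControl = marshalControl;
    }

    // Posts work to the STA thread's message queue and completes the task there.
    public Task<T> InvokeAsync<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _marshalControl.BeginInvoke((Action)(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        }));
        return tcs.Task;
    }

    // Deadlock-prone when called ON the STA thread: Result blocks the message
    // pump, so the work item posted by InvokeAsync is never dispatched.
    public T InvokeBlocking<T>(Func<T> work)
    {
        return InvokeAsync(work).Result;
    }

    // Safer variant: if the caller is already on the STA thread, run inline
    // instead of posting a message and waiting for it.
    public Task<T> InvokeOrRunInlineAsync<T>(Func<T> work)
    {
        if (_marshalControl.InvokeRequired)
        {
            return InvokeAsync(work);
        }
        try { return Task.FromResult(work()); }
        catch (Exception ex) { return Task.FromException<T>(ex); }
    }
}
```

`InvokeBlocking` is safe from a thread-pool caller but freezes the pump if it is ever reached from a COM event handler, which is why the inline check matters for code that can run on either side.
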
### Complexity
- Every COM call needs `_marshalForm.BeginInvoke()` wrapping.
- COM object affinity to STA thread is hard to enforce at compile time.
- Unit tests need STA thread support or must use fakes.

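
Since thread affinity cannot be checked at compile time, a runtime guard is one option for catching violations early. The following is a minimal sketch; `StaAffinityGuard` is a hypothetical helper, not something the original code is known to contain.

```csharp
using System;

// Illustrative guard; not part of the original code base.
public sealed class StaAffinityGuard
{
    private readonly int _ownerThreadId;

    // Construct on the STA thread, right after the COM objects are created.
    public StaAffinityGuard()
    {
        _ownerThreadId = Environment.CurrentManagedThreadId;
    }

    // Call at the top of every method that touches a COM object.
    public void VerifyAccess()
    {
        int current = Environment.CurrentManagedThreadId;
        if (current != _ownerThreadId)
        {
            throw new InvalidOperationException(
                $"COM object accessed from thread {current}; owner is thread {_ownerThreadId}.");
        }
    }
}
```

A guard like this turns a silent cross-apartment call into an immediate exception, and it can be exercised in unit tests without any COM dependency.
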
## Decision

Fire-and-forget is the correct choice for now. Revisit when secured/verified writes are needed.

## References

- [.NET Matters: Handling Messages in Console Apps, Stephen Toub, MSDN Magazine 2007](https://learn.microsoft.com/en-us/archive/msdn-magazine/2007/june/net-matters-handling-messages-in-console-apps)
- [How to: Support COM Interop by Displaying Each Windows Form on Its Own Thread, Microsoft Learn](https://learn.microsoft.com/en-us/dotnet/desktop/winforms/advanced/how-to-support-com-interop-by-displaying-each-windows-form-on-its-own-thread)
- [.NET Windows Service needs STAThread, hirenppatel](https://hirenppatel.wordpress.com/2012/11/24/net-windows-service-needs-to-use-stathread-instead-of-mtathread/)
- [Application.Run() In a Windows Service, PC Review](https://www.pcreview.co.uk/threads/application-run-in-a-windows-service.3087159/)
- [Build a message pump for a Windows service?, CodeProject](https://www.codeproject.com/Messages/1365966/Build-a-message-pump-for-a-Windows-service.aspx)
- MxAccess Toolkit User's Guide: Write() Method, OnWriteComplete Callback sections