# LmxProxy v2 Rebuild — Design Document

**Date**: 2026-03-21

**Status**: Approved

**Scope**: Complete rebuild of LmxProxy Host and Client with v2 protocol

## 1. Overview

Rebuild the LmxProxy gRPC proxy service from scratch, implementing the v2 protocol (TypedValue + QualityCode) as defined in `docs/lmxproxy_updates.md`. The existing code in `src/` is retained as reference only. No backward compatibility with v1.

## 2. Key Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| gRPC server for Host | Grpc.Core (C-core) | Only option for .NET Framework 4.8 server-side |
| Service hosting | Topshelf | Proven, already deployed, simple install/uninstall |
| Protocol version | v2 only, clean break | Small controlled client count, no value in v1 compat |
| Shared code between projects | None — fully independent | Different .NET runtimes (.NET Fx 4.8 vs .NET 10), wire compat is the contract |
| Client retry library | Polly v8+ | Building fresh on .NET 10, modern API |
| Testing strategy | Unit tests during implementation, integration tests after Client functional | Phased approach, real hardware validation on windev |

## 3. Architecture

### 3.1 Host (.NET Framework 4.8, x86)

```
Program.cs (Topshelf entry point)
  └── LmxProxyService (lifecycle manager)
        ├── Configuration (appsettings.json binding + validation)
        ├── MxAccessClient (COM interop, STA dispatch thread)
        │   ├── Connection state machine
        │   ├── Read/Write with semaphore concurrency
        │   ├── Subscription storage for reconnect replay
        │   └── Auto-reconnect loop (5s interval)
        ├── SessionManager (ConcurrentDictionary, 5-min inactivity scavenging)
        ├── SubscriptionManager (per-client channels, shared MxAccess subscriptions)
        ├── ApiKeyService (JSON file, FileSystemWatcher hot-reload)
        ├── ScadaGrpcService (proto-generated, all 10 RPCs)
        │   └── ApiKeyInterceptor (x-api-key header enforcement)
        ├── PerformanceMetrics (per-op tracking, p95, 60s log)
        ├── HealthCheckService (basic + detailed with test tag)
        └── StatusWebServer (HTML dashboard, JSON status, health endpoint)
```

### 3.2 Client (.NET 10, AnyCPU)

```
ILmxProxyClient (public interface)
  └── LmxProxyClient (partial class)
        ├── Connection (GrpcChannel, protobuf-net.Grpc, 30s keep-alive)
        ├── Read/Write/Subscribe operations
        ├── CodeFirstSubscription (IAsyncEnumerable streaming)
        ├── ClientMetrics (p95/p99, 1000-sample buffer)
        └── Disposal (session disconnect, channel cleanup)

LmxProxyClientBuilder (fluent builder, Polly v8 resilience pipeline)
ILmxProxyClientFactory + LmxProxyClientFactory (config-based creation)
ServiceCollectionExtensions (DI registrations)
StreamingExtensions (batched reads/writes, parallel processing)

Domain/
  ├── ScadaContracts.cs (IScadaService + all DataContract messages)
  ├── Quality.cs, QualityExtensions.cs
  ├── Vtq.cs
  └── ConnectionState.cs
```

### 3.3 Wire Compatibility

The `.proto` file is the single source of truth for the wire format. Host generates server stubs from it. Client implements code-first contracts (`[DataContract]`/`[ServiceContract]`) that mirror the proto exactly — same field numbers, names, nesting, and streaming shapes.
Cross-stack serialization tests verify compatibility.

## 4. Protocol (v2)

### 4.1 TypedValue System

Protobuf `oneof` carrying native types:

| Case | Proto Type | .NET Type |
|------|-----------|-----------|
| bool_value | bool | bool |
| int32_value | int32 | int |
| int64_value | int64 | long |
| float_value | float | float |
| double_value | double | double |
| string_value | string | string |
| bytes_value | bytes | byte[] |
| datetime_value | int64 (UTC Ticks) | DateTime |
| array_value | ArrayValue | typed arrays |

Unset `oneof` = null. No string serialization heuristics.

### 4.2 COM Variant Coercion Table

| COM Variant Type | TypedValue Case | Notes |
|-----------------|-----------------|-------|
| VT_BOOL | bool_value | |
| VT_I2 (short) | int32_value | Widened |
| VT_I4 (int) | int32_value | |
| VT_I8 (long) | int64_value | |
| VT_UI2 (ushort) | int32_value | Widened |
| VT_UI4 (uint) | int64_value | Widened to avoid sign issues |
| VT_UI8 (ulong) | int64_value | Truncation risk logged if > long.MaxValue |
| VT_R4 (float) | float_value | |
| VT_R8 (double) | double_value | |
| VT_BSTR (string) | string_value | |
| VT_DATE (DateTime) | datetime_value | Converted to UTC Ticks |
| VT_DECIMAL | double_value | Precision loss logged |
| VT_CY (Currency) | double_value | |
| VT_NULL, VT_EMPTY, DBNull | unset oneof | Represents null |
| VT_ARRAY | array_value | Element type determines ArrayValue field |
| VT_UNKNOWN | string_value | ToString() fallback, logged as warning |

### 4.3 QualityCode System

`status_code` (uint32, OPC UA-compatible) is canonical. `symbolic_name` is derived from a lookup table, never set independently.

Category derived from high bits:

- `0x00xxxxxx` = Good
- `0x40xxxxxx` = Uncertain
- `0x80xxxxxx` = Bad

Domain `Quality` enum uses byte values for the low-order byte, with extension methods `IsGood()`, `IsBad()`, `IsUncertain()`.
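
The category rule above can be illustrated with a minimal sketch. It assumes the category is carried in the top two bits of `status_code`, as in OPC UA status codes, and it works on the raw `uint` rather than the domain `Quality` enum. The names `QualityCategory`, `QualityCodeSketch`, and `CategoryOf` are hypothetical and not taken from the LmxProxy sources.

```csharp
using System;

// Hypothetical names for illustration only; not from the LmxProxy sources.
public enum QualityCategory { Good, Uncertain, Bad }

public static class QualityCodeSketch
{
    // Assumption: the category lives in the top two bits (OPC UA severity bits).
    private const uint SeverityMask = 0xC0000000;

    public static QualityCategory CategoryOf(uint statusCode)
    {
        switch (statusCode & SeverityMask)
        {
            case 0x00000000: return QualityCategory.Good;
            case 0x40000000: return QualityCategory.Uncertain;
            // 0x80000000, plus the reserved 0xC0000000 pattern, is treated as Bad here.
            default: return QualityCategory.Bad;
        }
    }

    public static bool IsGood(uint statusCode) => CategoryOf(statusCode) == QualityCategory.Good;
    public static bool IsUncertain(uint statusCode) => CategoryOf(statusCode) == QualityCategory.Uncertain;
    public static bool IsBad(uint statusCode) => CategoryOf(statusCode) == QualityCategory.Bad;
}
```

Treating the reserved `0xC0000000` pattern as Bad is a defensive choice made for this sketch; the protocol spec may define it differently.
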

### 4.4 Error Model

| Error Type | Mechanism | Examples |
|-----------|-----------|----------|
| Infrastructure | gRPC StatusCode | Unauthenticated (bad API key), PermissionDenied (ReadOnly write), InvalidArgument (bad session), Unavailable (MxAccess down) |
| Business outcome | Payload `success`/`message` fields | Tag read failure, write type mismatch, batch partial failure, WriteBatchAndWait flag timeout |
| Subscription | gRPC StatusCode on stream | Unauthenticated (invalid session), Internal (unexpected error) |

## 5. COM Threading Model

MxAccess is an STA COM component. All COM operations execute on a **dedicated STA thread** with a `BlockingCollection` dispatch queue:

- `MxAccessClient` creates a single STA thread at construction
- All COM calls (connect, read, write, subscribe, disconnect) are dispatched to this thread via the queue
- Callers await a `TaskCompletionSource` that the STA thread completes after the COM call
- The STA thread runs a message pump loop (`Application.Run` or manual `MSG` pump)
- On disposal, a sentinel is enqueued and the thread joins with a 10-second timeout

This replaces the fragile `Task.Run` + `SemaphoreSlim` pattern in the reference code.

## 6. Session Lifecycle

- Sessions created on `Connect` with GUID "N" format (32-char hex)
- Tracked in `ConcurrentDictionary`
- **Inactivity scavenging**: sessions not accessed for 5 minutes are automatically terminated. Client keep-alive pings (30s) keep legitimate sessions alive.
- On termination: subscriptions cleaned up, session removed from dictionary
- All sessions lost on service restart (in-memory only)

## 7. Subscription Semantics

- **Shared MxAccess subscriptions**: first client to subscribe creates the underlying MxAccess subscription. Last to unsubscribe disposes it. Ref-counted.
- **Sampling rate**: when multiple clients subscribe to the same tag with different `sampling_ms`, the fastest (lowest non-zero) rate is used for the MxAccess subscription.
  All clients receive updates at this rate.
- **Per-client channels**: each client gets an independent `BoundedChannel` (capacity 1000, DropOldest). One slow consumer's drops do not affect other clients.
- **MxAccess disconnect**: all subscribed clients receive a bad-quality notification for all their subscribed tags.
- **Session termination**: all subscriptions for that session are cleaned up.

## 8. Authentication

- `x-api-key` gRPC metadata header is the authoritative authentication mechanism
- `ConnectRequest.api_key` is accepted but the interceptor is the enforcement point
- API keys loaded from JSON file with FileSystemWatcher hot-reload (1-second debounce)
- Auto-generates default file with two random keys (ReadOnly + ReadWrite) if missing
- Write-protected RPCs: Write, WriteBatch, WriteBatchAndWait

## 9. Phasing

| Phase | Scope | Depends On |
|-------|-------|------------|
| 1 | Protocol & Domain Types | — |
| 2 | Host Core (MxAccessClient, SessionManager, SubscriptionManager) | Phase 1 |
| 3 | Host gRPC Server, Security, Configuration, Service Hosting | Phase 2 |
| 4 | Host Health, Metrics, Status Server | Phase 3 |
| 5 | Client Core | Phase 1 |
| 6 | Client Extras (Builder, Factory, DI, Streaming) | Phase 5 |
| 7 | Integration Tests & Deployment | Phases 4 + 6 |

Phases 2-4 (Host) and 5-6 (Client) can proceed in parallel after Phase 1.

## 10. Guardrails

1. **Proto is the source of truth** — any wire format question is resolved by reading `scada.proto`, not the code-first contracts.
2. **No v1 code in the new build** — reference only. Do not copy-paste and modify; write fresh.
3. **Cross-stack tests in Phase 1** — Host proto serialize → Client code-first deserialize (and vice versa) before any business logic.
4. **COM calls only on STA thread** — no `Task.Run` for COM operations. All go through the dispatch queue.
5. **status_code is canonical for quality** — `symbolic_name` is always derived, never independently set.
6. **Unit tests before integration** — every phase includes unit tests. Integration tests are Phase 7 only.
7. **Each phase must compile and pass tests** before the next phase begins.
8. **No string serialization heuristics** — v2 uses native TypedValue. No `double.TryParse` or `bool.TryParse` on values.

## 11. Resolved Conflicts

| Conflict | Resolution |
|----------|-----------|
| WriteBatchAndWait signature (MxAccessClient vs Protocol) | Follow Protocol spec: write items, poll flagTag for flagValue. IScadaClient interface matches protocol semantics. |
| Builder default port 5050 vs Host 50051 | Standardize builder default to 50051 |
| Auth in metadata vs payload | x-api-key header is authoritative; ConnectRequest.api_key accepted but interceptor enforces |

## 12. Reference Code

The existing code remains in `src/` as `src-reference/` for consultation:

- `src-reference/ZB.MOM.WW.LmxProxy.Host/` — v1 Host implementation
- `src-reference/ZB.MOM.WW.LmxProxy.Client/` — v1 Client implementation

Key reference files for COM interop patterns:

- `Implementation/MxAccessClient.Connection.cs` — COM object lifecycle
- `Implementation/MxAccessClient.EventHandlers.cs` — MxAccess callbacks
- `Implementation/MxAccessClient.Subscription.cs` — Advise/Unadvise patterns
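
As a point of comparison with the v1 COM interop files listed above, the dedicated STA dispatch thread from Section 5 might be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the planned implementation: the class name `StaDispatcher` and the method `InvokeAsync` are hypothetical, and the sketch deliberately omits the message pump (`Application.Run` or a manual `MSG` pump) that Section 5 calls for, so it shows only the queue, the `TaskCompletionSource` handoff, the sentinel, and the 10-second join. `SetApartmentState` with `ApartmentState.STA` is supported on Windows only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; names are illustrative and not from the LmxProxy sources.
public sealed class StaDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public StaDispatcher()
    {
        _thread = new Thread(Run) { IsBackground = true, Name = "MxAccess-STA" };
        _thread.SetApartmentState(ApartmentState.STA); // must be set before Start()
        _thread.Start();
    }

    // Queues a call for the STA thread and returns a task the caller can await.
    public Task<T> InvokeAsync<T>(Func<T> comCall)
    {
        // RunContinuationsAsynchronously keeps caller continuations off the STA thread.
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(comCall()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    private void Run()
    {
        while (true)
        {
            Action work = _queue.Take(); // blocks until an item is available
            if (work == null) return;    // null is the shutdown sentinel
            work();
        }
    }

    public void Dispose()
    {
        _queue.Add(null);                        // enqueue the sentinel
        _thread.Join(TimeSpan.FromSeconds(10));  // bounded wait, per Section 5
    }
}
```

A caller would wrap each COM operation in a delegate, for example `await dispatcher.InvokeAsync(() => ReadTag("Tank1.Level"))`, where `ReadTag` stands in for whatever COM call is being made.
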