LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL adapter files, and related docs to deprecated/. Removed LmxProxy registration from DataConnectionFactory, project reference from DCL, protocol option from UI, and cleaned up all requirement docs.
LmxProxy v2 Rebuild — Design Document
Date: 2026-03-21
Status: Approved
Scope: Complete rebuild of LmxProxy Host and Client with v2 protocol
1. Overview
Rebuild the LmxProxy gRPC proxy service from scratch, implementing the v2 protocol (TypedValue + QualityCode) as defined in docs/lmxproxy_updates.md. The existing code in src/ is retained as reference only. No backward compatibility with v1.
2. Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| gRPC server for Host | Grpc.Core (C-core) | Only option for .NET Framework 4.8 server-side |
| Service hosting | Topshelf | Proven, already deployed, simple install/uninstall |
| Protocol version | v2 only, clean break | Small controlled client count, no value in v1 compat |
| Shared code between projects | None — fully independent | Different .NET runtimes (.NET Fx 4.8 vs .NET 10), wire compat is the contract |
| Client retry library | Polly v8+ | Building fresh on .NET 10, modern API |
| Testing strategy | Unit tests during implementation, integration tests after Client functional | Phased approach, real hardware validation on windev |
3. Architecture
3.1 Host (.NET Framework 4.8, x86)
Program.cs (Topshelf entry point)
└── LmxProxyService (lifecycle manager)
    ├── Configuration (appsettings.json binding + validation)
    ├── MxAccessClient (COM interop, STA dispatch thread)
    │   ├── Connection state machine
    │   ├── Read/Write with semaphore concurrency
    │   ├── Subscription storage for reconnect replay
    │   └── Auto-reconnect loop (5s interval)
    ├── SessionManager (ConcurrentDictionary, 5-min inactivity scavenging)
    ├── SubscriptionManager (per-client channels, shared MxAccess subscriptions)
    ├── ApiKeyService (JSON file, FileSystemWatcher hot-reload)
    ├── ScadaGrpcService (proto-generated, all 10 RPCs)
    │   └── ApiKeyInterceptor (x-api-key header enforcement)
    ├── PerformanceMetrics (per-op tracking, p95, 60s log)
    ├── HealthCheckService (basic + detailed with test tag)
    └── StatusWebServer (HTML dashboard, JSON status, health endpoint)
3.2 Client (.NET 10, AnyCPU)
ILmxProxyClient (public interface)
└── LmxProxyClient (partial class)
    ├── Connection (GrpcChannel, protobuf-net.Grpc, 30s keep-alive)
    ├── Read/Write/Subscribe operations
    ├── CodeFirstSubscription (IAsyncEnumerable streaming)
    ├── ClientMetrics (p95/p99, 1000-sample buffer)
    └── Disposal (session disconnect, channel cleanup)
LmxProxyClientBuilder (fluent builder, Polly v8 resilience pipeline)
ILmxProxyClientFactory + LmxProxyClientFactory (config-based creation)
ServiceCollectionExtensions (DI registrations)
StreamingExtensions (batched reads/writes, parallel processing)
Domain/
├── ScadaContracts.cs (IScadaService + all DataContract messages)
├── Quality.cs, QualityExtensions.cs
├── Vtq.cs
└── ConnectionState.cs
3.3 Wire Compatibility
The .proto file is the single source of truth for the wire format. Host generates server stubs from it. Client implements code-first contracts ([DataContract]/[ServiceContract]) that mirror the proto exactly — same field numbers, names, nesting, and streaming shapes. Cross-stack serialization tests verify compatibility.
4. Protocol (v2)
4.1 TypedValue System
Protobuf oneof carrying native types:
| Case | Proto Type | .NET Type |
|---|---|---|
| bool_value | bool | bool |
| int32_value | int32 | int |
| int64_value | int64 | long |
| float_value | float | float |
| double_value | double | double |
| string_value | string | string |
| bytes_value | bytes | byte[] |
| datetime_value | int64 (UTC Ticks) | DateTime |
| array_value | ArrayValue | typed arrays |
Unset oneof = null. No string serialization heuristics.
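As a rough illustration of the `datetime_value` encoding, the sketch below converts a UTC timestamp to .NET-style ticks (100 ns intervals since 0001-01-01 00:00:00) and back. It is written in Python for brevity, not in the C# of the real Host and Client, and the function names are hypothetical. Naive datetimes are assumed to already be in UTC. Python datetimes only carry microsecond resolution, so the last tick digit is always zero here.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1, 1, 1)       # tick 0, equivalent to .NET DateTime.MinValue
_TICKS_PER_MICROSECOND = 10      # one tick is 100 ns


def to_utc_ticks(dt: datetime) -> int:
    """Encode a timestamp as UTC ticks for the datetime_value case."""
    if dt.tzinfo is not None:
        # Normalise aware datetimes to UTC, then drop the tzinfo.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    microseconds = (dt - _EPOCH) // timedelta(microseconds=1)
    return microseconds * _TICKS_PER_MICROSECOND


def from_utc_ticks(ticks: int) -> datetime:
    """Decode UTC ticks back into a naive UTC datetime."""
    return _EPOCH + timedelta(microseconds=ticks // _TICKS_PER_MICROSECOND)
```

For example, the Unix epoch corresponds to 621355968000000000 ticks, which is a convenient cross-stack test value.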
4.2 COM Variant Coercion Table
| COM Variant Type | TypedValue Case | Notes |
|---|---|---|
| VT_BOOL | bool_value | |
| VT_I2 (short) | int32_value | Widened |
| VT_I4 (int) | int32_value | |
| VT_I8 (long) | int64_value | |
| VT_UI2 (ushort) | int32_value | Widened |
| VT_UI4 (uint) | int64_value | Widened to avoid sign issues |
| VT_UI8 (ulong) | int64_value | Truncation risk logged if > long.MaxValue |
| VT_R4 (float) | float_value | |
| VT_R8 (double) | double_value | |
| VT_BSTR (string) | string_value | |
| VT_DATE (DateTime) | datetime_value | Converted to UTC Ticks |
| VT_DECIMAL | double_value | Precision loss logged |
| VT_CY (Currency) | double_value | |
| VT_NULL, VT_EMPTY, DBNull | unset oneof | Represents null |
| VT_ARRAY | array_value | Element type determines ArrayValue field |
| VT_UNKNOWN | string_value | ToString() fallback, logged as warning |
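The coercion table can be read as a lookup plus a few special cases. The following Python sketch models that reading; it is not the Host's C# code. Variant types are represented by their names as strings, `coerce` is a hypothetical helper, and several behaviours are assumptions made for the sketch: any variant type not listed falls back to `string_value`, the VT_UI8 value is passed through unchanged after the warning, and VT_DATE values are passed through without tick conversion. VT_ARRAY is left out.

```python
import logging

log = logging.getLogger("variant_coercion")

INT64_MAX = 2**63 - 1

# COM variant type name -> TypedValue oneof case, taken from the table above.
_CASE_BY_VT = {
    "VT_BOOL": "bool_value",
    "VT_I2": "int32_value",
    "VT_I4": "int32_value",
    "VT_I8": "int64_value",
    "VT_UI2": "int32_value",
    "VT_UI4": "int64_value",
    "VT_UI8": "int64_value",
    "VT_R4": "float_value",
    "VT_R8": "double_value",
    "VT_BSTR": "string_value",
    "VT_DATE": "datetime_value",
    "VT_DECIMAL": "double_value",
    "VT_CY": "double_value",
}
_NULL_VTS = {"VT_NULL", "VT_EMPTY"}


def coerce(vt: str, value):
    """Return (oneof_case, value); (None, None) means the oneof stays unset."""
    if vt in _NULL_VTS or value is None:
        return None, None
    if vt == "VT_ARRAY":
        raise NotImplementedError("array_value is outside this sketch")
    case = _CASE_BY_VT.get(vt)
    if case is None:
        # Assumption: every unlisted type is treated like VT_UNKNOWN.
        log.warning("unmapped variant type %s, using string fallback", vt)
        return "string_value", str(value)
    if vt == "VT_UI8" and value > INT64_MAX:
        log.warning("VT_UI8 value %d exceeds the int64 range", value)
    if vt == "VT_DECIMAL":
        log.warning("VT_DECIMAL converted to double, precision may be lost")
    if case == "double_value":
        value = float(value)
    return case, value
```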
4.3 QualityCode System
status_code (uint32, OPC UA-compatible) is canonical. symbolic_name is derived from a lookup table, never set independently.
Category derived from high bits:
- `0x00xxxxxx` = Good
- `0x40xxxxxx` = Uncertain
- `0x80xxxxxx` = Bad
Domain Quality enum uses byte values for the low-order byte, with extension methods IsGood(), IsBad(), IsUncertain().
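A minimal Python sketch of the "status_code is canonical" rule follows. It assumes the category lives in the top two bits of the 32-bit code (the OPC UA severity convention, consistent with the three prefixes above) and uses a deliberately tiny, hypothetical lookup table; the real table and the fallback for unknown codes are not specified here. The point is that `symbolic_name` is a read-only property computed from `status_code`, so it cannot drift.

```python
from dataclasses import dataclass

_CATEGORY_BY_SEVERITY = {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}

# Hypothetical lookup table; a real one would list every supported code.
_SYMBOLIC_NAMES = {
    0x00000000: "Good",
    0x40000000: "Uncertain",
    0x80000000: "Bad",
}


def quality_category(status_code: int) -> str:
    """Derive Good/Uncertain/Bad from the two high bits of status_code."""
    severity = (status_code >> 30) & 0b11
    if severity not in _CATEGORY_BY_SEVERITY:
        raise ValueError(f"reserved severity bits in 0x{status_code:08X}")
    return _CATEGORY_BY_SEVERITY[severity]


@dataclass(frozen=True)
class QualityCode:
    status_code: int  # canonical value

    @property
    def symbolic_name(self) -> str:
        # Derived on demand; falls back to the category for unlisted codes
        # (the fallback is an assumption of this sketch).
        return _SYMBOLIC_NAMES.get(
            self.status_code, quality_category(self.status_code)
        )
```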
4.4 Error Model
| Error Type | Mechanism | Examples |
|---|---|---|
| Infrastructure | gRPC StatusCode | Unauthenticated (bad API key), PermissionDenied (ReadOnly write), InvalidArgument (bad session), Unavailable (MxAccess down) |
| Business outcome | Payload success/message fields | Tag read failure, write type mismatch, batch partial failure, WriteBatchAndWait flag timeout |
| Subscription | gRPC StatusCode on stream | Unauthenticated (invalid session), Internal (unexpected error) |
5. COM Threading Model
MxAccess is an STA COM component. All COM operations execute on a dedicated STA thread with a BlockingCollection<Action> dispatch queue:
- `MxAccessClient` creates a single STA thread at construction
- All COM calls (connect, read, write, subscribe, disconnect) are dispatched to this thread via the queue
- Callers await a `TaskCompletionSource<T>` that the STA thread completes after the COM call
- The STA thread runs a message pump loop (`Application.Run` or manual `MSG` pump)
- On disposal, a sentinel is enqueued and the thread joins with a 10-second timeout
This replaces the fragile Task.Run + SemaphoreSlim pattern in the reference code.
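The dispatch-queue pattern described in this section can be sketched in a language-neutral way. The Python below is an analogy, not the Host's implementation: a `queue.Queue` stands in for `BlockingCollection<Action>`, a `Future` stands in for `TaskCompletionSource<T>`, and the class name is hypothetical. COM apartment state and the message pump have no Python equivalent and are not modelled.

```python
import queue
import threading
from concurrent.futures import Future

_STOP = object()  # sentinel that tells the worker thread to exit


class SingleThreadDispatcher:
    """Runs every submitted callable on one dedicated thread."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, name="sta-dispatch", daemon=True
        )
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            func, future = item
            try:
                future.set_result(func())
            except Exception as exc:  # hand the failure back to the caller
                future.set_exception(exc)

    def invoke(self, func) -> Future:
        """Queue func for the dedicated thread and return a Future."""
        future = Future()
        self._queue.put((func, future))
        return future

    def dispose(self, timeout: float = 10.0) -> bool:
        """Enqueue the sentinel and join; True if the thread exited in time."""
        self._queue.put(_STOP)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

A caller would write `dispatcher.invoke(some_call).result()`. Work queued before the sentinel still runs, because the queue is first-in, first-out.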
6. Session Lifecycle
- Sessions created on `Connect` with GUID "N" format (32-char hex)
- Tracked in `ConcurrentDictionary<string, SessionInfo>`
- Inactivity scavenging: sessions not accessed for 5 minutes are automatically terminated. Client keep-alive pings (30s) keep legitimate sessions alive.
- On termination: subscriptions cleaned up, session removed from dictionary
- All sessions lost on service restart (in-memory only)
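The lifecycle rules above can be sketched as follows. This is an illustrative Python model, not the Host's `SessionManager`: it uses a plain dict instead of a concurrent dictionary, is not thread-safe, and takes an injectable clock (a hypothetical parameter) so the 5-minute rule can be tested without waiting. Subscription cleanup on termination is not shown.

```python
import time
import uuid


class SessionManager:
    """In-memory session table with inactivity scavenging."""

    def __init__(self, inactivity_timeout_s: float = 300.0,
                 clock=time.monotonic):
        self._timeout = inactivity_timeout_s
        self._clock = clock
        self._last_seen = {}  # session_id -> time of last access

    def create(self) -> str:
        # uuid4().hex is 32 hex characters with no dashes, like GUID "N".
        session_id = uuid.uuid4().hex
        self._last_seen[session_id] = self._clock()
        return session_id

    def touch(self, session_id: str) -> bool:
        """Record activity (any RPC or keep-alive); False if unknown."""
        if session_id not in self._last_seen:
            return False
        self._last_seen[session_id] = self._clock()
        return True

    def scavenge(self) -> list:
        """Remove and return sessions idle for longer than the timeout."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > self._timeout]
        for sid in expired:
            del self._last_seen[sid]
        return expired
```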
7. Subscription Semantics
- Shared MxAccess subscriptions: first client to subscribe creates the underlying MxAccess subscription. Last to unsubscribe disposes it. Ref-counted.
- Sampling rate: when multiple clients subscribe to the same tag with different `sampling_ms`, the fastest (lowest non-zero) rate is used for the MxAccess subscription. All clients receive updates at this rate.
- Per-client channels: each client gets an independent `BoundedChannel<VtqMessage>` (capacity 1000, DropOldest). One slow consumer's drops do not affect other clients.
- MxAccess disconnect: all subscribed clients receive a bad-quality notification for all their subscribed tags.
- Session termination: all subscriptions for that session are cleaned up.
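To make the ref-counting, rate selection and drop-oldest behaviour concrete, here is a small Python model. It is a sketch under assumptions, not the Host's `SubscriptionManager`: the class and method names are hypothetical, a `deque(maxlen=...)` stands in for the bounded channel, and returning 0 when every client requested a zero rate is a choice made for the sketch.

```python
from collections import deque


class SharedSubscriptions:
    """Ref-counted tag subscriptions with one bounded queue per client."""

    def __init__(self, capacity: int = 1000):
        self._capacity = capacity
        self._rates = {}     # tag -> {client_id: sampling_ms}
        self._channels = {}  # client_id -> deque that drops the oldest item

    def subscribe(self, client_id, tag, sampling_ms) -> bool:
        """Return True when this call creates the underlying subscription."""
        clients = self._rates.setdefault(tag, {})
        created = not clients
        clients[client_id] = sampling_ms
        self._channels.setdefault(client_id, deque(maxlen=self._capacity))
        return created

    def unsubscribe(self, client_id, tag) -> bool:
        """Return True when the last subscriber left and the tag is disposed."""
        clients = self._rates.get(tag)
        if clients is None:
            return False
        clients.pop(client_id, None)
        if clients:
            return False
        del self._rates[tag]
        return True

    def effective_rate(self, tag) -> int:
        """Fastest (lowest non-zero) requested rate, or 0 if none is set."""
        rates = [r for r in self._rates.get(tag, {}).values() if r > 0]
        return min(rates) if rates else 0

    def publish(self, tag, update) -> None:
        # A full deque silently discards its oldest entry on append.
        for client_id in self._rates.get(tag, {}):
            self._channels[client_id].append(update)

    def channel(self, client_id) -> deque:
        return self._channels[client_id]
```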
8. Authentication
- `x-api-key` gRPC metadata header is the authoritative authentication mechanism
- `ConnectRequest.api_key` is accepted but the interceptor is the enforcement point
- API keys loaded from JSON file with FileSystemWatcher hot-reload (1-second debounce)
- Auto-generates default file with two random keys (ReadOnly + ReadWrite) if missing
- Write-protected RPCs: Write, WriteBatch, WriteBatchAndWait
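The interceptor's decision logic can be summarised in a few lines. The Python below is a sketch of that logic only, not a gRPC interceptor: `authorize` is a hypothetical function, metadata is modelled as a dict, the key store is a dict from key to role, and the returned strings mirror the gRPC status codes named in the error model.

```python
WRITE_RPCS = {"Write", "WriteBatch", "WriteBatchAndWait"}


def authorize(metadata: dict, rpc_name: str, api_keys: dict) -> str:
    """Decide the outcome for one call from its x-api-key header.

    api_keys maps key string -> role ("ReadOnly" or "ReadWrite").
    """
    role = api_keys.get(metadata.get("x-api-key"))
    if role is None:
        return "UNAUTHENTICATED"      # missing or unknown key
    if rpc_name in WRITE_RPCS and role != "ReadWrite":
        return "PERMISSION_DENIED"    # ReadOnly key on a write RPC
    return "OK"
```

For instance, a ReadOnly key is accepted for `Read` but rejected for `WriteBatch`, and a call with no header is rejected regardless of the RPC.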
9. Phasing
| Phase | Scope | Depends On |
|---|---|---|
| 1 | Protocol & Domain Types | — |
| 2 | Host Core (MxAccessClient, SessionManager, SubscriptionManager) | Phase 1 |
| 3 | Host gRPC Server, Security, Configuration, Service Hosting | Phase 2 |
| 4 | Host Health, Metrics, Status Server | Phase 3 |
| 5 | Client Core | Phase 1 |
| 6 | Client Extras (Builder, Factory, DI, Streaming) | Phase 5 |
| 7 | Integration Tests & Deployment | Phases 4 + 6 |
Phases 2-4 (Host) and 5-6 (Client) can proceed in parallel after Phase 1.
10. Guardrails
- Proto is the source of truth: any wire format question is resolved by reading `scada.proto`, not the code-first contracts.
- No v1 code in the new build: reference only. Do not copy-paste and modify; write fresh.
- Cross-stack tests in Phase 1: Host proto serialize → Client code-first deserialize (and vice versa) before any business logic.
- COM calls only on STA thread: no `Task.Run` for COM operations. All go through the dispatch queue.
- status_code is canonical for quality: `symbolic_name` is always derived, never independently set.
- Unit tests before integration: every phase includes unit tests. Integration tests are Phase 7 only.
- Each phase must compile and pass tests before the next phase begins.
- No string serialization heuristics: v2 uses native TypedValue. No `double.TryParse` or `bool.TryParse` on values.
11. Resolved Conflicts
| Conflict | Resolution |
|---|---|
| WriteBatchAndWait signature (MxAccessClient vs Protocol) | Follow Protocol spec: write items, poll flagTag for flagValue. IScadaClient interface matches protocol semantics. |
| Builder default port 5050 vs Host 50051 | Standardize builder default to 50051 |
| Auth in metadata vs payload | x-api-key header is authoritative; ConnectRequest.api_key accepted but interceptor enforces |
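The WriteBatchAndWait resolution (write the items, then poll the flag tag for the flag value) might look roughly like the Python sketch below. Everything beyond that one-line description is an assumption: the function signature, the `write`/`read` callables, the timeout and poll interval defaults, and the shape of the result. A flag timeout is reported in the payload rather than raised, matching its classification as a business outcome in the error model.

```python
import time


def write_batch_and_wait(write, read, items, flag_tag, flag_value,
                         timeout_s=5.0, poll_interval_s=0.1,
                         clock=time.monotonic, sleep=time.sleep):
    """Write every item, then poll flag_tag until it equals flag_value."""
    for tag, value in items.items():
        write(tag, value)

    deadline = clock() + timeout_s
    while True:
        if read(flag_tag) == flag_value:
            return {"success": True, "message": ""}
        if clock() >= deadline:
            # Business outcome: reported in the payload, not as an exception.
            return {"success": False,
                    "message": f"timed out waiting for {flag_tag}"}
        sleep(poll_interval_s)
```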
12. Reference Code
The existing code from src/ is retained under src-reference/ for consultation:
- `src-reference/ZB.MOM.WW.LmxProxy.Host/`: v1 Host implementation
- `src-reference/ZB.MOM.WW.LmxProxy.Client/`: v1 Client implementation
Key reference files for COM interop patterns:
- `Implementation/MxAccessClient.Connection.cs`: COM object lifecycle
- `Implementation/MxAccessClient.EventHandlers.cs`: MxAccess callbacks
- `Implementation/MxAccessClient.Subscription.cs`: Advise/Unadvise patterns