scadalink-design/deprecated/lmxproxy/docs/plans/2026-03-21-lmxproxy-v2-rebuild-design.md
Joseph Doherty 9dccf8e72f deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL
adapter files, and related docs to deprecated/. Removed LmxProxy registration
from DataConnectionFactory, project reference from DCL, protocol option from
UI, and cleaned up all requirement docs.
2026-04-08 15:56:23 -04:00

LmxProxy v2 Rebuild — Design Document

Date: 2026-03-21
Status: Approved
Scope: Complete rebuild of LmxProxy Host and Client with v2 protocol

1. Overview

Rebuild the LmxProxy gRPC proxy service from scratch, implementing the v2 protocol (TypedValue + QualityCode) as defined in docs/lmxproxy_updates.md. The existing code in src/ is retained as reference only. No backward compatibility with v1.

2. Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| gRPC server for Host | Grpc.Core (C-core) | Only option for .NET Framework 4.8 server-side |
| Service hosting | Topshelf | Proven, already deployed, simple install/uninstall |
| Protocol version | v2 only, clean break | Small controlled client count, no value in v1 compat |
| Shared code between projects | None (fully independent) | Different .NET runtimes (.NET Fx 4.8 vs .NET 10), wire compat is the contract |
| Client retry library | Polly v8+ | Building fresh on .NET 10, modern API |
| Testing strategy | Unit tests during implementation, integration tests after Client functional | Phased approach, real hardware validation on windev |

3. Architecture

3.1 Host (.NET Framework 4.8, x86)

Program.cs (Topshelf entry point)
  └── LmxProxyService (lifecycle manager)
        ├── Configuration (appsettings.json binding + validation)
        ├── MxAccessClient (COM interop, STA dispatch thread)
        │     ├── Connection state machine
        │     ├── Read/Write with semaphore concurrency
        │     ├── Subscription storage for reconnect replay
        │     └── Auto-reconnect loop (5s interval)
        ├── SessionManager (ConcurrentDictionary, 5-min inactivity scavenging)
        ├── SubscriptionManager (per-client channels, shared MxAccess subscriptions)
        ├── ApiKeyService (JSON file, FileSystemWatcher hot-reload)
        ├── ScadaGrpcService (proto-generated, all 10 RPCs)
        │     └── ApiKeyInterceptor (x-api-key header enforcement)
        ├── PerformanceMetrics (per-op tracking, p95, 60s log)
        ├── HealthCheckService (basic + detailed with test tag)
        └── StatusWebServer (HTML dashboard, JSON status, health endpoint)

3.2 Client (.NET 10, AnyCPU)

ILmxProxyClient (public interface)
  └── LmxProxyClient (partial class)
        ├── Connection (GrpcChannel, protobuf-net.Grpc, 30s keep-alive)
        ├── Read/Write/Subscribe operations
        ├── CodeFirstSubscription (IAsyncEnumerable streaming)
        ├── ClientMetrics (p95/p99, 1000-sample buffer)
        └── Disposal (session disconnect, channel cleanup)

LmxProxyClientBuilder (fluent builder, Polly v8 resilience pipeline)
ILmxProxyClientFactory + LmxProxyClientFactory (config-based creation)
ServiceCollectionExtensions (DI registrations)
StreamingExtensions (batched reads/writes, parallel processing)

Domain/
  ├── ScadaContracts.cs (IScadaService + all DataContract messages)
  ├── Quality.cs, QualityExtensions.cs
  ├── Vtq.cs
  └── ConnectionState.cs

3.3 Wire Compatibility

The .proto file is the single source of truth for the wire format. Host generates server stubs from it. Client implements code-first contracts ([DataContract]/[ServiceContract]) that mirror the proto exactly — same field numbers, names, nesting, and streaming shapes. Cross-stack serialization tests verify compatibility.

4. Protocol (v2)

4.1 TypedValue System

Protobuf oneof carrying native types:

| Case | Proto Type | .NET Type |
| --- | --- | --- |
| bool_value | bool | bool |
| int32_value | int32 | int |
| int64_value | int64 | long |
| float_value | float | float |
| double_value | double | double |
| string_value | string | string |
| bytes_value | bytes | byte[] |
| datetime_value | int64 (UTC Ticks) | DateTime |
| array_value | ArrayValue | typed arrays |

Unset oneof = null. No string serialization heuristics.

4.2 COM Variant Coercion Table

| COM Variant Type | TypedValue Case | Notes |
| --- | --- | --- |
| VT_BOOL | bool_value | |
| VT_I2 (short) | int32_value | Widened |
| VT_I4 (int) | int32_value | |
| VT_I8 (long) | int64_value | |
| VT_UI2 (ushort) | int32_value | Widened |
| VT_UI4 (uint) | int64_value | Widened to avoid sign issues |
| VT_UI8 (ulong) | int64_value | Truncation risk logged if > long.MaxValue |
| VT_R4 (float) | float_value | |
| VT_R8 (double) | double_value | |
| VT_BSTR (string) | string_value | |
| VT_DATE (DateTime) | datetime_value | Converted to UTC Ticks |
| VT_DECIMAL | double_value | Precision loss logged |
| VT_CY (Currency) | double_value | |
| VT_NULL, VT_EMPTY, DBNull | unset oneof | Represents null |
| VT_ARRAY | array_value | Element type determines ArrayValue field |
| VT_UNKNOWN | string_value | ToString() fallback, logged as warning |
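
The coercion rules above can be sketched as a lookup from variant type to TypedValue case. This is a minimal illustration in Python, not the Host's C# implementation: the names `coerce` and `to_utc_ticks` are hypothetical, variant types are passed as plain strings, and the logging for VT_UI8, VT_DECIMAL, and VT_UNKNOWN is left out. It assumes "UTC Ticks" means .NET ticks (100 ns intervals since 0001-01-01 UTC) and that dates arrive as timezone-aware values.

```python
from datetime import datetime, timedelta, timezone

# Variant type -> TypedValue oneof case, copied from the coercion table.
_CASE_BY_VT = {
    "VT_BOOL": "bool_value",
    "VT_I2": "int32_value", "VT_I4": "int32_value", "VT_UI2": "int32_value",
    "VT_I8": "int64_value", "VT_UI4": "int64_value", "VT_UI8": "int64_value",
    "VT_R4": "float_value",
    "VT_R8": "double_value", "VT_DECIMAL": "double_value", "VT_CY": "double_value",
    "VT_BSTR": "string_value",
}

_TICKS_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def to_utc_ticks(dt):
    """Convert a timezone-aware datetime to .NET-style ticks (100 ns units)."""
    microseconds = (dt.astimezone(timezone.utc) - _TICKS_EPOCH) // timedelta(microseconds=1)
    return microseconds * 10


def coerce(vt, value):
    """Return (oneof_case, value); (None, None) stands for an unset oneof."""
    if vt in ("VT_NULL", "VT_EMPTY") or value is None:
        return (None, None)
    if vt == "VT_ARRAY":
        # Element typing is omitted in this sketch.
        return ("array_value", list(value))
    if vt == "VT_DATE":
        return ("datetime_value", to_utc_ticks(value))
    case = _CASE_BY_VT.get(vt)
    if case is None:
        # Unknown variant types fall back to a string representation.
        return ("string_value", str(value))
    if case in ("float_value", "double_value"):
        return (case, float(value))
    return (case, value)
```

For example, `coerce("VT_UI4", 4294967295)` yields `("int64_value", 4294967295)`, which shows why the unsigned 32-bit case is widened rather than squeezed into `int32_value`.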

4.3 QualityCode System

status_code (uint32, OPC UA-compatible) is canonical. symbolic_name is derived from a lookup table, never set independently.

Category derived from high bits:

  • 0x00xxxxxx = Good
  • 0x40xxxxxx = Uncertain
  • 0x80xxxxxx = Bad

Domain Quality enum uses byte values for the low-order byte, with extension methods IsGood(), IsBad(), IsUncertain().
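
As an illustration of the category rule, the following Python sketch classifies a status_code by its high bits. It assumes the OPC UA convention that the two most significant bits carry the severity; the function names are hypothetical, and the handling of the unlisted bit pattern 0b11 is an assumption rather than something the design specifies.

```python
def quality_category(status_code):
    """Classify a 32-bit status_code by its two most significant bits."""
    severity = (status_code >> 30) & 0b11
    if severity == 0b00:
        return "Good"
    if severity == 0b01:
        return "Uncertain"
    # 0b10 is Bad. 0b11 is not listed in the design; treating it as Bad
    # is an assumption made for this sketch.
    return "Bad"


def is_good(status_code):
    return quality_category(status_code) == "Good"


def is_uncertain(status_code):
    return quality_category(status_code) == "Uncertain"


def is_bad(status_code):
    return quality_category(status_code) == "Bad"
```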

4.4 Error Model

| Error Type | Mechanism | Examples |
| --- | --- | --- |
| Infrastructure | gRPC StatusCode | Unauthenticated (bad API key), PermissionDenied (ReadOnly write), InvalidArgument (bad session), Unavailable (MxAccess down) |
| Business outcome | Payload success/message fields | Tag read failure, write type mismatch, batch partial failure, WriteBatchAndWait flag timeout |
| Subscription | gRPC StatusCode on stream | Unauthenticated (invalid session), Internal (unexpected error) |

5. COM Threading Model

MxAccess is an STA COM component. All COM operations execute on a dedicated STA thread with a BlockingCollection<Action> dispatch queue:

  • MxAccessClient creates a single STA thread at construction
  • All COM calls (connect, read, write, subscribe, disconnect) are dispatched to this thread via the queue
  • Callers await a TaskCompletionSource<T> that the STA thread completes after the COM call
  • The STA thread runs a message pump loop (Application.Run or manual MSG pump)
  • On disposal, a sentinel is enqueued and the thread joins with a 10-second timeout

This replaces the fragile Task.Run + SemaphoreSlim pattern in the reference code.
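
The dispatch pattern can be sketched in a language-neutral way. The Python below is only an analogy for the design, not the Host code: `queue.Queue` stands in for `BlockingCollection<Action>`, `concurrent.futures.Future` stands in for `TaskCompletionSource<T>`, and the class name `SingleThreadDispatcher` is hypothetical. It does not model the COM apartment or the message pump.

```python
import queue
import threading
from concurrent.futures import Future

_SENTINEL = object()  # enqueued on disposal to stop the worker loop


class SingleThreadDispatcher:
    """Runs every submitted callable on one dedicated thread, in order."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, name="sta-dispatch", daemon=True
        )
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _SENTINEL:
                break
            func, future = item
            try:
                future.set_result(func())
            except Exception as exc:
                # Failures are reported to the caller, not raised on the worker.
                future.set_exception(exc)

    def invoke(self, func):
        """Queue func for the dedicated thread and return a future for its result."""
        future = Future()
        self._queue.put((func, future))
        return future

    def dispose(self, timeout=10.0):
        """Enqueue the sentinel and join; returns True if the thread exited in time."""
        self._queue.put(_SENTINEL)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

A caller would write something like `dispatcher.invoke(lambda: read_tag("Tank1.Level")).result()`, where `read_tag` is a placeholder for a COM call. Because every call funnels through one queue, ordering is preserved and no COM object is touched from a pool thread.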

6. Session Lifecycle

  • Sessions created on Connect with GUID "N" format (32-char hex)
  • Tracked in ConcurrentDictionary<string, SessionInfo>
  • Inactivity scavenging: sessions not accessed for 5 minutes are automatically terminated. Client keep-alive pings (30s) keep legitimate sessions alive.
  • On termination: subscriptions cleaned up, session removed from dictionary
  • All sessions lost on service restart (in-memory only)
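
A minimal Python sketch of this lifecycle follows. The method names (`connect`, `touch`, `scavenge`) and the injectable clock are assumptions made for illustration; the sketch is not thread-safe and omits subscription cleanup, both of which the Host would need.

```python
import time
import uuid


class SessionManager:
    """In-memory session table with inactivity scavenging."""

    def __init__(self, inactivity_timeout_s=300.0, clock=time.monotonic):
        self._timeout = inactivity_timeout_s
        self._clock = clock
        self._last_seen = {}  # session_id -> time of last access

    def connect(self):
        # uuid4().hex is 32 hex characters, matching the GUID "N" format.
        session_id = uuid.uuid4().hex
        self._last_seen[session_id] = self._clock()
        return session_id

    def touch(self, session_id):
        """Record activity; returns False for an unknown or expired session."""
        if session_id not in self._last_seen:
            return False
        self._last_seen[session_id] = self._clock()
        return True

    def scavenge(self):
        """Remove sessions idle for longer than the timeout; return their ids."""
        now = self._clock()
        expired = [
            sid for sid, seen in self._last_seen.items()
            if now - seen > self._timeout
        ]
        for sid in expired:
            del self._last_seen[sid]
        return expired
```

With a 30-second keep-alive, a healthy client calls the equivalent of `touch` about ten times per 5-minute window, so only sessions whose client has gone away are scavenged.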

7. Subscription Semantics

  • Shared MxAccess subscriptions: first client to subscribe creates the underlying MxAccess subscription. Last to unsubscribe disposes it. Ref-counted.
  • Sampling rate: when multiple clients subscribe to the same tag with different sampling_ms, the fastest (lowest non-zero) rate is used for the MxAccess subscription. All clients receive updates at this rate.
  • Per-client channels: each client gets an independent BoundedChannel<VtqMessage> (capacity 1000, DropOldest). One slow consumer's drops do not affect other clients.
  • MxAccess disconnect: all subscribed clients receive a bad-quality notification for all their subscribed tags.
  • Session termination: all subscriptions for that session are cleaned up.
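
The ref-counting, sampling-rate, and drop-oldest rules can be sketched together. This Python illustration uses hypothetical names (`SharedSubscriptions`, `effective_rate`, `publish`) and a bounded `deque` in place of `BoundedChannel<VtqMessage>`. Returning 0 when no client requested a non-zero rate is an assumption; the design does not say what happens in that case.

```python
from collections import deque


class SharedSubscriptions:
    """Ref-counted tag subscriptions with a bounded buffer per client."""

    def __init__(self, capacity=1000):
        self._capacity = capacity
        self._rates = {}     # tag -> {client_id: sampling_ms}
        self._channels = {}  # client_id -> bounded deque of (tag, vtq)

    def subscribe(self, client_id, tag, sampling_ms):
        """Returns True when this call creates the underlying subscription."""
        clients = self._rates.setdefault(tag, {})
        created = not clients
        clients[client_id] = sampling_ms
        self._channels.setdefault(client_id, deque(maxlen=self._capacity))
        return created

    def unsubscribe(self, client_id, tag):
        """Returns True when the last subscriber left and the tag is released."""
        clients = self._rates.get(tag)
        if clients is None:
            return False
        clients.pop(client_id, None)
        if not clients:
            del self._rates[tag]
            return True
        return False

    def effective_rate(self, tag):
        """Lowest non-zero sampling_ms requested for the tag, or 0 if none."""
        rates = [r for r in self._rates.get(tag, {}).values() if r > 0]
        return min(rates) if rates else 0

    def publish(self, tag, vtq):
        # A full deque discards its oldest entry on append (drop-oldest),
        # so one slow client only loses its own backlog.
        for client_id in self._rates.get(tag, {}):
            self._channels[client_id].append((tag, vtq))

    def pending(self, client_id):
        return list(self._channels.get(client_id, ()))
```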

8. Authentication

  • x-api-key gRPC metadata header is the authoritative authentication mechanism
  • ConnectRequest.api_key is accepted but the interceptor is the enforcement point
  • API keys loaded from JSON file with FileSystemWatcher hot-reload (1-second debounce)
  • Auto-generates default file with two random keys (ReadOnly + ReadWrite) if missing
  • Write-protected RPCs: Write, WriteBatch, WriteBatchAndWait
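
The enforcement logic reduces to two checks, sketched here in Python without any gRPC library. The function `authorize`, the role strings, and the returned status names are illustrative assumptions; the status choices follow the Error Model table (Unauthenticated for a bad key, PermissionDenied for a ReadOnly write).

```python
# RPCs that require a ReadWrite key, per the list above.
WRITE_RPCS = {"Write", "WriteBatch", "WriteBatchAndWait"}


def authorize(metadata, rpc_name, api_keys):
    """Return the status an interceptor would produce for this call.

    metadata: dict of request headers, e.g. {"x-api-key": "..."}
    api_keys: dict mapping key string -> role ("ReadOnly" or "ReadWrite")
    """
    role = api_keys.get(metadata.get("x-api-key"))
    if role is None:
        # Missing or unknown key.
        return "UNAUTHENTICATED"
    if rpc_name in WRITE_RPCS and role != "ReadWrite":
        return "PERMISSION_DENIED"
    return "OK"
```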

9. Phasing

| Phase | Scope | Depends On |
| --- | --- | --- |
| 1 | Protocol & Domain Types | |
| 2 | Host Core (MxAccessClient, SessionManager, SubscriptionManager) | Phase 1 |
| 3 | Host gRPC Server, Security, Configuration, Service Hosting | Phase 2 |
| 4 | Host Health, Metrics, Status Server | Phase 3 |
| 5 | Client Core | Phase 1 |
| 6 | Client Extras (Builder, Factory, DI, Streaming) | Phase 5 |
| 7 | Integration Tests & Deployment | Phases 4 + 6 |

Phases 2-4 (Host) and 5-6 (Client) can proceed in parallel after Phase 1.

10. Guardrails

  1. Proto is the source of truth — any wire format question is resolved by reading scada.proto, not the code-first contracts.
  2. No v1 code in the new build — reference only. Do not copy-paste and modify; write fresh.
  3. Cross-stack tests in Phase 1 — Host proto serialize → Client code-first deserialize (and vice versa) before any business logic.
  4. COM calls only on STA thread — no Task.Run for COM operations. All go through the dispatch queue.
  5. status_code is canonical for quality: symbolic_name is always derived, never independently set.
  6. Unit tests before integration — every phase includes unit tests. Integration tests are Phase 7 only.
  7. Each phase must compile and pass tests before the next phase begins.
  8. No string serialization heuristics — v2 uses native TypedValue. No double.TryParse or bool.TryParse on values.

11. Resolved Conflicts

| Conflict | Resolution |
| --- | --- |
| WriteBatchAndWait signature (MxAccessClient vs Protocol) | Follow Protocol spec: write items, poll flagTag for flagValue. IScadaClient interface matches protocol semantics. |
| Builder default port 5050 vs Host 50051 | Standardize builder default to 50051 |
| Auth in metadata vs payload | x-api-key header is authoritative; ConnectRequest.api_key accepted but interceptor enforces |
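
The WriteBatchAndWait resolution (write items, then poll flagTag for flagValue) can be sketched as follows. This is a hedged Python illustration: the `write` and `read` callables, the timeout and poll interval defaults, and the `(success, message)` return shape are all assumptions. The one point taken from the Error Model is that a flag timeout is reported as a business outcome in the payload and not as a gRPC status.

```python
import time


def write_batch_and_wait(write, read, items, flag_tag, flag_value,
                         timeout_s=5.0, poll_interval_s=0.1,
                         clock=time.monotonic, sleep=time.sleep):
    """Write all items, then poll flag_tag until it equals flag_value.

    Returns (success, message); a timeout is a normal failed outcome.
    """
    for tag, value in items:
        write(tag, value)

    deadline = clock() + timeout_s
    while True:
        if read(flag_tag) == flag_value:
            return (True, "")
        if clock() >= deadline:
            return (False, "flag timeout")
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` keeps the polling loop testable without real delays.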

12. Reference Code

The existing v1 code, originally in src/, is kept under src-reference/ for consultation:

  • src-reference/ZB.MOM.WW.LmxProxy.Host/ — v1 Host implementation
  • src-reference/ZB.MOM.WW.LmxProxy.Client/ — v1 Client implementation

Key reference files for COM interop patterns:

  • Implementation/MxAccessClient.Connection.cs — COM object lifecycle
  • Implementation/MxAccessClient.EventHandlers.cs — MxAccess callbacks
  • Implementation/MxAccessClient.Subscription.cs — Advise/Unadvise patterns