From 683aea0fbe436524ff2ac9c61732bb424a8c7648 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Sat, 21 Mar 2026 22:38:11 -0400 Subject: [PATCH] docs: add LmxProxy requirements documentation with v2 protocol as authoritative design Generate high-level requirements and 10 component documents derived from source code and protocol specs. Uses lmxproxy_updates.md (v2 TypedValue/QualityCode) as the source of truth, with v1 string-based encoding documented as legacy context. Co-Authored-By: Claude Opus 4.6 (1M context) --- lmxproxy/CLAUDE.md | 71 +++++ .../docs/requirements/Component-Client.md | 200 ++++++++++++ .../requirements/Component-Configuration.md | 122 +++++++ .../docs/requirements/Component-GrpcServer.md | 86 +++++ .../Component-HealthAndMetrics.md | 121 +++++++ .../requirements/Component-MxAccessClient.md | 108 +++++++ .../docs/requirements/Component-Protocol.md | 301 ++++++++++++++++++ .../docs/requirements/Component-Security.md | 119 +++++++ .../requirements/Component-ServiceHost.md | 108 +++++++ .../requirements/Component-SessionManager.md | 76 +++++ .../Component-SubscriptionManager.md | 116 +++++++ lmxproxy/docs/requirements/HighLevelReqs.md | 274 ++++++++++++++++ 12 files changed, 1702 insertions(+) create mode 100644 lmxproxy/CLAUDE.md create mode 100644 lmxproxy/docs/requirements/Component-Client.md create mode 100644 lmxproxy/docs/requirements/Component-Configuration.md create mode 100644 lmxproxy/docs/requirements/Component-GrpcServer.md create mode 100644 lmxproxy/docs/requirements/Component-HealthAndMetrics.md create mode 100644 lmxproxy/docs/requirements/Component-MxAccessClient.md create mode 100644 lmxproxy/docs/requirements/Component-Protocol.md create mode 100644 lmxproxy/docs/requirements/Component-Security.md create mode 100644 lmxproxy/docs/requirements/Component-ServiceHost.md create mode 100644 lmxproxy/docs/requirements/Component-SessionManager.md create mode 100644 lmxproxy/docs/requirements/Component-SubscriptionManager.md create 
mode 100644 lmxproxy/docs/requirements/HighLevelReqs.md diff --git a/lmxproxy/CLAUDE.md b/lmxproxy/CLAUDE.md new file mode 100644 index 0000000..1017bdd --- /dev/null +++ b/lmxproxy/CLAUDE.md @@ -0,0 +1,71 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## What This Is + +LmxProxy is a gRPC proxy that bridges ScadaLink's Data Connection Layer to AVEVA System Platform via the ArchestrA MXAccess COM API. It has two projects: + +- **Host** (`ZB.MOM.WW.LmxProxy.Host`) — .NET Framework 4.8, x86-only Windows service. Hosts a gRPC server (Grpc.Core) that fronts an MxAccessClient talking to ArchestrA MXAccess. Runs as a Windows service via Topshelf. +- **Client** (`ZB.MOM.WW.LmxProxy.Client`) — .NET 10, AnyCPU library. Code-first gRPC client (protobuf-net.Grpc) consumed by ScadaLink's DCL. This is a NuGet-packable library. + +The two projects use **different gRPC stacks**: Host uses proto-file-generated code (`Grpc.Core` + `Grpc.Tools`), Client uses code-first contracts (`protobuf-net.Grpc` with `[DataContract]`/`[ServiceContract]` attributes). They are wire-compatible because both target the same `scada.ScadaService` gRPC service. + +## Build Commands + +```bash +dotnet build ZB.MOM.WW.LmxProxy.slnx # Build entire solution +dotnet build src/ZB.MOM.WW.LmxProxy.Host # Host only (requires x86 platform) +dotnet build src/ZB.MOM.WW.LmxProxy.Client # Client only +``` + +The Host project requires the `ArchestrA.MXAccess.dll` COM interop assembly in `lib/`. It targets x86 exclusively (MXAccess is 32-bit COM). + +## Architecture + +### Host Service Startup Chain + +`Program.Main` → Topshelf `HostFactory` → `LmxProxyService.Start()` which: +1. Validates configuration (`appsettings.json` bound to `LmxProxyConfiguration`) +2. Creates `MxAccessClient` (the `IScadaClient` impl that wraps ArchestrA.MXAccess COM) +3. Connects to MxAccess synchronously at startup +4. 
Starts connection monitor loop (auto-reconnect) +5. Creates `SubscriptionManager`, `SessionManager`, `PerformanceMetrics`, `ApiKeyService` +6. Creates `ScadaGrpcService` (the proto-generated service impl) with all dependencies +7. Starts Grpc.Core `Server` on configured port (default 50051) +8. Starts HTTP status web server (default port 8080) + +### Key Host Components + +- `MxAccessClient` — Partial class split across 6 files (Connection, ReadWrite, Subscription, EventHandlers, NestedTypes, main). Wraps `LMXProxyServer` COM object. Uses semaphores for concurrency control. +- `ScadaGrpcService` — Inherits proto-generated `ScadaService.ScadaServiceBase`. All RPCs validate session first, then delegate to `IScadaClient`. Values are string-serialized on the wire (v1 protocol). +- `SessionManager` — Tracks client sessions by GUID. +- `SubscriptionManager` — Manages MxAccess subscriptions, fans out updates via `System.Threading.Channels`. +- `ApiKeyInterceptor` — gRPC server interceptor for API key validation. + +### Client Architecture + +- `ILmxProxyClient` — Public interface for consumers. Connect/Read/Write/Subscribe/Dispose. +- `LmxProxyClient` — Partial class split across multiple files (Connection, Subscription, Metrics, etc.). Uses `protobuf-net.Grpc` code-first contracts (`IScadaService` in `Domain/ScadaContracts.cs`). +- `LmxProxyClientBuilder` — Fluent builder for configuring client instances. +- `Domain/ScadaContracts.cs` — All gRPC message types as `[DataContract]` POCOs and the `IScadaService` interface with `[ServiceContract]`. +- Value conversion: Client parses string values from wire using double → bool → string heuristic in `ConvertToVtq()`. Writes use `.ToString()` via `ConvertToString()`. + +### Protocol + +Proto definition: `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Protos/scada.proto` + +Currently v1 protocol (string-encoded values, string quality). 
A v2 protocol spec exists in `docs/lmxproxy_updates.md` that introduces `TypedValue` (protobuf oneof) and `QualityCode` (OPC UA status codes) — not yet implemented. + +RPCs: Connect, Disconnect, GetConnectionState, Read, ReadBatch, Write, WriteBatch, WriteBatchAndWait, Subscribe (server streaming), CheckApiKey. + +### Configuration + +Host configured via `appsettings.json` bound to `LmxProxyConfiguration`. Key sections: GrpcPort, Connection (timeouts, auto-reconnect), Subscription (channel capacity), Tls, WebServer, Serilog, RetryPolicies, HealthCheck. + +## Important Constraints + +- Host **must** target x86 and .NET Framework 4.8 (ArchestrA.MXAccess is 32-bit COM interop). +- Host uses `Grpc.Core` (the deprecated C-core gRPC library), not `Grpc.Net`. This is required because .NET 4.8 doesn't support `Grpc.Net.Server`. +- Client uses `Grpc.Net.Client` and targets .NET 10 — it runs in the ScadaLink central/site clusters. +- The solution file is `.slnx` format (XML-based, not the older text format). diff --git a/lmxproxy/docs/requirements/Component-Client.md b/lmxproxy/docs/requirements/Component-Client.md new file mode 100644 index 0000000..9a2e5a5 --- /dev/null +++ b/lmxproxy/docs/requirements/Component-Client.md @@ -0,0 +1,200 @@ +# Component: Client + +## Purpose + +A .NET 10 class library providing a typed gRPC client for consuming the LmxProxy service. Used by ScadaLink's Data Connection Layer to connect to AVEVA System Platform via the LmxProxy Host. + +## Location + +`src/ZB.MOM.WW.LmxProxy.Client/` — all files in this project. + +Key files: +- `ILmxProxyClient.cs` — public interface. +- `LmxProxyClient.cs` — main implementation (partial class across multiple files). +- `LmxProxyClientBuilder.cs` — fluent builder for client construction. +- `ServiceCollectionExtensions.cs` — DI integration and options classes. +- `ILmxProxyClientFactory.cs` — factory interface and implementation. +- `StreamingExtensions.cs` — batch and parallel streaming helpers. 
+- `Domain/ScadaContracts.cs` — code-first gRPC contracts. +- `Security/GrpcChannelFactory.cs` — TLS channel creation. + +## Responsibilities + +- Connect to and communicate with the LmxProxy Host gRPC service. +- Manage session lifecycle (connect, keep-alive, disconnect). +- Execute read, write, and subscribe operations with retry and concurrency control. +- Provide a fluent builder and DI integration for configuration. +- Track client-side performance metrics. +- Support TLS and mutual TLS connections. + +## 1. Public Interface (ILmxProxyClient) + +| Method | Description | +|--------|-------------| +| `ConnectAsync(ct)` | Establish gRPC channel and session | +| `DisconnectAsync()` | Graceful disconnect | +| `IsConnectedAsync()` | Thread-safe connection state check | +| `ReadAsync(address, ct)` | Read single tag, returns Vtq | +| `ReadBatchAsync(addresses, ct)` | Read multiple tags, returns dictionary | +| `WriteAsync(address, value, ct)` | Write single tag value | +| `WriteBatchAsync(values, ct)` | Write multiple tag values | +| `SubscribeAsync(addresses, onUpdate, onStreamError, ct)` | Subscribe to tag updates with value and error callbacks | +| `GetMetrics()` | Return operation counts, errors, latency stats | +| `DefaultTimeout` | Configurable timeout (default 30s, range 1s–10min) | + +Implements `IDisposable` and `IAsyncDisposable`. + +## 2. Connection Management + +### 2.1 Connect + +`ConnectAsync()`: +1. Creates a gRPC channel via `GrpcChannelFactory` (HTTP or HTTPS based on TLS config). +2. Creates a `protobuf-net.Grpc` client for `IScadaService`. +3. Calls the `Connect` RPC with a client ID (format: `ScadaBridge-{guid}`) and optional API key. +4. Stores the returned session ID. +5. Starts the keep-alive timer. + +### 2.2 Keep-Alive + +- Timer-based ping every **30 seconds** (hardcoded). +- Sends a lightweight `GetConnectionState` RPC. +- On failure: stops the timer, marks disconnected, triggers subscription cleanup. 
+ +### 2.3 Disconnect + +`DisconnectAsync()`: +1. Stops keep-alive timer. +2. Calls `Disconnect` RPC. +3. Clears session ID. +4. Disposes gRPC channel. + +### 2.4 Connection State + +`IsConnected` property: `!_disposed && _isConnected && !string.IsNullOrEmpty(_sessionId)`. + +## 3. Builder Pattern (LmxProxyClientBuilder) + +| Method | Default | Constraint | +|--------|---------|-----------| +| `WithHost(string)` | Required | Non-null/non-empty | +| `WithPort(int)` | 5050 | 1–65535 | +| `WithApiKey(string?)` | null | Optional | +| `WithTimeout(TimeSpan)` | 30 seconds | > 0 and ≤ 10 minutes | +| `WithLogger(ILogger)` | NullLogger | Optional | +| `WithSslCredentials(string?)` | Disabled | Optional cert path | +| `WithTlsConfiguration(ClientTlsConfiguration)` | null | Full TLS config | +| `WithRetryPolicy(int, TimeSpan)` | 3 attempts, 1s delay | maxAttempts > 0, delay > 0 | +| `WithMetrics()` | Disabled | Enables metric collection | +| `WithCorrelationIdHeader(string)` | null | Custom header name | + +## 4. Retry Policy + +Polly-based exponential backoff: +- Default: **3 attempts** with **1-second** initial delay. +- Backoff sequence: `delay * 2^(retryAttempt - 1)` → 1s, 2s, 4s. +- Transient errors retried: `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, `Aborted`. +- Each retry is logged with correlation ID at Warning level. + +## 5. Subscription + +### 5.1 Subscribe API + +`SubscribeAsync(addresses, onUpdate, onStreamError, ct)` returns an `ISubscription`: +- Calls the `Subscribe` RPC (server streaming) with the tag list and default sampling interval (**1000ms**). +- Processes streamed `VtqMessage` items asynchronously, invoking the `onUpdate(tag, vtq)` callback for each. +- On stream termination (server disconnect, gRPC error, or connection drop), invokes the `onStreamError` callback exactly once. +- On stream error, the client immediately nullifies its session ID, causing `IsConnected` to return `false`. 
This triggers the DCL adapter's `Disconnected` event and reconnection cycle. +- Errors are logged per-subscription. + +### 5.2 ISubscription + +- `Dispose()` — synchronous disposal with **5-second** timeout. +- Automatic callback on disposal for cleanup. + +## 6. DI Integration + +### 6.1 Service Collection Extensions + +| Method | Lifetime | Description | |--------|----------|-------------| | `AddLmxProxyClient(IConfiguration)` | Singleton | Bind `LmxProxy` config section | | `AddLmxProxyClient(IConfiguration, string)` | Singleton | Bind named config section | | `AddLmxProxyClient(Action)` | Singleton | Builder action | | `AddScopedLmxProxyClient(IConfiguration)` | Scoped | Per-scope lifetime | | `AddNamedLmxProxyClient(string, Action)` | Keyed singleton | Named/keyed registration | + +### 6.2 Configuration Options (LmxProxyClientOptions) + +Bound from `appsettings.json`: + +| Setting | Default | Description | |---------|---------|-------------| | Host | `localhost` | Server hostname | | Port | 5050 | Server port | | ApiKey | null | API key | | Timeout | 30 seconds | Operation timeout | | UseSsl | false | Enable TLS | | CertificatePath | null | SSL certificate path | | EnableMetrics | false | Enable client metrics | | CorrelationIdHeader | null | Custom correlation header | | Retry:MaxAttempts | 3 | Retry attempts | | Retry:Delay | 1 second | Initial retry delay | + +### 6.3 Factory Pattern + +`ILmxProxyClientFactory` creates configured clients: +- `CreateClient()` — uses default `LmxProxy` config section. +- `CreateClient(string)` — uses named config section. +- `CreateClient(Action)` — uses builder action. + +Registered as singleton in DI. + +## 7. Streaming Extensions + +Helper methods for large-scale batch operations: + +| Method | Default Batch Size | Description | |--------|--------------------|-------------| | `ReadStreamAsync` | 100 | Batched reads, 2 retries per batch, stops after 3 consecutive errors. Returns results as an `IAsyncEnumerable` stream. 
| +| `WriteStreamAsync` | 100 | Batched writes from async enumerable input. Returns total count written. | +| `ProcessInParallelAsync` | — | Parallel processing with max concurrency of **4** (configurable). Semaphore-based rate limiting. | +| `SubscribeStreamAsync` | — | Wraps callback-based subscription into `IAsyncEnumerable` via `System.Threading.Channels`. | + +## 8. Client Metrics + +When metrics are enabled (`WithMetrics()`): +- Per-operation tracking: counts, error counts, latency. +- Rolling buffer of **1000** latency samples per operation (prevents memory growth). +- Snapshot via `GetMetrics()` returns: `{op}_count`, `{op}_errors`, `{op}_avg_latency_ms`, `{op}_p95_latency_ms`, `{op}_p99_latency_ms`. + +## 9. Value and Quality Handling + +### 9.1 Values (TypedValue) + +Read responses and subscription updates return values as `TypedValue` (protobuf oneof). The client extracts the value directly from the appropriate oneof field (e.g., `vtq.Value.DoubleValue`, `vtq.Value.BoolValue`). Write operations construct `TypedValue` with the correct oneof case for the value's native type. No string serialization or parsing is needed. + +### 9.2 Quality (QualityCode) + +Quality is received as a `QualityCode` message. Category checks use bitmask: `IsGood = (statusCode & 0xC0000000) == 0x00000000`, `IsBad = (statusCode & 0xC0000000) == 0x80000000`. The `symbolic_name` field provides human-readable quality for logging and display. + +### 9.3 Current Implementation (V1 Legacy) + +The current codebase still uses v1 string-based encoding. During v2 migration, the following will be removed: +- `ConvertToVtq()` — parses string values via heuristic (double → bool → null → raw string). +- `ConvertToString()` — serializes values via `.ToString()`. + +## Dependencies + +- **protobuf-net.Grpc** — code-first gRPC client. +- **Grpc.Net.Client** — HTTP/2 gRPC transport. +- **Polly** — retry policies. +- **Microsoft.Extensions.DependencyInjection** — DI integration. 
+- **Microsoft.Extensions.Configuration** — options binding. +- **Microsoft.Extensions.Logging** — logging abstraction. + +## Interactions + +- **ScadaLink Data Connection Layer** consumes the client library via `ILmxProxyClient`. +- **Protocol** — the client uses code-first contracts (`IScadaService`) that are wire-compatible with the Host's proto-generated service. +- **Security** — `GrpcChannelFactory` creates TLS-configured channels matching the Host's TLS configuration. diff --git a/lmxproxy/docs/requirements/Component-Configuration.md b/lmxproxy/docs/requirements/Component-Configuration.md new file mode 100644 index 0000000..66e2f07 --- /dev/null +++ b/lmxproxy/docs/requirements/Component-Configuration.md @@ -0,0 +1,122 @@ +# Component: Configuration + +## Purpose + +Defines the `appsettings.json` structure, configuration binding, and startup validation for the LmxProxy Host service. + +## Location + +- `src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs` — root configuration class. +- `src/ZB.MOM.WW.LmxProxy.Host/Configuration/ConfigurationValidator.cs` — validation logic. +- `src/ZB.MOM.WW.LmxProxy.Host/appsettings.json` — default configuration file. + +## Responsibilities + +- Define all configurable settings as strongly-typed classes. +- Bind `appsettings.json` sections to configuration objects via `Microsoft.Extensions.Configuration`. +- Validate all settings at startup, failing fast on invalid values. +- Support environment variable overrides. + +## 1. 
Configuration Structure + +### 1.1 Root: LmxProxyConfiguration + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| GrpcPort | int | 50051 | gRPC server listen port | +| ApiKeyConfigFile | string | `apikeys.json` | Path to API key configuration file | +| Subscription | SubscriptionConfiguration | — | Subscription channel settings | +| ServiceRecovery | ServiceRecoveryConfiguration | — | Windows SCM recovery settings | +| Connection | ConnectionConfiguration | — | MxAccess connection settings | +| Tls | TlsConfiguration | — | TLS/SSL settings | +| WebServer | WebServerConfiguration | — | Status web server settings | + +### 1.2 ConnectionConfiguration + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| MonitorIntervalSeconds | int | 5 | Auto-reconnect check interval | +| ConnectionTimeoutSeconds | int | 30 | Initial connection timeout | +| ReadTimeoutSeconds | int | 5 | Per-read operation timeout | +| WriteTimeoutSeconds | int | 5 | Per-write operation timeout | +| MaxConcurrentOperations | int | 10 | Semaphore limit for concurrent MxAccess operations | +| AutoReconnect | bool | true | Enable auto-reconnect loop | +| NodeName | string? | null | MxAccess node name (optional) | +| GalaxyName | string? 
| null | MxAccess galaxy name (optional) | + +### 1.3 SubscriptionConfiguration + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| ChannelCapacity | int | 1000 | Per-client subscription buffer size | +| ChannelFullMode | string | `DropOldest` | Backpressure strategy: `DropOldest`, `DropNewest`, `Wait` | + +### 1.4 TlsConfiguration + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| Enabled | bool | false | Enable TLS on gRPC server | +| ServerCertificatePath | string | `certs/server.crt` | PEM server certificate | +| ServerKeyPath | string | `certs/server.key` | PEM server private key | +| ClientCaCertificatePath | string | `certs/ca.crt` | CA certificate for mTLS | +| RequireClientCertificate | bool | false | Require client certificates | +| CheckCertificateRevocation | bool | false | Enable CRL checking | + +### 1.5 WebServerConfiguration + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| Enabled | bool | true | Enable status web server | +| Port | int | 8080 | HTTP listen port | +| Prefix | string? | null | Custom URL prefix (defaults to `http://+:{Port}/`) | + +### 1.6 ServiceRecoveryConfiguration + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| FirstFailureDelayMinutes | int | 1 | Restart delay after first failure | +| SecondFailureDelayMinutes | int | 5 | Restart delay after second failure | +| SubsequentFailureDelayMinutes | int | 10 | Restart delay after subsequent failures | +| ResetPeriodDays | int | 1 | Days before failure count resets | + +## 2. Validation + +`ConfigurationValidator.ValidateAndLog()` runs at startup and checks: + +- **GrpcPort**: Must be 1–65535. +- **Connection**: All timeout values > 0. NodeName and GalaxyName ≤ 255 characters. +- **Subscription**: ChannelCapacity 0–100000. ChannelFullMode must be one of `DropOldest`, `DropNewest`, `Wait`. 
+- **ServiceRecovery**: All failure delay values ≥ 0. ResetPeriodDays > 0. +- **TLS**: If enabled, validates certificate file paths exist. + +Validation errors are logged and cause the service to throw `InvalidOperationException`, preventing startup. + +## 3. Configuration Sources + +Configuration is loaded via `Microsoft.Extensions.Configuration.ConfigurationBuilder`: +1. `appsettings.json` (required). +2. Environment variables (override any JSON setting). + +## 4. Serilog Configuration + +Logging is configured in the `Serilog` section of `appsettings.json`: + +| Setting | Value | +|---------|-------| +| Console sink | ANSI theme, custom template with HH:mm:ss timestamp | +| File sink | `logs/lmxproxy-.txt`, daily rolling, 30 files retained | +| Default level | Information | +| Override: Microsoft | Warning | +| Override: System | Warning | +| Override: Grpc | Information | +| Enrichment | FromLogContext, WithMachineName, WithThreadId | + +## Dependencies + +- **Microsoft.Extensions.Configuration** — configuration binding. +- **Serilog.Settings.Configuration** — Serilog configuration from appsettings. + +## Interactions + +- **ServiceHost** (Program.cs) loads and validates configuration at startup. +- All other components receive their settings from the bound configuration objects. diff --git a/lmxproxy/docs/requirements/Component-GrpcServer.md b/lmxproxy/docs/requirements/Component-GrpcServer.md new file mode 100644 index 0000000..4032dba --- /dev/null +++ b/lmxproxy/docs/requirements/Component-GrpcServer.md @@ -0,0 +1,86 @@ +# Component: GrpcServer + +## Purpose + +The gRPC service implementation that receives client RPCs, validates sessions, and delegates operations to the MxAccessClient. It is the network-facing entry point for all SCADA operations. + +## Location + +`src/ZB.MOM.WW.LmxProxy.Host/Grpc/ScadaGrpcService.cs` — inherits proto-generated `ScadaService.ScadaServiceBase`. + +## Responsibilities + +- Implement all 10 gRPC RPCs defined in `scada.proto`. 
+- Validate session IDs on all data operations before processing. +- Delegate read/write/subscribe operations to the MxAccessClient. +- Convert between gRPC message types and internal domain types (Vtq, Quality). +- Track operation timing and success/failure via PerformanceMetrics. +- Handle errors gracefully, returning structured error responses rather than throwing. + +## 1. RPC Implementations + +### 1.1 Connection Management + +- **Connect**: Creates a new session via SessionManager if MxAccess is connected. Returns the session ID (32-character hex GUID). Rejects if MxAccess is disconnected. +- **Disconnect**: Terminates the session via SessionManager. +- **GetConnectionState**: Returns `IsConnected`, `ClientId`, and `ConnectedSinceUtcTicks` from the MxAccessClient. + +### 1.2 Read Operations + +- **Read**: Validates session, applies Polly retry policy, calls MxAccessClient.ReadAsync(), returns VtqMessage. On invalid session, returns a VtqMessage with `Quality.Bad`. +- **ReadBatch**: Validates session, reads all tags via MxAccessClient.ReadBatchAsync() with semaphore-controlled concurrency (max 10 concurrent). Returns results in request order. Batch reads can partially succeed — individual tags may have Bad quality (with current UTC timestamp) while the overall response succeeds. If a tag read throws an exception, its VTQ is returned with Bad quality. + +### 1.3 Write Operations + +- **Write**: Validates session, applies the `TypedValue` by its native oneof case, calls MxAccessClient.WriteAsync(). +- **WriteBatch**: Validates session, writes all items in parallel via MxAccessClient with semaphore concurrency control. Returns per-item success/failure results. Overall `success` is `false` if any item fails (all-or-nothing at the reporting level). +- **WriteBatchAndWait**: Validates session, writes all items first. If any write fails, returns immediately with `success=false`. 
If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals using type-aware `TypedValueEquals()` comparison (same oneof case required, native type equality, case-sensitive strings, null equals null only). Default timeout: 5000ms, default poll interval: 100ms. If flag matches before timeout: `success=true`, `flag_reached=true`. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error). Returns `flag_reached` boolean and `elapsed_ms`. + +### 1.4 Subscription + +- **Subscribe**: Validates session (throws `RpcException(Unauthenticated)` on invalid). Creates a subscription handle via SubscriptionManager. Streams VtqMessage items from the subscription channel to the client. Cleans up the subscription on stream cancellation or error. + +### 1.5 API Key Check + +- **CheckApiKey**: Returns validity and role information from the interceptor context. + +## 2. Value and Quality Handling + +### 2.1 Values (TypedValue) + +Read responses and subscription updates return values as `TypedValue` (protobuf oneof carrying native types). Write requests receive `TypedValue` and apply the value directly to MxAccess by its native type. If the `oneof` case doesn't match the tag's expected data type, the write returns `WriteResult` with `success=false` indicating type mismatch. No string serialization or parsing heuristics are used. + +### 2.2 Quality (QualityCode) + +Quality is returned as a `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. The server maps MxAccess quality codes to OPC UA status codes per the quality table in Component-Protocol. Specific error scenarios return specific quality codes (e.g., tag not found → `BadConfigurationError`, comms loss → `BadCommunicationFailure`). + +### 2.3 Current Implementation (V1 Legacy) + +The current codebase still uses v1 string-based encoding. 
During v2 migration, the following v1 behavior will be removed: +- `ConvertValueToString()` — serializes values to strings (bool → lowercase, DateTime → ISO-8601, arrays → JSON, others → `.ToString()`). +- `ParseValue()` — parses string values in order: bool → int → long → double → DateTime → raw string. +- Three-state string quality mapping: ≥192 → `"Good"`, 64–191 → `"Uncertain"`, <64 → `"Bad"`. + +## 3. Error Handling + +- All RPC methods catch exceptions and return error responses with `success=false` and a descriptive message. Exceptions do not propagate as gRPC status codes (except Subscribe, which throws `RpcException` for invalid sessions). +- Each operation is wrapped in a PerformanceMetrics timing scope that records duration and success/failure. + +## 4. Session Validation + +- All data operations (Read, ReadBatch, Write, WriteBatch, WriteBatchAndWait, Subscribe) validate the session ID before processing. +- Invalid session on read/write operations returns a response with Bad quality VTQ. +- Invalid session on Subscribe throws `RpcException` with `StatusCode.Unauthenticated`. + +## Dependencies + +- **MxAccessClient** (IScadaClient) — all SCADA operations are delegated here. +- **SessionManager** — session creation, validation, and termination. +- **SubscriptionManager** — subscription lifecycle for the Subscribe RPC. +- **PerformanceMetrics** — operation timing and success/failure tracking. + +## Interactions + +- **ApiKeyInterceptor** intercepts all RPCs before they reach ScadaGrpcService, enforcing API key authentication and role-based write authorization. +- **SubscriptionManager** provides the channel that Subscribe streams from. +- **StatusReportService** reads PerformanceMetrics data that ScadaGrpcService populates. 
diff --git a/lmxproxy/docs/requirements/Component-HealthAndMetrics.md b/lmxproxy/docs/requirements/Component-HealthAndMetrics.md new file mode 100644 index 0000000..dd21ace --- /dev/null +++ b/lmxproxy/docs/requirements/Component-HealthAndMetrics.md @@ -0,0 +1,121 @@ +# Component: HealthAndMetrics + +## Purpose + +Provides health checking, performance metrics collection, and an HTTP status dashboard for monitoring the LmxProxy service. + +## Location + +- `src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs` — basic health check. +- `src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs` — detailed health check with test tag read. +- `src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs` — operation metrics collection. +- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs` — status report generation. +- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs` — HTTP status endpoint. + +## Responsibilities + +- Evaluate service health based on connection state, operation success rates, and test tag reads. +- Track per-operation performance metrics (counts, latencies, percentiles). +- Serve an HTML status dashboard and JSON/health HTTP endpoints. +- Report metrics to logs on a periodic interval. + +## 1. Health Checks + +### 1.1 Basic Health Check (HealthCheckService) + +`CheckHealthAsync()` evaluates: + +| Check | Healthy | Degraded | +|-------|---------|----------| +| MxAccess connected | Yes | — | +| Success rate (if > 100 total ops) | ≥ 50% | < 50% | +| Client count | ≤ 100 | > 100 | + +Returns health data dictionary: `scada_connected`, `scada_connection_state`, `total_clients`, `total_tags`, `total_operations`, `average_success_rate`. + +### 1.2 Detailed Health Check (DetailedHealthCheckService) + +`CheckHealthAsync()` performs an active probe: + +1. Checks `IsConnected` — returns **Unhealthy** if not connected. +2. Reads a test tag (default `System.Heartbeat`). +3. If test tag quality is not Good — returns **Degraded**. +4. 
If test tag timestamp is older than **5 minutes** — returns **Degraded** (stale data detection). +5. Otherwise returns **Healthy**. + +## 2. Performance Metrics + +### 2.1 Tracking + +`PerformanceMetrics` uses a `ConcurrentDictionary` to track operations by name. + +Operations tracked: `Read`, `ReadBatch`, `Write`, `WriteBatch` (recorded by ScadaGrpcService). + +### 2.2 Recording + +Two recording patterns: +- `RecordOperation(name, duration, success)` — explicit recording. +- `BeginOperation(name)` — returns an `ITimingScope` (disposable). On dispose, automatically records duration (via `Stopwatch`) and success flag (set via `SetSuccess(bool)`). + +### 2.3 Per-Operation Statistics + +`OperationMetrics` maintains: +- `_totalCount`, `_successCount` — running counters. +- `_totalMilliseconds`, `_minMilliseconds`, `_maxMilliseconds` — latency range. +- `_durations` — rolling buffer of up to **1000 latency samples** for percentile calculation. + +`MetricsStatistics` snapshot: +- `TotalCount`, `SuccessCount`, `SuccessRate` (percentage). +- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`. +- `Percentile95Milliseconds` — calculated from sorted samples at the 95th percentile index. + +### 2.4 Periodic Reporting + +A timer fires every **60 seconds**, logging a summary of all operation metrics to Serilog. + +## 3. Status Web Server + +### 3.1 Server + +`StatusWebServer` uses `HttpListener` on `http://+:{Port}/` (default port 8080). + +- Starts an async request-handling loop, spawning a task per request. +- Graceful shutdown: cancels the listener, waits **5 seconds** for the listener task to exit. +- Returns HTTP 405 for non-GET methods, HTTP 500 on errors. 
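The status server's GET handling can be sketched as a small route dispatch. This is illustrative only: the endpoint paths, the `OK`/`UNHEALTHY` bodies, and the 405/503 codes are from this document, while the `Route` function, the placeholder bodies for `/` and `/api/status`, and the 404 fallback are assumptions.

```csharp
// Hedged sketch of StatusWebServer routing (Sections 3.1-3.2).
static (int Status, string Body) Route(string method, string path, bool healthy)
{
    if (method != "GET")
        return (405, "Method Not Allowed");          // non-GET -> HTTP 405 (Section 3.1)

    return path switch
    {
        "/"           => (200, "<html>dashboard</html>"), // HTML dashboard placeholder
        "/api/status" => (200, "{}"),                     // camelCase JSON placeholder
        "/api/health" => healthy ? (200, "OK") : (503, "UNHEALTHY"),
        _             => (404, "Not Found"),              // assumed fallback
    };
}
```

Keeping routing as a pure function over (method, path, health) makes the 405/503 contract easy to unit-test independently of `HttpListener`.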
+ +### 3.2 Endpoints + +| Endpoint | Method | Response | |----------|--------|----------| | `/` | GET | HTML dashboard (auto-refresh every 30 seconds) | | `/api/status` | GET | JSON status report (camelCase) | | `/api/health` | GET | Plain text `OK` (200) or `UNHEALTHY` (503) | + +### 3.3 HTML Dashboard + +Generated by `StatusReportService`: +- Bootstrap-like CSS grid layout with status cards. +- Color-coded status: green = Healthy, yellow = Degraded, red = Unhealthy/Error. +- Operations table with columns: Count, SuccessRate, Avg/Min/Max/P95 milliseconds. +- Service metadata: ServiceName, Version (assembly version), connection state. +- Subscription stats: TotalClients, TotalTags, ActiveSubscriptions. +- Auto-refresh via a `<meta http-equiv="refresh" content="30">` tag. +- Last updated timestamp. + +### 3.4 JSON Status Report + +Fully nested structure with camelCase property names: +- Service metadata, connection status, subscription stats, performance data, health check results. + +## Dependencies + +- **MxAccessClient** — `IsConnected`, `ConnectionState` for health checks; test tag read for detailed check. +- **SubscriptionManager** — subscription statistics. +- **PerformanceMetrics** — operation statistics for status report and health evaluation. +- **Configuration** — `WebServerConfiguration` for port and prefix. + +## Interactions + +- **GrpcServer** populates PerformanceMetrics via timing scopes on every RPC. +- **ServiceHost** creates all health/metrics/status components at startup and disposes them at shutdown. +- External monitoring systems can poll `/api/health` for availability checks. 
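The P95 snapshot described in section 2.3 (sorted rolling-buffer samples read at the 95th-percentile index) can be sketched with a nearest-rank calculation. The exact index arithmetic in `PerformanceMetrics` may differ; this is one common formulation:

```csharp
using System;
using System.Linq;

// Hedged sketch of Percentile95Milliseconds (Section 2.3): sort the rolling
// sample buffer (up to 1000 entries) and take the nearest-rank P95 value.
static double Percentile95(double[] samples)
{
    if (samples.Length == 0) return 0;

    var sorted = samples.OrderBy(s => s).ToArray();
    int index = (int)Math.Ceiling(sorted.Length * 0.95) - 1; // nearest-rank index
    return sorted[Math.Clamp(index, 0, sorted.Length - 1)];
}
```

With the documented 1000-sample cap, the sort stays cheap enough to run inside the 60-second reporting timer.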
diff --git a/lmxproxy/docs/requirements/Component-MxAccessClient.md b/lmxproxy/docs/requirements/Component-MxAccessClient.md new file mode 100644 index 0000000..a5ed4ff --- /dev/null +++ b/lmxproxy/docs/requirements/Component-MxAccessClient.md @@ -0,0 +1,108 @@ +# Component: MxAccessClient + +## Purpose + +The core component that wraps the ArchestrA MXAccess COM API, providing connection management, tag read/write operations, and subscription-based value change notifications. This is the bridge between the gRPC service layer and AVEVA System Platform. + +## Location + +`src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs` — partial class split across 6 files: +- `MxAccessClient.cs` — Main class, properties, disposal, factory. +- `MxAccessClient.Connection.cs` — Connection lifecycle (connect, disconnect, reconnect, cleanup). +- `MxAccessClient.ReadWrite.cs` — Read and write operations with retry and concurrency control. +- `MxAccessClient.Subscription.cs` — Subscription management and stored subscription state. +- `MxAccessClient.EventHandlers.cs` — COM event handlers (OnDataChange, OnWriteComplete, OperationComplete). +- `MxAccessClient.NestedTypes.cs` — Internal types and enums. + +## Responsibilities + +- Manage the MXAccess COM object lifecycle (create, register, unregister, release). +- Maintain connection state (Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting) and fire state change events. +- Execute read and write operations against MXAccess with concurrency control via semaphores. +- Manage tag subscriptions via MXAccess advise callbacks and store subscription state for reconnection. +- Handle COM threading constraints (STA thread context via `Task.Run`). + +## 1. Connection Lifecycle + +### 1.1 Connect + +`ConnectAsync()` wraps `ConnectInternal()` in `Task.Run` for STA thread context: + +1. Validates not disposed. +2. Returns early if already connected. +3. Sets state to `Connecting`. +4. 
`InitializeMxAccessConnection()` — creates new `LMXProxyServer` COM object, wires event handlers (OnDataChange, OnWriteComplete, OperationComplete). +5. `RegisterWithMxAccess()` — calls `_lmxProxy.Register("ZB.MOM.WW.LmxProxy.Host")`, stores the returned connection handle. +6. Sets state to `Connected`. +7. On error, calls `Cleanup()` and re-throws. + +After successful connection, calls `RecreateStoredSubscriptionsAsync()` to restore any previously active subscriptions. + +### 1.2 Disconnect + +`DisconnectAsync()` wraps `DisconnectInternal()` in `Task.Run`: + +1. Checks `IsConnected`. +2. Sets state to `Disconnecting`. +3. `RemoveAllSubscriptions()` — unsubscribes all tags from MXAccess but retains subscription state in `_storedSubscriptions` for reconnection. +4. `UnregisterFromMxAccess()` — calls `_lmxProxy.Unregister(_connectionHandle)`. +5. `Cleanup()` — removes event handlers, calls `Marshal.ReleaseComObject(_lmxProxy)` to force-release all COM references, nulls the proxy and resets the connection handle. +6. Sets state to `Disconnected`. + +### 1.3 Connection State + +- `IsConnected` property: `_lmxProxy != null && _connectionState == Connected && _connectionHandle > 0`. +- `ConnectionState` enum: Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting. +- `ConnectionStateChanged` event fires on all state transitions with previous state, current state, and optional message. + +### 1.4 Auto-Reconnect + +When `AutoReconnect` is enabled (default), the `MonitorConnectionAsync` loop runs continuously: +- Checks `IsConnected` every `MonitorIntervalSeconds` (default 5 seconds). +- On disconnect, attempts reconnect via semaphore-protected `ConnectAsync()`. +- On failure, logs warning and retries at the next interval. +- Reconnection restores stored subscriptions automatically. + +## 2. Thread Safety & COM Constraints + +- State mutations protected by `lock (_lock)`. 
+- COM operations wrapped in `Task.Run` for STA thread context (MXAccess is 32-bit COM). +- Concurrency control: `_readSemaphore` and `_writeSemaphore` limit concurrent MXAccess operations to `MaxConcurrentOperations` (default 10, configurable). +- Default max concurrency constant: `DefaultMaxConcurrency = 10`. + +## 3. Read Operations + +- `ReadAsync(address, ct)` — Applies Polly retry policy, calls `ReadSingleValueAsync()`, returns `Vtq`. +- `ReadBatchAsync(addresses, ct)` — Creates parallel tasks per address via `ReadAddressWithSemaphoreAsync()`. Each task acquires `_readSemaphore` before reading. Returns `IReadOnlyDictionary`. + +## 4. Write Operations + +- `WriteAsync(address, value, ct)` — Applies Polly retry policy, calls `WriteInternalAsync(address, value, ct)`. +- `WriteBatchAsync(values, ct)` — Parallel tasks via `WriteAddressWithSemaphoreAsync()`. Each task acquires `_writeSemaphore` before writing. +- `WriteBatchAndWaitAsync(values, flagAddress, flagValue, responseAddress, responseValue, ct)` — Writes batch, writes flag, polls response tag until match. + +## 5. Subscription Management + +- Subscriptions stored in `_storedSubscriptions` for reconnection persistence. +- `SubscribeInternalAsync(addresses, callback, storeSubscription)` — registers tags with MXAccess and stores subscription state. +- `RecreateStoredSubscriptionsAsync()` — called after reconnect to re-subscribe all previously active tags without re-storing. +- `RemoveAllSubscriptions()` — unsubscribes from MXAccess but retains `_storedSubscriptions`. + +## 6. Event Handlers + +- **OnDataChange** — Fired by MXAccess when a subscribed tag value changes. Routes the update to the SubscriptionManager. +- **OnWriteComplete** — Fired when an async write operation completes. +- **OperationComplete** — General operation completion callback. + +## Dependencies + +- **ArchestrA.MXAccess** COM interop assembly (`lib/ArchestrA.MXAccess.dll`). +- **Polly** — retry policies for read/write operations. 
+- **Configuration** — `ConnectionConfiguration` for timeouts, concurrency limits, and auto-reconnect settings. + +## Interactions + +- **GrpcServer** (ScadaGrpcService) delegates all SCADA operations to MxAccessClient via the `IScadaClient` interface. +- **SubscriptionManager** receives value change callbacks originating from MxAccessClient's COM event handlers. +- **HealthAndMetrics** queries `IsConnected` and `ConnectionState` for health checks. +- **ServiceHost** manages the MxAccessClient lifecycle (create at startup, dispose at shutdown). diff --git a/lmxproxy/docs/requirements/Component-Protocol.md b/lmxproxy/docs/requirements/Component-Protocol.md new file mode 100644 index 0000000..7ffe69b --- /dev/null +++ b/lmxproxy/docs/requirements/Component-Protocol.md @@ -0,0 +1,301 @@ +# Component: Protocol + +## Purpose + +Defines the gRPC protocol specification for communication between the LmxProxy Client and Host, including the proto file definition, code-first contracts, message schemas, value type system, and quality codes. The authoritative specification is `docs/lmxproxy_updates.md`. + +## Location + +- `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Protos/scada.proto` — proto file (Host, proto-generated). +- `src/ZB.MOM.WW.LmxProxy.Client/Domain/ScadaContracts.cs` — code-first contracts (Client, protobuf-net.Grpc). +- `docs/lmxproxy_updates.md` — authoritative protocol specification. +- `docs/lmxproxy_protocol.md` — legacy v1 protocol documentation (superseded). + +## Responsibilities + +- Define the gRPC service interface (`scada.ScadaService`) and all message types. +- Ensure wire compatibility between the Host's proto-generated code and the Client's code-first contracts. +- Specify the VTQ data model: `TypedValue` for values, `QualityCode` for quality. +- Document OPC UA-aligned quality codes filtered to AVEVA System Platform usage. + +## 1. 
Service Definition + +Service: `scada.ScadaService` (gRPC package: `scada`) + +| RPC | Request | Response | Type | +|-----|---------|----------|------| +| Connect | ConnectRequest | ConnectResponse | Unary | +| Disconnect | DisconnectRequest | DisconnectResponse | Unary | +| GetConnectionState | GetConnectionStateRequest | GetConnectionStateResponse | Unary | +| Read | ReadRequest | ReadResponse | Unary | +| ReadBatch | ReadBatchRequest | ReadBatchResponse | Unary | +| Write | WriteRequest | WriteResponse | Unary | +| WriteBatch | WriteBatchRequest | WriteBatchResponse | Unary | +| WriteBatchAndWait | WriteBatchAndWaitRequest | WriteBatchAndWaitResponse | Unary | +| Subscribe | SubscribeRequest | stream VtqMessage | Server streaming | +| CheckApiKey | CheckApiKeyRequest | CheckApiKeyResponse | Unary | + +## 2. Value Type System (TypedValue) + +Values are transmitted in their native protobuf types via a `TypedValue` oneof. No string serialization or parsing heuristics are used. + +``` +TypedValue { + oneof value { + bool bool_value = 1 + int32 int32_value = 2 + int64 int64_value = 3 + float float_value = 4 + double double_value = 5 + string string_value = 6 + bytes bytes_value = 7 + int64 datetime_value = 8 // UTC DateTime.Ticks (100ns intervals since 0001-01-01) + ArrayValue array_value = 9 // typed arrays + } +} +``` + +`ArrayValue` contains typed repeated fields via oneof: `BoolArray`, `Int32Array`, `Int64Array`, `FloatArray`, `DoubleArray`, `StringArray`. Each contains a `repeated` field of the corresponding primitive. + +### 2.1 Null Handling + +- Null is represented by an unset `oneof` (no field selected in `TypedValue`). +- A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp. 
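The `datetime_value` encoding above (UTC `DateTime.Ticks`) is easy to get subtly wrong from non-.NET clients. A Python sketch of an exact conversion, assuming the .NET convention of 100 ns ticks counted from 0001-01-01 UTC; integer math avoids float rounding over the roughly 2000-year span:

```python
from datetime import datetime, timedelta, timezone

TICKS_PER_SECOND = 10_000_000               # .NET ticks are 100 ns units
EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def to_ticks(dt: datetime) -> int:
    """UTC datetime -> .NET DateTime.Ticks, using integer arithmetic."""
    delta = dt.astimezone(timezone.utc) - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * TICKS_PER_SECOND + delta.microseconds * 10

def from_ticks(ticks: int) -> datetime:
    """Inverse mapping, truncated to Python's microsecond resolution."""
    return EPOCH + timedelta(microseconds=ticks // 10)
```

The round trip is exact at microsecond precision; sub-microsecond ticks are truncated on the way back.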
+ +### 2.2 Type Mapping from Internal Tag Model + +| Tag Data Type | TypedValue Field | +|---------------|-----------------| +| `bool` | `bool_value` | +| `int32` | `int32_value` | +| `int64` | `int64_value` | +| `float` | `float_value` | +| `double` | `double_value` | +| `string` | `string_value` | +| `byte[]` | `bytes_value` | +| `DateTime` | `datetime_value` (UTC Ticks as int64) | +| `float[]` | `array_value.float_values` | +| `int32[]` | `array_value.int32_values` | +| Other arrays | Corresponding `ArrayValue` field | + +## 3. Quality System (QualityCode) + +Quality is a structured message with an OPC UA-compatible numeric status code and a human-readable symbolic name: + +``` +QualityCode { + uint32 status_code = 1 // OPC UA-compatible numeric status code + string symbolic_name = 2 // Human-readable name (e.g., "Good", "BadSensorFailure") +} +``` + +### 3.1 Category Extraction + +Category derived from high bits via `(statusCode & 0xC0000000)`: +- `0x00000000` = Good +- `0x40000000` = Uncertain +- `0x80000000` = Bad + +```csharp +public static bool IsGood(uint statusCode) => (statusCode & 0xC0000000) == 0x00000000; +public static bool IsBad(uint statusCode) => (statusCode & 0xC0000000) == 0x80000000; +``` + +### 3.2 Supported Quality Codes + +Filtered to codes actively used by AVEVA System Platform, InTouch, and OI Server/DAServer (per AVEVA Tech Note TN1305): + +**Good Quality:** + +| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description | +|--------------|-------------------|------------------|-------------| +| `Good` | `0x00000000` | `0x00C0` | Value is reliable, non-specific | +| `GoodLocalOverride` | `0x00D80000` | `0x00D8` | Manually overridden; input disconnected | + +**Uncertain Quality:** + +| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description | +|--------------|-------------------|------------------|-------------| +| `UncertainLastUsableValue` | `0x40900000` | `0x0044` | External source stopped writing; value is stale | 
+| `UncertainSensorNotAccurate` | `0x42390000` | `0x0050` | Sensor out of calibration or clamped | +| `UncertainEngineeringUnitsExceeded` | `0x40540000` | `0x0054` | Outside defined engineering limits | +| `UncertainSubNormal` | `0x40580000` | `0x0058` | Derived from insufficient good sources | + +**Bad Quality:** + +| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description | +|--------------|-------------------|------------------|-------------| +| `Bad` | `0x80000000` | `0x0000` | Non-specific bad; value not useful | +| `BadConfigurationError` | `0x80040000` | `0x0004` | Server config problem (e.g., item deleted) | +| `BadNotConnected` | `0x808A0000` | `0x0008` | Input not logically connected to source | +| `BadDeviceFailure` | `0x806B0000` | `0x000C` | Device failure detected | +| `BadSensorFailure` | `0x806D0000` | `0x0010` | Sensor failure detected | +| `BadLastKnownValue` | `0x80050000` | `0x0014` | Comm failed; last known value available | +| `BadCommunicationFailure` | `0x80050000` | `0x0018` | Comm failed; no last known value | +| `BadOutOfService` | `0x808F0000` | `0x001C` | Block off-scan/locked; item inactive | +| `BadWaitingForInitialData` | `0x80320000` | — | Initializing; OI Server establishing communication | + +**Notes:** +- AVEVA OPC DA quality codes use a 16-bit structure: 2 bits major (Good/Bad/Uncertain), 4 bits minor (sub-status), 2 bits limit (Not Limited, Low, High, Constant). The OPC UA status codes above are the standard UA equivalents. +- The limit bits are appended to any quality code. For example, `Good + High Limited` = `0x00C2` in OPC DA. In OPC UA, limits are conveyed via separate status code bits but the base code remains the same. 
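The category-extraction rule from §3.1 can be cross-checked against the tables above. A Python translation of the documented C# helpers:

```python
GOOD, UNCERTAIN, BAD = 0x00000000, 0x40000000, 0x80000000

def category(status_code: int) -> int:
    """The top two bits of an OPC UA status code select the quality category."""
    return status_code & 0xC0000000

def is_good(sc: int) -> bool:
    return category(sc) == GOOD

def is_uncertain(sc: int) -> bool:
    return category(sc) == UNCERTAIN

def is_bad(sc: int) -> bool:
    return category(sc) == BAD
```

Applied to the tables: `GoodLocalOverride` (`0x00D80000`) is Good, `UncertainLastUsableValue` (`0x40900000`) is Uncertain, and `BadSensorFailure` (`0x806D0000`) is Bad, regardless of the sub-status bits below the top two.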
+ +### 3.3 Error Condition Mapping + +| Scenario | Quality | +|----------|---------| +| Normal read | `Good` (`0x00000000`) | +| Tag not found | `BadConfigurationError` (`0x80040000`) | +| Tag read exception / comms loss | `BadCommunicationFailure` (`0x80050000`) | +| Sensor failure | `BadSensorFailure` (`0x806D0000`) | +| Device failure | `BadDeviceFailure` (`0x806B0000`) | +| Stale value | `UncertainLastUsableValue` (`0x40900000`) | +| Block off-scan / disabled | `BadOutOfService` (`0x808F0000`) | +| Local override active | `GoodLocalOverride` (`0x00D80000`) | +| Initializing / waiting for first value | `BadWaitingForInitialData` (`0x80320000`) | +| Write to read-only tag | `WriteResult.success=false`, message indicates read-only | +| Type mismatch on write | `WriteResult.success=false`, message indicates type mismatch | + +## 4. Message Schemas + +### 4.1 VtqMessage + +The core data type for tag value transport: + +| Field | Proto Type | Order | Description | +|-------|-----------|-------|-------------| +| tag | string | 1 | Tag address | +| value | TypedValue | 2 | Typed value (native protobuf types) | +| timestamp_utc_ticks | int64 | 3 | UTC DateTime.Ticks (100ns intervals since 0001-01-01) | +| quality | QualityCode | 4 | Structured quality with status code and symbolic name | + +A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp. 
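The null-VTQ rule restated above can be made concrete. A hedged Python sketch, modeling a VTQ as a hypothetical `(value, status_code, timestamp)` tuple rather than the real protobuf message:

```python
from datetime import datetime, timezone

BAD = 0x80000000  # non-specific Bad status code

def normalize_vtq(vtq):
    """Apply the documented rule: a null or missing VTQ message becomes a
    Bad-quality null value stamped with the current UTC time."""
    if vtq is None:
        return (None, BAD, datetime.now(timezone.utc))
    return vtq
```

Present messages pass through untouched; only a missing message is substituted.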
+ +### 4.2 Connection Messages + +**ConnectRequest**: `client_id` (string), `api_key` (string) +**ConnectResponse**: `success` (bool), `message` (string), `session_id` (string — 32-char hex GUID) + +**DisconnectRequest**: `session_id` (string) +**DisconnectResponse**: `success` (bool), `message` (string) + +**GetConnectionStateRequest**: `session_id` (string) +**GetConnectionStateResponse**: `is_connected` (bool), `client_id` (string), `connected_since_utc_ticks` (int64) + +### 4.3 Read Messages + +**ReadRequest**: `session_id` (string), `tag` (string) +**ReadResponse**: `success` (bool), `message` (string), `vtq` (VtqMessage) + +**ReadBatchRequest**: `session_id` (string), `tags` (repeated string) +**ReadBatchResponse**: `success` (bool), `message` (string), `vtqs` (repeated VtqMessage) + +### 4.4 Write Messages + +**WriteRequest**: `session_id` (string), `tag` (string), `value` (TypedValue) +**WriteResponse**: `success` (bool), `message` (string) + +**WriteItem**: `tag` (string), `value` (TypedValue) +**WriteResult**: `tag` (string), `success` (bool), `message` (string) + +**WriteBatchRequest**: `session_id` (string), `items` (repeated WriteItem) +**WriteBatchResponse**: `success` (bool), `message` (string), `results` (repeated WriteResult) + +### 4.5 WriteBatchAndWait Messages + +**WriteBatchAndWaitRequest**: +- `session_id` (string) +- `items` (repeated WriteItem) — values to write +- `flag_tag` (string) — tag to poll after writes +- `flag_value` (TypedValue) — expected value (type-aware comparison) +- `timeout_ms` (int32) — max wait time (default 5000ms if ≤ 0) +- `poll_interval_ms` (int32) — polling interval (default 100ms if ≤ 0) + +**WriteBatchAndWaitResponse**: +- `success` (bool) +- `message` (string) +- `write_results` (repeated WriteResult) +- `flag_reached` (bool) — whether the flag value was matched +- `elapsed_ms` (int32) — total elapsed time + +**Behavior:** +1. All writes execute first. If any write fails, returns immediately with `success=false`. 
+2. If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals. +3. Uses type-aware `TypedValueEquals()` comparison (see Section 4.5.1). +4. If flag matches before timeout: `success=true`, `flag_reached=true`. +5. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error). + +#### 4.5.1 Flag Comparison Rules + +Type-aware comparison via `TypedValueEquals()`: +- Both values must have the same `oneof` case (same type). Mismatched types are never equal. +- Numeric comparison uses the native type's equality (no floating-point string round-trip issues). +- String comparison is case-sensitive. +- Bool comparison is direct equality. +- Null (unset `oneof`) equals null. Null does not equal any set value. +- Array comparison: element-by-element equality, same length required. +- `datetime_value` compared as `int64` equality (tick-level precision). + +### 4.6 Subscription Messages + +**SubscribeRequest**: `session_id` (string), `tags` (repeated string), `sampling_ms` (int32) +Response: streamed `VtqMessage` items. + +### 4.7 API Key Messages + +**CheckApiKeyRequest**: `api_key` (string) +**CheckApiKeyResponse**: `is_valid` (bool), `message` (string) + +## 5. Dual gRPC Stack Compatibility + +The Host and Client use different gRPC implementations: + +| Aspect | Host | Client | +|--------|------|--------| +| Stack | Grpc.Core (C-core) | Grpc.Net.Client | +| Contract | Proto file (`scada.proto`) + Grpc.Tools codegen | Code-first (`[ServiceContract]`, `[DataContract]`) via protobuf-net.Grpc | +| Runtime | .NET Framework 4.8 | .NET 10 | + +Both target `scada.ScadaService` and produce identical wire format. Field ordering in `[DataMember(Order = N)]` matches proto field numbers. + +## 6. V1 Legacy Protocol + +The current codebase implements the v1 protocol. The following describes v1 behavior that will be replaced during migration to v2. 
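As context for the migration, the v1 write-direction parse order (detailed in §6.1 below) behaves roughly like this sketch. Illustrative only: the actual Host used .NET `TryParse` chains, and Python's `int` subsumes the separate int/long steps, with `datetime.fromisoformat` standing in for DateTime parsing.

```python
from datetime import datetime

def parse_v1_value(s: str):
    """Legacy v1 parse order: bool -> int/long -> double -> DateTime -> raw string."""
    if s.lower() in ("true", "false"):
        return s.lower() == "true"
    for convert in (int, float):
        try:
            return convert(s)
        except ValueError:
            pass
    try:
        return datetime.fromisoformat(s)
    except ValueError:
        return s  # nothing matched: fall through to raw string
```

The first-match-wins chain is exactly why v1 was fragile: the string `"1"` silently becomes an integer even when the tag is a string, a class of ambiguity the v2 `TypedValue` oneof eliminates.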
+ +### 6.1 V1 Value Encoding + +All values transmitted as strings: +- Write direction: server parses string values in order: bool → int → long → double → DateTime → raw string. +- Read direction: server serializes via `.ToString()` (bool → lowercase, DateTime → ISO-8601, arrays → JSON). +- Client parses: double → bool → null (empty string) → raw string. + +### 6.2 V1 Quality + +Three-state string quality (`"Good"`, `"Uncertain"`, `"Bad"`, case-insensitive). OPC DA numeric ranges: ≥192 (0xC0) = Good, 64–191 = Uncertain, <64 = Bad. + +### 6.3 V1 → V2 Field Changes + +| Message | Field | V1 Type | V2 Type | +|---------|-------|---------|---------| +| VtqMessage | value | string | TypedValue | +| VtqMessage | quality | string | QualityCode | +| WriteRequest | value | string | TypedValue | +| WriteItem | value | string | TypedValue | +| WriteBatchAndWaitRequest | flag_value | string | TypedValue | + +All RPC signatures remain unchanged. Only value and quality fields change type. + +### 6.4 Migration Strategy + +Clean break — no backward compatibility layer. All clients and servers updated simultaneously. This is appropriate because LmxProxy is an internal protocol with a small, controlled client count. Dual-format support adds complexity with no long-term benefit. + +## Dependencies + +- **Grpc.Core** + **Grpc.Tools** — proto compilation and server hosting (Host). +- **protobuf-net.Grpc** — code-first contracts (Client). +- **Grpc.Net.Client** — HTTP/2 transport (Client). + +## Interactions + +- **GrpcServer** implements the service defined by this protocol. +- **Client** consumes the service defined by this protocol. +- **MxAccessClient** is the backend that executes the operations requested via the protocol.
diff --git a/lmxproxy/docs/requirements/Component-Security.md b/lmxproxy/docs/requirements/Component-Security.md new file mode 100644 index 0000000..7371791 --- /dev/null +++ b/lmxproxy/docs/requirements/Component-Security.md @@ -0,0 +1,119 @@ +# Component: Security + +## Purpose + +Provides API key-based authentication and role-based authorization for the gRPC service, along with TLS certificate management for transport security. + +## Location + +- `src/ZB.MOM.WW.LmxProxy.Host/Security/ApiKeyService.cs` — API key storage and validation. +- `src/ZB.MOM.WW.LmxProxy.Host/Security/ApiKeyInterceptor.cs` — gRPC server interceptor for authentication/authorization. +- `src/ZB.MOM.WW.LmxProxy.Client/Security/GrpcChannelFactory.cs` — Client-side TLS channel factory. + +## Responsibilities + +- Load and hot-reload API keys from a JSON configuration file. +- Validate API keys on every gRPC request via a server interceptor. +- Enforce role-based access control (ReadOnly vs ReadWrite). +- Manage TLS certificates for server and optional mutual TLS. + +## 1. API Key Service + +### 1.1 Key Storage + +- Keys are stored in a JSON file (default `apikeys.json`). +- File format: `{ "ApiKeys": [{ "Key": "...", "Description": "...", "Role": "ReadOnly|ReadWrite", "Enabled": true|false }] }`. +- If the file does not exist at startup, the service auto-generates a default file with two random keys: one ReadOnly and one ReadWrite. + +### 1.2 Hot Reload + +- A `FileSystemWatcher` monitors the API key file for changes. +- Rapid changes are debounced (1-second minimum between reloads). +- `ReloadConfigurationAsync` uses a `SemaphoreSlim` to serialize reload operations. +- New and modified keys take effect on the next request. Removed or disabled keys reject future requests immediately. +- Active sessions are not affected by key changes — sessions are tracked independently by SessionManager. 
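The 1-second reload debounce described in §1.2 can be sketched with an injected clock. Names and structure here are hypothetical: the Host implements this around `FileSystemWatcher` events with a `SemaphoreSlim`, while this sketch only captures the minimum-interval rule.

```python
import time

class DebouncedReloader:
    """Suppress reload bursts: at most one reload per min_interval seconds."""

    def __init__(self, reload, min_interval=1.0, clock=time.monotonic):
        self._reload = reload            # caller-supplied reload action
        self._min_interval = min_interval
        self._clock = clock              # injectable for testing
        self._last = float("-inf")

    def on_file_changed(self) -> bool:
        """Returns True if a reload ran, False if the event was debounced."""
        now = self._clock()
        if now - self._last < self._min_interval:
            return False
        self._last = now
        self._reload()
        return True
```

Editors that write a file in several bursts (save, then metadata touch) trigger multiple change events; the interval check collapses them into a single reload.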
+ +### 1.3 Validation + +- `ValidateApiKey(apiKey)` — Returns the `ApiKey` object if the key exists and `Enabled` is true, otherwise null. +- `HasRole(apiKey, requiredRole)` — Returns true if the key has the required role. Role hierarchy: ReadWrite implies ReadOnly. + +## 2. API Key Interceptor + +### 2.1 Authentication Flow + +The `ApiKeyInterceptor` intercepts every unary and server-streaming RPC: + +1. Extracts the `x-api-key` header from gRPC request metadata. +2. Calls `ApiKeyService.ValidateApiKey()`. +3. If the key is invalid or missing, returns `StatusCode.Unauthenticated`. +4. For write-protected methods (`Write`, `WriteBatch`, `WriteBatchAndWait`), checks that the key has the `ReadWrite` role. Returns `StatusCode.PermissionDenied` if the key is `ReadOnly`. +5. Adds the validated `ApiKey` to `context.UserState["ApiKey"]` for downstream use. +6. Continues to the service method. + +### 2.2 Write-Protected Methods + +These RPCs require the `ReadWrite` role: +- `Write` +- `WriteBatch` +- `WriteBatchAndWait` + +All other RPCs (`Connect`, `Disconnect`, `GetConnectionState`, `Read`, `ReadBatch`, `Subscribe`, `CheckApiKey`) are allowed for `ReadOnly` keys. + +## 3. API Key Model + +| Field | Type | Description | +|-------|------|-------------| +| Key | string | The secret API key value | +| Description | string | Human-readable name for the key | +| Role | ApiKeyRole | `ReadOnly` or `ReadWrite` | +| Enabled | bool | Whether the key is active | + +`ApiKeyRole` enum: `ReadOnly` (read and subscribe only), `ReadWrite` (full access including writes). + +## 4. 
TLS Configuration + +### 4.1 Server-Side (Host) + +Configured via `TlsConfiguration` in `appsettings.json`: + +| Setting | Default | Description | +|---------|---------|-------------| +| Enabled | false | Enable TLS on the gRPC server | +| ServerCertificatePath | `certs/server.crt` | PEM server certificate | +| ServerKeyPath | `certs/server.key` | PEM server private key | +| ClientCaCertificatePath | `certs/ca.crt` | CA certificate for mTLS client validation | +| RequireClientCertificate | false | Require client certificates (mutual TLS) | +| CheckCertificateRevocation | false | Check certificate revocation lists | + +If TLS is enabled but certificates are missing, the service generates self-signed certificates at startup. + +### 4.2 Client-Side + +`ClientTlsConfiguration` in the client library: + +| Setting | Default | Description | +|---------|---------|-------------| +| UseTls | false | Enable TLS on the client connection | +| ClientCertificatePath | null | Client certificate for mTLS | +| ClientKeyPath | null | Client private key for mTLS | +| ServerCaCertificatePath | null | Custom CA for server validation | +| ServerNameOverride | null | SNI/hostname override | +| ValidateServerCertificate | true | Validate the server certificate chain | +| AllowSelfSignedCertificates | false | Accept self-signed server certificates | +| IgnoreAllCertificateErrors | false | Skip all certificate validation (dangerous) | + +- SSL protocols: TLS 1.2 and TLS 1.3. +- Client certificates loaded from PEM files and converted to PKCS12. +- Custom CA trust store support via chain building. + +## Dependencies + +- **Configuration** — TLS settings and API key file path from `appsettings.json`. +- **System.IO.FileSystemWatcher** — API key file change detection. + +## Interactions + +- **GrpcServer** — the ApiKeyInterceptor runs before every RPC in ScadaGrpcService. +- **ServiceHost** — creates ApiKeyService and ApiKeyInterceptor at startup, configures gRPC server credentials. 
+- **Client** — GrpcChannelFactory creates TLS-configured gRPC channels in LmxProxyClient. diff --git a/lmxproxy/docs/requirements/Component-ServiceHost.md b/lmxproxy/docs/requirements/Component-ServiceHost.md new file mode 100644 index 0000000..634b12c --- /dev/null +++ b/lmxproxy/docs/requirements/Component-ServiceHost.md @@ -0,0 +1,108 @@ +# Component: ServiceHost + +## Purpose + +The entry point and lifecycle manager for the LmxProxy Windows service. Handles Topshelf service hosting, Serilog logging setup, component initialization/teardown ordering, and Windows SCM service recovery configuration. + +## Location + +- `src/ZB.MOM.WW.LmxProxy.Host/Program.cs` — entry point, Serilog setup, Topshelf configuration. +- `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — service lifecycle (Start, Stop, Pause, Continue, Shutdown). + +## Responsibilities + +- Configure and launch the Topshelf Windows service. +- Load and validate configuration from `appsettings.json`. +- Initialize Serilog logging. +- Orchestrate service startup: create all components in dependency order, connect to MxAccess, start servers. +- Orchestrate service shutdown: stop servers, dispose all components in reverse order. +- Configure Windows SCM service recovery policies. + +## 1. Entry Point (Program.cs) + +1. Builds configuration from `appsettings.json` + environment variables via `ConfigurationBuilder`. +2. Configures Serilog from the `Serilog` section of appsettings (console + file sinks). +3. Validates configuration using `ConfigurationValidator.ValidateAndLog()`. +4. Configures Topshelf `HostFactory`: + - Service name: `ZB.MOM.WW.LmxProxy.Host` + - Display name: `SCADA Bridge LMX Proxy` + - Start automatically on boot. + - Service recovery: first failure 1 min, second 5 min, subsequent 10 min, reset period 1 day. +5. Runs the Topshelf host (blocks until service stops). + +## 2. 
Service Lifecycle (LmxProxyService) + +### 2.1 Startup Sequence (Start) + +Components are created and started in dependency order: + +1. Validate configuration. +2. Check/generate TLS certificates (if TLS enabled). +3. Create `PerformanceMetrics`. +4. Create `ApiKeyService` — loads API keys from file. +5. Create `MxAccessClient` via factory. +6. Subscribe to connection state changes. +7. Connect to MxAccess synchronously — times out at `ConnectionTimeoutSeconds` (default 30s). +8. Start `MonitorConnectionAsync` (if `AutoReconnect` enabled). +9. Create `SubscriptionManager`. +10. Create `SessionManager`. +11. Create `HealthCheckService` + `DetailedHealthCheckService`. +12. Create `StatusReportService` + `StatusWebServer`. +13. Create `ScadaGrpcService`. +14. Create `ApiKeyInterceptor`. +15. Configure gRPC `Server` with TLS or insecure credentials. +16. Start gRPC server on `0.0.0.0:{GrpcPort}`. +17. Start `StatusWebServer`. + +### 2.2 Shutdown Sequence (Stop) + +Components are stopped and disposed in reverse order: + +1. Cancel reconnect monitor — wait **5 seconds** for exit. +2. Graceful gRPC server shutdown — **10-second** timeout, then kill. +3. Stop StatusWebServer — **5-second** wait. +4. Dispose all components in reverse creation order. +5. Disconnect from MxAccess — **10-second** timeout. + +### 2.3 Other Lifecycle Events + +- **Pause**: Supported by Topshelf but behavior is a no-op beyond logging. +- **Continue**: Resume from pause, no-op beyond logging. +- **Shutdown**: System shutdown signal, triggers the same shutdown sequence as Stop. + +## 3. Service Recovery (Windows SCM) + +Configured via Topshelf's `EnableServiceRecovery`: + +| Failure | Action | Delay | +|---------|--------|-------| +| First | Restart service | 1 minute | +| Second | Restart service | 5 minutes | +| Subsequent | Restart service | 10 minutes | +| Reset period | — | 1 day | + +All values are configurable via `ServiceRecoveryConfiguration`. + +## 4. 
Service Identity + +| Property | Value | +|----------|-------| +| Service name | `ZB.MOM.WW.LmxProxy.Host` | +| Display name | `SCADA Bridge LMX Proxy` | +| Start mode | Automatic | +| Platform | x86 (.NET Framework 4.8) | +| Framework | Topshelf | + +## Dependencies + +- **Topshelf** — Windows service framework. +- **Serilog** — structured logging (console + file sinks). +- **Microsoft.Extensions.Configuration** — configuration loading. +- **Configuration** — validated configuration objects. +- All other components are created and managed by LmxProxyService. + +## Interactions + +- **Configuration** is loaded and validated first; all other components receive their settings from it. +- **MxAccessClient** is connected synchronously during startup. If connection fails within the timeout, the service fails to start. +- **GrpcServer** and **StatusWebServer** are started last, after all dependencies are ready. diff --git a/lmxproxy/docs/requirements/Component-SessionManager.md b/lmxproxy/docs/requirements/Component-SessionManager.md new file mode 100644 index 0000000..c984926 --- /dev/null +++ b/lmxproxy/docs/requirements/Component-SessionManager.md @@ -0,0 +1,76 @@ +# Component: SessionManager + +## Purpose + +Tracks active client sessions, mapping session IDs to client metadata. Provides session creation, validation, and termination for the gRPC service layer. + +## Location + +`src/ZB.MOM.WW.LmxProxy.Host/Sessions/SessionManager.cs` + +## Responsibilities + +- Create new sessions with unique identifiers when clients connect. +- Validate session IDs on every data operation. +- Track session metadata (client ID, API key, connection time, last activity). +- Terminate sessions on client disconnect. +- Provide session listing for monitoring and status reporting. + +## 1. Session Storage + +- Sessions are stored in a `ConcurrentDictionary` (lock-free, thread-safe). +- Session state is in-memory only — all sessions are lost on service restart. 
+- `ActiveSessionCount` property returns the current count of tracked sessions. + +## 2. Session Lifecycle + +### 2.1 Creation + +`CreateSession(clientId, apiKey)`: +- Generates a unique session ID: `Guid.NewGuid().ToString("N")` (32-character lowercase hex string, no hyphens). +- Creates a `SessionInfo` record with `ConnectedAt` and `LastActivity` set to `DateTime.UtcNow`. +- Stores the session in the dictionary. +- Returns the session ID to the client. + +### 2.2 Validation + +`ValidateSession(sessionId)`: +- Looks up the session ID in the dictionary. +- If found, updates `LastActivity` to `DateTime.UtcNow` and returns `true`. +- If not found, returns `false`. + +### 2.3 Termination + +`TerminateSession(sessionId)`: +- Removes the session from the dictionary. +- Returns `true` if the session existed, `false` otherwise. + +### 2.4 Query + +- `GetSession(sessionId)` — Returns `SessionInfo` or `null` if not found. +- `GetAllSessions()` — Returns `IReadOnlyList` snapshot of all active sessions. + +## 3. SessionInfo + +| Field | Type | Description | +|-------|------|-------------| +| SessionId | string | 32-character hex GUID | +| ClientId | string | Client-provided identifier | +| ApiKey | string | API key used for authentication | +| ConnectedAt | DateTime | UTC time of session creation | +| LastActivity | DateTime | UTC time of last operation (updated on each validation) | +| ConnectedSinceUtcTicks | long | `ConnectedAt.Ticks` for gRPC response serialization | + +## 4. Disposal + +`Dispose()` clears all sessions from the dictionary. No notifications are sent to connected clients. + +## Dependencies + +None. SessionManager is a standalone in-memory store with no external dependencies. + +## Interactions + +- **GrpcServer** calls `CreateSession` on Connect, `ValidateSession` on every data operation, and `TerminateSession` on Disconnect. +- **HealthAndMetrics** reads `ActiveSessionCount` for health check data. 
+- **StatusReportService** reads session information for the status dashboard. diff --git a/lmxproxy/docs/requirements/Component-SubscriptionManager.md b/lmxproxy/docs/requirements/Component-SubscriptionManager.md new file mode 100644 index 0000000..e06fe28 --- /dev/null +++ b/lmxproxy/docs/requirements/Component-SubscriptionManager.md @@ -0,0 +1,116 @@ +# Component: SubscriptionManager + +## Purpose + +Manages the lifecycle of tag value subscriptions, multiplexing multiple client subscriptions onto shared MXAccess tag subscriptions and delivering updates via per-client bounded channels with configurable backpressure. + +## Location + +`src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs` + +## Responsibilities + +- Create per-client subscription channels with bounded capacity. +- Share underlying MXAccess tag subscriptions across multiple clients subscribing to the same tags. +- Deliver tag value updates from MXAccess callbacks to all subscribed clients. +- Handle backpressure when client channels are full (DropOldest, DropNewest, or Wait). +- Clean up subscriptions on client disconnect. +- Notify all subscribed clients with bad quality when MXAccess disconnects. + +## 1. Architecture + +### 1.1 Per-Client Channels + +Each subscribing client gets a bounded `System.Threading.Channel<(string address, Vtq vtq)>`: +- Capacity: configurable (default 1000 messages). +- Full mode: configurable (default `DropOldest`). +- `SingleReader = true`, `SingleWriter = false`. + +### 1.2 Shared Tag Subscriptions + +Tag subscriptions to MXAccess are shared across clients: +- When the first client subscribes to a tag, a new MXAccess subscription is created. +- When additional clients subscribe to the same tag, they are added to the existing tag subscription's client set. +- When the last client unsubscribes from a tag, the MXAccess subscription is disposed. + +### 1.3 Thread Safety + +- `ReaderWriterLockSlim` protects tag subscription updates. 
+- `ConcurrentDictionary` for client subscription tracking.
+
+## 2. Subscription Flow
+
+### 2.1 Subscribe
+
+`SubscribeAsync(clientId, addresses, ct)`:
+
+1. Creates a bounded channel with configured capacity and full mode.
+2. Creates a `ClientSubscription` record (clientId, channel, address set, CancellationTokenSource, counters).
+3. For each tag address:
+   - If the tag already has a subscription, adds the client to the existing `TagSubscription.clientIds` set.
+   - Otherwise, creates a new `TagSubscription` and calls `_scadaClient.SubscribeAsync()` to register with MXAccess (outside the lock to avoid blocking).
+4. Registers a cancellation token callback to automatically call `UnsubscribeClient` on disconnect.
+5. Returns the channel reader for the GrpcServer to stream from.
+
+### 2.2 Value Updates
+
+`OnTagValueChanged(address, Vtq)` — called from MxAccessClient's COM event handler:
+
+1. Looks up the tag subscription to find all subscribed clients.
+2. For each client, calls `channel.Writer.TryWrite((address, vtq))`.
+3. If the channel is full:
+   - **DropOldest**: Logs a warning and increments `DroppedMessageCount`; the oldest message is automatically discarded by the channel.
+   - **DropNewest**: Drops the incoming message.
+   - **Wait**: Blocks the writer until space is available (not recommended for gRPC streaming).
+4. On channel closed (client disconnected), schedules `UnsubscribeClient` cleanup.
+
+### 2.3 Unsubscribe
+
+`UnsubscribeClient(clientId)`:
+
+1. Removes the client from the client dictionary.
+2. For each tag the client was subscribed to, removes the client from the tag's subscriber set.
+3. If a tag has no remaining subscribers, disposes the MXAccess subscription handle.
+4. Completes the client's channel writer (signals end of stream).
+
+## 3. Backpressure
+
+| Mode | Behavior | Use Case |
+|------|----------|----------|
+| DropOldest | Discards the oldest message when the channel is full (a warning is logged) | Default. Fire-and-forget semantics.
No client blocking. | +| DropNewest | Drops the incoming message when channel is full | Preserves history, drops latest updates. | +| Wait | Blocks the writer until space is available | Not recommended for gRPC streaming (blocks callback thread). | + +Per-client statistics track `DeliveredMessageCount` and `DroppedMessageCount` for monitoring via the status dashboard. + +## 4. Disconnection Handling + +### 4.1 Client Disconnect + +When a client's gRPC stream ends (cancellation or error), the cancellation token callback triggers `UnsubscribeClient`, which cleans up all tag subscriptions for that client. + +### 4.2 MxAccess Disconnect + +`OnConnectionStateChanged` — when the MxAccess connection drops: +- Sends a bad-quality Vtq to all subscribed clients via their channels. +- Each client receives an async notification of the connection loss. +- Tag subscriptions are retained in memory for reconnection (via MxAccessClient's `_storedSubscriptions`). + +## 5. Statistics + +`GetSubscriptionStats()` returns: +- `TotalClients` — number of active client subscriptions. +- `TotalTags` — number of unique tags with active MXAccess subscriptions. +- `ActiveSubscriptions` — total client-tag subscription count. + +## Dependencies + +- **MxAccessClient** (IScadaClient) — creates and disposes MXAccess tag subscriptions. +- **Configuration** — `SubscriptionConfiguration` for channel capacity and full mode. + +## Interactions + +- **GrpcServer** calls `SubscribeAsync` on Subscribe RPC and reads from the returned channel. +- **MxAccessClient** delivers value updates via the `OnTagValueChanged` callback. +- **HealthAndMetrics** reads subscription statistics for health checks and status reports. +- **ServiceHost** disposes the SubscriptionManager at shutdown. 
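
The DropOldest/DropNewest semantics above can be modeled with a short sketch. This is illustrative Python only, not the Host implementation (the Host uses a bounded `System.Threading.Channel` in C#); the `BoundedChannel` class and its member names are hypothetical:

```python
from collections import deque

class BoundedChannel:
    """Toy model of the per-client bounded channel semantics (hypothetical
    class; the real Host uses System.Threading.Channels in C#)."""
    def __init__(self, capacity, full_mode="DropOldest"):
        self.capacity = capacity
        self.full_mode = full_mode
        self.items = deque()
        self.dropped = 0

    def try_write(self, item):
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.full_mode == "DropOldest":
            self.items.popleft()   # oldest update makes room for the newest
            self.items.append(item)
            self.dropped += 1
            return True
        if self.full_mode == "DropNewest":
            self.dropped += 1      # incoming update is discarded
            return False
        raise NotImplementedError("Wait blocks the writer; not modeled here")

ch = BoundedChannel(capacity=3)
for v in range(5):                 # five updates into a three-slot channel
    ch.try_write(("Tank1.Level", v))
print(list(ch.items))              # [('Tank1.Level', 2), ('Tank1.Level', 3), ('Tank1.Level', 4)]
print(ch.dropped)                  # 2
```

With DropOldest, writes always succeed from the producer's point of view: a slow consumer sees gaps in the update stream rather than stalling the MXAccess callback thread.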
diff --git a/lmxproxy/docs/requirements/HighLevelReqs.md b/lmxproxy/docs/requirements/HighLevelReqs.md
new file mode 100644
index 0000000..71d1cbd
--- /dev/null
+++ b/lmxproxy/docs/requirements/HighLevelReqs.md
@@ -0,0 +1,274 @@
+# LmxProxy - High Level Requirements
+
+## 1. System Purpose
+
+LmxProxy is a gRPC proxy service that bridges SCADA clients to AVEVA System Platform (Wonderware) via the ArchestrA MXAccess COM API. It exists because MXAccess is a 32-bit COM component that requires co-location with System Platform on a Windows machine running .NET Framework 4.8. LmxProxy isolates this constraint behind a gRPC interface, allowing modern .NET clients to access System Platform data remotely over HTTP/2.
+
+## 2. Architecture
+
+### 2.1 Two-Project Structure
+
+- **ZB.MOM.WW.LmxProxy.Host** — .NET Framework 4.8, x86-only Windows service. Hosts a gRPC server (Grpc.Core) fronting the MXAccess COM API. Runs on the same machine as AVEVA System Platform.
+- **ZB.MOM.WW.LmxProxy.Client** — .NET 10, AnyCPU class library. Code-first gRPC client (protobuf-net.Grpc) consumed by ScadaLink's Data Connection Layer. Packaged as a NuGet library.
+
+### 2.2 Dual gRPC Stacks
+
+The two projects use different gRPC implementations that are wire-compatible:
+
+- **Host**: Proto-file-generated code via `Grpc.Core` + `Grpc.Tools`. Uses the deprecated C-core gRPC library because .NET Framework 4.8 does not support `Grpc.Net.Server`.
+- **Client**: Code-first contracts via `protobuf-net.Grpc` with `[DataContract]`/`[ServiceContract]` attributes over `Grpc.Net.Client`.
+
+Both target the same `scada.ScadaService` gRPC service definition.
+
+### 2.3 Deployment Model
+
+- The Host service runs on the AVEVA System Platform machine (or any machine with MXAccess access).
+- Clients connect remotely over gRPC (HTTP/2) on a configurable port (default 50051).
+- The Host runs as a Windows service managed by Topshelf.
+
+## 3.
Communication Protocol + +### 3.1 Transport + +- gRPC over HTTP/2. +- Default server port: 50051. +- Optional TLS with mutual TLS (mTLS) support. + +### 3.2 RPCs + +The service exposes 10 RPCs: + +| RPC | Type | Description | +|-----|------|-------------| +| Connect | Unary | Establish session, returns session ID | +| Disconnect | Unary | Terminate session | +| GetConnectionState | Unary | Query MxAccess connection status | +| Read | Unary | Read single tag value | +| ReadBatch | Unary | Read multiple tag values | +| Write | Unary | Write single tag value | +| WriteBatch | Unary | Write multiple tag values | +| WriteBatchAndWait | Unary | Write values, poll flag tag until match or timeout | +| Subscribe | Server streaming | Stream tag value updates to client | +| CheckApiKey | Unary | Validate API key and return role | + +### 3.3 Data Model (VTQ) + +All tag values are represented as VTQ (Value, Timestamp, Quality) tuples: + +- **Value**: `TypedValue` — a protobuf `oneof` carrying the value in its native type (bool, int32, int64, float, double, string, bytes, datetime, typed arrays). An unset `oneof` represents null. +- **Timestamp**: UTC `DateTime.Ticks` as `int64` (100-nanosecond intervals since 0001-01-01 00:00:00 UTC). +- **Quality**: `QualityCode` — a structured message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. Category derived from high bits: `0x00xxxxxx` = Good, `0x40xxxxxx` = Uncertain, `0x80xxxxxx` = Bad. + +## 4. Session Lifecycle + +- Clients call `Connect` with a client ID and optional API key to establish a session. +- The server returns a 32-character hex GUID as the session ID. +- All subsequent operations require the session ID for validation. +- Sessions persist until explicit `Disconnect` or server restart. There is no idle timeout. +- Session state is tracked in memory (not persisted). All sessions are lost on service restart. + +## 5. 
Authentication & Authorization + +### 5.1 API Key Authentication + +- API keys are validated via the `x-api-key` gRPC metadata header. +- Keys are stored in a JSON file (`apikeys.json` by default) with hot-reload via FileSystemWatcher (1-second debounce). +- If no API key file exists, the service auto-generates a default file with two random keys (one ReadOnly, one ReadWrite). +- Authentication is enforced at the gRPC interceptor level before any service method executes. + +### 5.2 Role-Based Authorization + +Two roles with hierarchical permissions: + +| Role | Read | Subscribe | Write | +|------|------|-----------|-------| +| ReadOnly | Yes | Yes | No | +| ReadWrite | Yes | Yes | Yes | + +Write-protected methods: `Write`, `WriteBatch`, `WriteBatchAndWait`. A ReadOnly key attempting a write receives `StatusCode.PermissionDenied`. + +### 5.3 TLS/Security + +- TLS is optional (disabled by default in configuration, though `Tls.Enabled` defaults to `true` in the config class). +- Supports server TLS and mutual TLS (client certificate validation). +- Client CA certificate path configurable for mTLS. +- Certificate revocation checking is optional. +- Client library supports TLS 1.2 and TLS 1.3, custom CA trust stores, self-signed certificate allowance, and server name override. + +## 6. Operations + +### 6.1 Read + +- Single tag read with configurable retry policy. +- Batch read with semaphore-controlled concurrency (default max 10 concurrent operations). +- Read timeout: 5 seconds (configurable). + +### 6.2 Write + +- Single tag write with retry policy. Values are sent as `TypedValue` (native protobuf types). Type mismatches between the value and the tag's expected type return a write failure. +- Batch write with semaphore-controlled concurrency. +- Write timeout: 5 seconds (configurable). 
+- WriteBatchAndWait: writes a batch, then polls the flag tag at a configurable interval until its value matches the expected flag value (type-aware comparison via `TypedValueEquals`) or a timeout expires. Default timeout: 5000ms, default poll interval: 100ms. Timeout is not an error — returns `flag_reached=false`. + +### 6.3 Subscribe + +- Server-streaming RPC. Client sends a list of tags and a sampling interval (in milliseconds). +- Server maintains a per-client bounded channel (default capacity 1000 messages). +- Updates are pushed as `VtqMessage` items on the stream. +- When the MxAccess connection drops, all subscribed clients receive a bad-quality notification. +- Subscriptions are cleaned up on client disconnect. When the last client unsubscribes from a tag, the underlying MxAccess subscription is disposed. + +## 7. Connection Resilience + +### 7.1 Host Auto-Reconnect + +- If the MxAccess connection is lost, the Host automatically attempts reconnection at a fixed interval (default 5 seconds). +- Stored subscriptions are recreated after a successful reconnect. +- Auto-reconnect is configurable (`Connection.AutoReconnect`, default true). + +### 7.2 Client Keep-Alive + +- The client sends a lightweight `GetConnectionState` ping every 30 seconds. +- On keep-alive failure, the client marks the connection as disconnected and cleans up subscriptions. + +### 7.3 Client Retry Policy + +- Polly-based exponential backoff retry. +- Default: 3 attempts with 1-second initial delay (1s → 2s → 4s). +- Transient errors retried: Unavailable, DeadlineExceeded, ResourceExhausted, Aborted. + +## 8. Health Monitoring & Metrics + +### 8.1 Health Checks + +Two health check implementations: + +- **Basic** (`HealthCheckService`): Checks MxAccess connection state, subscription stats, and operation success rate. Returns Degraded if success rate < 50% (with > 100 operations) or client count > 100. +- **Detailed** (`DetailedHealthCheckService`): Reads a test tag (`System.Heartbeat`). 
Returns Unhealthy if not connected, Degraded if test tag quality is not Good or timestamp is older than 5 minutes. + +### 8.2 Performance Metrics + +- Per-operation tracking: Read, ReadBatch, Write, WriteBatch. +- Metrics: total count, success count, success rate, average/min/max latency, 95th percentile latency. +- Rolling buffer of 1000 latency samples per operation for percentile calculation. +- Metrics reported to logs every 60 seconds. + +### 8.3 Status Web Server + +- HTTP status server on port 8080 (configurable). +- Endpoints: + - `GET /` — HTML dashboard with auto-refresh (30 seconds), color-coded status cards, operations table. + - `GET /api/status` — JSON status report. + - `GET /api/health` — Plain text `OK` (200) or `UNHEALTHY` (503). + +### 8.4 Client Metrics + +- Per-operation counts, error counts, and latency tracking (average, p95, p99). +- Rolling buffer of 1000 latency samples. +- Exposed via `ILmxProxyClient.GetMetrics()`. + +## 9. Service Hosting + +### 9.1 Topshelf Windows Service + +- Service name: `ZB.MOM.WW.LmxProxy.Host` +- Display name: `SCADA Bridge LMX Proxy` +- Starts automatically on boot. + +### 9.2 Service Recovery (Windows SCM) + +| Failure | Restart Delay | +|---------|--------------| +| First | 1 minute | +| Second | 5 minutes | +| Subsequent | 10 minutes | +| Reset period | 1 day | + +### 9.3 Startup Sequence + +1. Load configuration from `appsettings.json` + environment variables. +2. Configure Serilog (console + file sinks). +3. Validate configuration. +4. Check/generate TLS certificates (if TLS enabled). +5. Initialize services: PerformanceMetrics, ApiKeyService, MxAccessClient, SubscriptionManager, SessionManager, HealthCheckService, StatusReportService. +6. Connect to MxAccess synchronously (timeout: 30 seconds). +7. Start auto-reconnect monitor loop (if enabled). +8. Start gRPC server on configured port. +9. Start HTTP status web server. + +### 9.4 Shutdown Sequence + +1. Cancel reconnect monitor (5-second wait). +2. 
Graceful gRPC server shutdown (10-second timeout, then kill). +3. Stop status web server (5-second wait). +4. Dispose all components in reverse order. +5. Disconnect from MxAccess (10-second timeout). + +## 10. Configuration + +All configuration is via `appsettings.json` bound to `LmxProxyConfiguration`. Key settings: + +| Section | Setting | Default | +|---------|---------|---------| +| Root | GrpcPort | 50051 | +| Root | ApiKeyConfigFile | `apikeys.json` | +| Connection | MonitorIntervalSeconds | 5 | +| Connection | ConnectionTimeoutSeconds | 30 | +| Connection | ReadTimeoutSeconds | 5 | +| Connection | WriteTimeoutSeconds | 5 | +| Connection | MaxConcurrentOperations | 10 | +| Connection | AutoReconnect | true | +| Subscription | ChannelCapacity | 1000 | +| Subscription | ChannelFullMode | DropOldest | +| Tls | Enabled | false | +| Tls | RequireClientCertificate | false | +| WebServer | Enabled | true | +| WebServer | Port | 8080 | + +Configuration is validated at startup. Invalid values cause the service to fail to start. + +## 11. Logging + +- Serilog with console and file sinks. +- File sink: `logs/lmxproxy-.txt`, daily rolling, 30 files retained. +- Default level: Information. Overrides: Microsoft=Warning, System=Warning, Grpc=Information. +- Enrichment: FromLogContext, WithMachineName, WithThreadId. + +## 12. Constraints + +- Host **must** target x86 and .NET Framework 4.8 (MXAccess is 32-bit COM). +- Host uses `Grpc.Core` (deprecated C-core library), required because .NET 4.8 does not support `Grpc.Net.Server`. +- Client targets .NET 10 and runs in ScadaLink central/site clusters. +- MxAccess COM operations require STA thread context (wrapped in `Task.Run`). +- The solution file uses `.slnx` format. + +## 13. Protocol + +The protocol specification is defined in `lmxproxy_updates.md`, which is the authoritative source of truth. All RPC signatures, message schemas, and behavioral specifications are per that document. 
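
The bit-level category rule from §3.3 (`0x00xxxxxx` = Good, `0x40xxxxxx` = Uncertain, `0x80xxxxxx` = Bad) comes down to the top two bits of the status code. A minimal sketch, in Python for illustration (the helper name is hypothetical; mapping the reserved severity `11` to Bad is an assumption, not part of the spec):

```python
def quality_category(status_code: int) -> str:
    """Category from the top two bits of an OPC UA-style status code:
    00 = Good, 01 = Uncertain, 10 = Bad (11 is reserved; treated as Bad here)."""
    return {0: "Good", 1: "Uncertain", 2: "Bad"}.get((status_code >> 30) & 0x3, "Bad")

# Sub-codes quoted elsewhere in this document:
print(quality_category(0x00D80000))  # GoodLocalOverride        -> Good
print(quality_category(0x806D0000))  # BadSensorFailure         -> Bad
print(quality_category(0x80320000))  # BadWaitingForInitialData -> Bad
print(quality_category(0x40000000))  # Uncertain
```

Only `status_code` is needed to derive the category; the `symbolic_name` field carries the human-readable sub-code alongside it.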
+ +### 13.1 Value System (TypedValue) + +Values are transmitted in their native protobuf types via a `TypedValue` oneof: bool, int32, int64, float, double, string, bytes, datetime (int64 UTC Ticks), and typed arrays. An unset oneof represents null. No string serialization or parsing heuristics are used. + +### 13.2 Quality System (QualityCode) + +Quality is a structured `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. Supports AVEVA-aligned quality sub-codes (e.g., `BadSensorFailure` = `0x806D0000`, `GoodLocalOverride` = `0x00D80000`, `BadWaitingForInitialData` = `0x80320000`). See Component-Protocol for the full quality code table. + +### 13.3 Migration from V1 + +The current codebase implements the v1 protocol (string-encoded values, three-state string quality). The v2 protocol is a clean break — all clients and servers will be updated simultaneously. No backward compatibility layer. This is appropriate because LmxProxy is an internal protocol with a small, controlled client count. + +## 14. 
Component List (10 Components) + +| # | Component | Description | +|---|-----------|-------------| +| 1 | GrpcServer | gRPC service implementation, session validation, request routing | +| 2 | MxAccessClient | MXAccess COM interop wrapper, connection lifecycle, read/write/subscribe | +| 3 | SessionManager | Client session tracking and lifecycle | +| 4 | Security | API key authentication, role-based authorization, TLS management | +| 5 | SubscriptionManager | Tag subscription lifecycle, channel-based update delivery, backpressure | +| 6 | Configuration | appsettings.json structure, validation, options binding | +| 7 | HealthAndMetrics | Health checks, performance metrics, status web server | +| 8 | ServiceHost | Topshelf hosting, startup/shutdown, logging setup, service recovery | +| 9 | Client | LmxProxyClient library, builder, retry, streaming, DI integration | +| 10 | Protocol | gRPC protocol specification, proto definition, code-first contracts |
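
The WriteBatchAndWait flow from §6.2 (write a batch, poll a flag tag until it matches, treat timeout as `flag_reached=false`) can be sketched end to end. This is illustrative Python only: `FakeClient`, `write_batch`, and `read` are hypothetical stand-ins for the real C# client, plain `==` stands in for the protocol's `TypedValueEquals`, and the demo uses a 10 ms poll interval rather than the 100 ms default to keep it fast:

```python
import time
from dataclasses import dataclass

@dataclass
class Vtq:
    value: object  # a TypedValue in the real protocol

class FakeClient:
    """Hypothetical stand-in for the real gRPC client."""
    def __init__(self):
        self._reads = 0

    def write_batch(self, writes):
        pass  # pretend the batch write was accepted

    def read(self, tag):
        self._reads += 1
        # The flag tag flips to 1 on the third poll.
        return Vtq(value=1 if self._reads >= 3 else 0)

def write_batch_and_wait(client, writes, flag_tag, expected,
                         timeout_ms=5000, poll_ms=10):
    """Write a batch, then poll flag_tag until its value matches `expected`
    or the timeout expires. Timeout is not an error: it returns False."""
    client.write_batch(writes)
    deadline = time.monotonic() + timeout_ms / 1000.0
    while time.monotonic() < deadline:
        if client.read(flag_tag).value == expected:  # TypedValueEquals in the real protocol
            return True   # flag_reached = true
        time.sleep(poll_ms / 1000.0)
    return False          # flag_reached = false

print(write_batch_and_wait(FakeClient(), [("Recipe.Setpoint", 42)], "Recipe.Ack", 1))  # True
```

The same loop returns `False` when the flag never reaches the expected value within the timeout, which callers must treat as "not confirmed" rather than as a failed RPC.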