docs: add LmxProxy requirements documentation with v2 protocol as authoritative design

Generate high-level requirements and 10 component documents derived from source code
and protocol specs. Uses lmxproxy_updates.md (v2 TypedValue/QualityCode) as the source
of truth, with v1 string-based encoding documented as legacy context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-03-21 22:38:11 -04:00
parent 970d0a5cb3
commit 683aea0fbe
12 changed files with 1702 additions and 0 deletions

lmxproxy/CLAUDE.md Normal file

@@ -0,0 +1,71 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What This Is
LmxProxy is a gRPC proxy that bridges ScadaLink's Data Connection Layer to AVEVA System Platform via the ArchestrA MXAccess COM API. It has two projects:
- **Host** (`ZB.MOM.WW.LmxProxy.Host`) — .NET Framework 4.8, x86-only Windows service. Hosts a gRPC server (Grpc.Core) that fronts an MxAccessClient talking to ArchestrA MXAccess. Runs as a Windows service via Topshelf.
- **Client** (`ZB.MOM.WW.LmxProxy.Client`) — .NET 10, AnyCPU library. Code-first gRPC client (protobuf-net.Grpc) consumed by ScadaLink's DCL. This is a NuGet-packable library.
The two projects use **different gRPC stacks**: Host uses proto-file-generated code (`Grpc.Core` + `Grpc.Tools`), Client uses code-first contracts (`protobuf-net.Grpc` with `[DataContract]`/`[ServiceContract]` attributes). They are wire-compatible because both target the same `scada.ScadaService` gRPC service.
## Build Commands
```bash
dotnet build ZB.MOM.WW.LmxProxy.slnx # Build entire solution
dotnet build src/ZB.MOM.WW.LmxProxy.Host # Host only (requires x86 platform)
dotnet build src/ZB.MOM.WW.LmxProxy.Client # Client only
```
The Host project requires the `ArchestrA.MXAccess.dll` COM interop assembly in `lib/`. It targets x86 exclusively (MXAccess is 32-bit COM).
## Architecture
### Host Service Startup Chain
`Program.Main` → Topshelf `HostFactory` → `LmxProxyService.Start()`, which:
1. Validates configuration (`appsettings.json` bound to `LmxProxyConfiguration`)
2. Creates `MxAccessClient` (the `IScadaClient` impl that wraps ArchestrA.MXAccess COM)
3. Connects to MxAccess synchronously at startup
4. Starts connection monitor loop (auto-reconnect)
5. Creates `SubscriptionManager`, `SessionManager`, `PerformanceMetrics`, `ApiKeyService`
6. Creates `ScadaGrpcService` (the proto-generated service impl) with all dependencies
7. Starts Grpc.Core `Server` on configured port (default 50051)
8. Starts HTTP status web server (default port 8080)
### Key Host Components
- `MxAccessClient` — Partial class split across 6 files (Connection, ReadWrite, Subscription, EventHandlers, NestedTypes, main). Wraps `LMXProxyServer` COM object. Uses semaphores for concurrency control.
- `ScadaGrpcService` — Inherits proto-generated `ScadaService.ScadaServiceBase`. All RPCs validate session first, then delegate to `IScadaClient`. Values are string-serialized on the wire (v1 protocol).
- `SessionManager` — Tracks client sessions by GUID.
- `SubscriptionManager` — Manages MxAccess subscriptions, fans out updates via `System.Threading.Channels`.
- `ApiKeyInterceptor` — gRPC server interceptor for API key validation.
### Client Architecture
- `ILmxProxyClient` — Public interface for consumers. Connect/Read/Write/Subscribe/Dispose.
- `LmxProxyClient` — Partial class split across multiple files (Connection, Subscription, Metrics, etc.). Uses `protobuf-net.Grpc` code-first contracts (`IScadaService` in `Domain/ScadaContracts.cs`).
- `LmxProxyClientBuilder` — Fluent builder for configuring client instances.
- `Domain/ScadaContracts.cs` — All gRPC message types as `[DataContract]` POCOs and the `IScadaService` interface with `[ServiceContract]`.
- Value conversion: Client parses string values from wire using double → bool → string heuristic in `ConvertToVtq()`. Writes use `.ToString()` via `ConvertToString()`.
### Protocol
Proto definition: `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Protos/scada.proto`
Currently v1 protocol (string-encoded values, string quality). A v2 protocol spec exists in `docs/lmxproxy_updates.md` that introduces `TypedValue` (protobuf oneof) and `QualityCode` (OPC UA status codes) — not yet implemented.
RPCs: Connect, Disconnect, GetConnectionState, Read, ReadBatch, Write, WriteBatch, WriteBatchAndWait, Subscribe (server streaming), CheckApiKey.
### Configuration
Host configured via `appsettings.json` bound to `LmxProxyConfiguration`. Key sections: GrpcPort, Connection (timeouts, auto-reconnect), Subscription (channel capacity), Tls, WebServer, Serilog, RetryPolicies, HealthCheck.
## Important Constraints
- Host **must** target x86 and .NET Framework 4.8 (ArchestrA.MXAccess is 32-bit COM interop).
- Host uses `Grpc.Core` (the deprecated C-core gRPC library), not `Grpc.Net`. This is required because the ASP.NET Core gRPC server (`Grpc.AspNetCore.Server`) does not run on .NET Framework 4.8.
- Client uses `Grpc.Net.Client` and targets .NET 10 — it runs in the ScadaLink central/site clusters.
- The solution file is `.slnx` format (XML-based, not the older text format).


@@ -0,0 +1,200 @@
# Component: Client
## Purpose
A .NET 10 class library providing a typed gRPC client for consuming the LmxProxy service. Used by ScadaLink's Data Connection Layer to connect to AVEVA System Platform via the LmxProxy Host.
## Location
`src/ZB.MOM.WW.LmxProxy.Client/` — all files in this project.
Key files:
- `ILmxProxyClient.cs` — public interface.
- `LmxProxyClient.cs` — main implementation (partial class across multiple files).
- `LmxProxyClientBuilder.cs` — fluent builder for client construction.
- `ServiceCollectionExtensions.cs` — DI integration and options classes.
- `ILmxProxyClientFactory.cs` — factory interface and implementation.
- `StreamingExtensions.cs` — batch and parallel streaming helpers.
- `Domain/ScadaContracts.cs` — code-first gRPC contracts.
- `Security/GrpcChannelFactory.cs` — TLS channel creation.
## Responsibilities
- Connect to and communicate with the LmxProxy Host gRPC service.
- Manage session lifecycle (connect, keep-alive, disconnect).
- Execute read, write, and subscribe operations with retry and concurrency control.
- Provide a fluent builder and DI integration for configuration.
- Track client-side performance metrics.
- Support TLS and mutual TLS connections.
## 1. Public Interface (ILmxProxyClient)
| Method | Description |
|--------|-------------|
| `ConnectAsync(ct)` | Establish gRPC channel and session |
| `DisconnectAsync()` | Graceful disconnect |
| `IsConnectedAsync()` | Thread-safe connection state check |
| `ReadAsync(address, ct)` | Read single tag, returns Vtq |
| `ReadBatchAsync(addresses, ct)` | Read multiple tags, returns dictionary |
| `WriteAsync(address, value, ct)` | Write single tag value |
| `WriteBatchAsync(values, ct)` | Write multiple tag values |
| `SubscribeAsync(addresses, onUpdate, onStreamError, ct)` | Subscribe to tag updates with value and error callbacks |
| `GetMetrics()` | Return operation counts, errors, latency stats |
| `DefaultTimeout` | Configurable timeout (default 30 s, range 1 s to 10 min) |
Implements `IDisposable` and `IAsyncDisposable`.
## 2. Connection Management
### 2.1 Connect
`ConnectAsync()`:
1. Creates a gRPC channel via `GrpcChannelFactory` (HTTP or HTTPS based on TLS config).
2. Creates a `protobuf-net.Grpc` client for `IScadaService`.
3. Calls the `Connect` RPC with a client ID (format: `ScadaBridge-{guid}`) and optional API key.
4. Stores the returned session ID.
5. Starts the keep-alive timer.
### 2.2 Keep-Alive
- Timer-based ping every **30 seconds** (hardcoded).
- Sends a lightweight `GetConnectionState` RPC.
- On failure: stops the timer, marks disconnected, triggers subscription cleanup.
### 2.3 Disconnect
`DisconnectAsync()`:
1. Stops keep-alive timer.
2. Calls `Disconnect` RPC.
3. Clears session ID.
4. Disposes gRPC channel.
### 2.4 Connection State
`IsConnected` property: `!_disposed && _isConnected && !string.IsNullOrEmpty(_sessionId)`.
## 3. Builder Pattern (LmxProxyClientBuilder)
| Method | Default | Constraint |
|--------|---------|-----------|
| `WithHost(string)` | Required | Non-null/non-empty |
| `WithPort(int)` | 5050 | 1 to 65535 |
| `WithApiKey(string?)` | null | Optional |
| `WithTimeout(TimeSpan)` | 30 seconds | > 0 and ≤ 10 minutes |
| `WithLogger(ILogger)` | NullLogger | Optional |
| `WithSslCredentials(string?)` | Disabled | Optional cert path |
| `WithTlsConfiguration(ClientTlsConfiguration)` | null | Full TLS config |
| `WithRetryPolicy(int, TimeSpan)` | 3 attempts, 1s delay | maxAttempts > 0, delay > 0 |
| `WithMetrics()` | Disabled | Enables metric collection |
| `WithCorrelationIdHeader(string)` | null | Custom header name |
## 4. Retry Policy
Polly-based exponential backoff:
- Default: **3 attempts** with **1-second** initial delay.
- Backoff sequence: `delay * 2^(retryAttempt - 1)` → 1s, 2s, 4s.
- Transient errors retried: `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, `Aborted`.
- Each retry is logged with correlation ID at Warning level.
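A minimal Python sketch of the backoff arithmetic and transient-error check described above (Python is used only for illustration; the client itself uses Polly). It assumes the configured 3 attempts are 3 retries, which is what the documented 1s, 2s, 4s sequence implies. `RpcFailure`, `call_with_retry`, and the injectable `sleep` are hypothetical names, not part of the client API.

```python
import time

# Status codes the client treats as transient (names as listed in this document).
TRANSIENT_CODES = frozenset({"Unavailable", "DeadlineExceeded", "ResourceExhausted", "Aborted"})


class RpcFailure(Exception):
    """Hypothetical stand-in for a failed gRPC call carrying its status code name."""

    def __init__(self, status: str):
        super().__init__(status)
        self.status = status


def backoff_delay(base_delay_s: float, retry_attempt: int) -> float:
    """Delay before the given 1-based retry: delay * 2^(retryAttempt - 1)."""
    if retry_attempt < 1:
        raise ValueError("retry_attempt is 1-based")
    return base_delay_s * 2 ** (retry_attempt - 1)


def is_transient(status: str) -> bool:
    return status in TRANSIENT_CODES


def call_with_retry(call, base_delay_s: float = 1.0, max_retries: int = 3, sleep=time.sleep):
    """Run `call`, retrying transient failures with exponential backoff.

    Non-transient failures, and the failure after the last retry, are re-raised.
    """
    retry_attempt = 0
    while True:
        try:
            return call()
        except RpcFailure as err:
            retry_attempt += 1
            if not is_transient(err.status) or retry_attempt > max_retries:
                raise
            sleep(backoff_delay(base_delay_s, retry_attempt))


print([backoff_delay(1.0, n) for n in (1, 2, 3)])  # [1.0, 2.0, 4.0]
```

Injecting `sleep` keeps the schedule testable without real waiting; a non-transient status such as `PermissionDenied` is re-raised on the first failure.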
## 5. Subscription
### 5.1 Subscribe API
`SubscribeAsync(addresses, onUpdate, onStreamError, ct)` returns an `ISubscription`:
- Calls the `Subscribe` RPC (server streaming) with the tag list and default sampling interval (**1000ms**).
- Processes streamed `VtqMessage` items asynchronously, invoking the `onUpdate(tag, vtq)` callback for each.
- On stream termination (server disconnect, gRPC error, or connection drop), invokes the `onStreamError` callback exactly once.
- On stream error, the client immediately nullifies its session ID, causing `IsConnected` to return `false`. This triggers the DCL adapter's `Disconnected` event and reconnection cycle.
- Errors are logged per-subscription.
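The stream-termination behaviour can be sketched in Python as a simplified synchronous model, assuming a stream is any iterable of `(tag, vtq)` pairs; `ClientState` and `pump_subscription` are hypothetical names. Two details are assumptions the text does not pin down: a stream that ends without an exception is also reported (with `None` as the error), and the session ID is cleared before `on_stream_error` runs, so the callback already observes `is_connected == False`.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


class ClientState:
    """Hypothetical slice of client state that backs the IsConnected check."""

    def __init__(self, session_id: Optional[str]):
        self.session_id = session_id
        self.connected_flag = True
        self.disposed = False

    @property
    def is_connected(self) -> bool:
        # Mirrors: !_disposed && _isConnected && !string.IsNullOrEmpty(_sessionId)
        return not self.disposed and self.connected_flag and bool(self.session_id)


def pump_subscription(
    state: ClientState,
    stream: Iterable[Tuple[str, Any]],
    on_update: Callable[[str, Any], None],
    on_stream_error: Callable[[Optional[BaseException]], None],
) -> None:
    """Forward every streamed update, then report termination exactly once."""
    error: Optional[BaseException] = None
    try:
        for tag, vtq in stream:
            on_update(tag, vtq)
    except Exception as exc:
        # gRPC error, server disconnect or dropped connection; in this sketch an
        # exception raised by on_update is treated the same way.
        error = exc
    # Clearing the session makes is_connected False, which is what drives the
    # consumer's Disconnected event and reconnection cycle.
    state.session_id = None
    on_stream_error(error)
```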
### 5.2 ISubscription
- `Dispose()` — synchronous disposal with **5-second** timeout.
- Automatic callback on disposal for cleanup.
## 6. DI Integration
### 6.1 Service Collection Extensions
| Method | Lifetime | Description |
|--------|----------|-------------|
| `AddLmxProxyClient(IConfiguration)` | Singleton | Bind `LmxProxy` config section |
| `AddLmxProxyClient(IConfiguration, string)` | Singleton | Bind named config section |
| `AddLmxProxyClient(Action<Builder>)` | Singleton | Builder action |
| `AddScopedLmxProxyClient(IConfiguration)` | Scoped | Per-scope lifetime |
| `AddNamedLmxProxyClient(string, Action<Builder>)` | Keyed singleton | Named/keyed registration |
### 6.2 Configuration Options (LmxProxyClientOptions)
Bound from `appsettings.json`:
| Setting | Default | Description |
|---------|---------|-------------|
| Host | `localhost` | Server hostname |
| Port | 5050 | Server port |
| ApiKey | null | API key |
| Timeout | 30 seconds | Operation timeout |
| UseSsl | false | Enable TLS |
| CertificatePath | null | SSL certificate path |
| EnableMetrics | false | Enable client metrics |
| CorrelationIdHeader | null | Custom correlation header |
| Retry:MaxAttempts | 3 | Retry attempts |
| Retry:Delay | 1 second | Initial retry delay |
### 6.3 Factory Pattern
`ILmxProxyClientFactory` creates configured clients:
- `CreateClient()` — uses default `LmxProxy` config section.
- `CreateClient(string)` — uses named config section.
- `CreateClient(Action<Builder>)` — uses builder action.
Registered as singleton in DI.
## 7. Streaming Extensions
Helper methods for large-scale batch operations:
| Method | Default Batch Size | Description |
|--------|--------------------|-------------|
| `ReadStreamAsync` | 100 | Batched reads, 2 retries per batch, stops after 3 consecutive errors. Returns `IAsyncEnumerable<KeyValuePair<string, Vtq>>`. |
| `WriteStreamAsync` | 100 | Batched writes from async enumerable input. Returns total count written. |
| `ProcessInParallelAsync` | — | Parallel processing with a default maximum concurrency of **4** (configurable), enforced by a semaphore. |
| `SubscribeStreamAsync` | — | Wraps callback-based subscription into `IAsyncEnumerable<Vtq>` via `System.Threading.Channels`. |
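A synchronous Python approximation of `ReadStreamAsync`'s batching loop follows (the real method returns an `IAsyncEnumerable`). It assumes that "2 retries" means up to three attempts per batch, that "consecutive errors" counts batches that failed every attempt, that a failed batch is skipped until the limit is reached, and that the stream then simply ends rather than raising. `read_stream` and the injected `read_batch` callable are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple


def read_stream(
    addresses: Sequence[str],
    read_batch: Callable[[List[str]], Dict[str, Any]],
    batch_size: int = 100,
    retries_per_batch: int = 2,
    max_consecutive_errors: int = 3,
) -> Iterator[Tuple[str, Any]]:
    """Yield (address, vtq) pairs batch by batch, giving up after repeated failures."""
    consecutive_errors = 0
    for start in range(0, len(addresses), batch_size):
        batch = list(addresses[start:start + batch_size])
        results = None
        for _ in range(1 + retries_per_batch):  # first try plus the retries
            try:
                results = read_batch(batch)
                break
            except Exception:
                pass  # retry the same batch
        if results is None:
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                return  # stop streaming after too many failed batches in a row
            continue  # skip this batch and move on
        consecutive_errors = 0
        yield from results.items()
```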
## 8. Client Metrics
When metrics are enabled (`WithMetrics()`):
- Per-operation tracking: counts, error counts, latency.
- Rolling buffer of **1000** latency samples per operation (prevents memory growth).
- Snapshot via `GetMetrics()` returns: `{op}_count`, `{op}_errors`, `{op}_avg_latency_ms`, `{op}_p95_latency_ms`, `{op}_p99_latency_ms`.
## 9. Value and Quality Handling
### 9.1 Values (TypedValue)
Read responses and subscription updates return values as `TypedValue` (protobuf oneof). The client extracts the value directly from the appropriate oneof field (e.g., `vtq.Value.DoubleValue`, `vtq.Value.BoolValue`). Write operations construct `TypedValue` with the correct oneof case for the value's native type. No string serialization or parsing is needed.
### 9.2 Quality (QualityCode)
Quality is received as a `QualityCode` message. Category checks use bitmask: `IsGood = (statusCode & 0xC0000000) == 0x00000000`, `IsBad = (statusCode & 0xC0000000) == 0x80000000`. The `symbolic_name` field provides human-readable quality for logging and display.
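The bitmask checks can be sketched in Python as below. The mask and the Good/Bad patterns come from the text above; the `Uncertain` pattern (`0x40000000`) and the reserved `11` pattern come from the OPC UA StatusCode layout, where the top two bits encode severity. The function names are illustrative only.

```python
SEVERITY_MASK = 0xC0000000

# Top two bits of an OPC UA StatusCode: 00 = Good, 01 = Uncertain, 10 = Bad.
_SEVERITY_NAMES = {0x00000000: "Good", 0x40000000: "Uncertain", 0x80000000: "Bad"}


def is_good(status_code: int) -> bool:
    return (status_code & SEVERITY_MASK) == 0x00000000


def is_bad(status_code: int) -> bool:
    return (status_code & SEVERITY_MASK) == 0x80000000


def severity(status_code: int) -> str:
    """Coarse category; the remaining bit pattern (11) is reserved in OPC UA."""
    return _SEVERITY_NAMES.get(status_code & SEVERITY_MASK, "Reserved")


print(severity(0x80050000))  # Bad
```

Because only the top two bits are inspected, any specific Bad or Uncertain sub-code is classified correctly without a lookup table of every status code.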
### 9.3 Current Implementation (V1 Legacy)
The current codebase still uses v1 string-based encoding. During v2 migration, the following will be removed:
- `ConvertToVtq()` — parses string values via heuristic (double → bool → null → raw string).
- `ConvertToString()` — serializes values via `.ToString()`.
## Dependencies
- **protobuf-net.Grpc** — code-first gRPC client.
- **Grpc.Net.Client** — HTTP/2 gRPC transport.
- **Polly** — retry policies.
- **Microsoft.Extensions.DependencyInjection** — DI integration.
- **Microsoft.Extensions.Configuration** — options binding.
- **Microsoft.Extensions.Logging** — logging abstraction.
## Interactions
- **ScadaLink Data Connection Layer** consumes the client library via `ILmxProxyClient`.
- **Protocol** — the client uses code-first contracts (`IScadaService`) that are wire-compatible with the Host's proto-generated service.
- **Security** — `GrpcChannelFactory` creates TLS-configured channels matching the Host's TLS configuration.


@@ -0,0 +1,122 @@
# Component: Configuration
## Purpose
Defines the `appsettings.json` structure, configuration binding, and startup validation for the LmxProxy Host service.
## Location
- `src/ZB.MOM.WW.LmxProxy.Host/Configuration/LmxProxyConfiguration.cs` — root configuration class.
- `src/ZB.MOM.WW.LmxProxy.Host/Configuration/ConfigurationValidator.cs` — validation logic.
- `src/ZB.MOM.WW.LmxProxy.Host/appsettings.json` — default configuration file.
## Responsibilities
- Define all configurable settings as strongly-typed classes.
- Bind `appsettings.json` sections to configuration objects via `Microsoft.Extensions.Configuration`.
- Validate all settings at startup, failing fast on invalid values.
- Support environment variable overrides.
## 1. Configuration Structure
### 1.1 Root: LmxProxyConfiguration
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| GrpcPort | int | 50051 | gRPC server listen port |
| ApiKeyConfigFile | string | `apikeys.json` | Path to API key configuration file |
| Subscription | SubscriptionConfiguration | — | Subscription channel settings |
| ServiceRecovery | ServiceRecoveryConfiguration | — | Windows SCM recovery settings |
| Connection | ConnectionConfiguration | — | MxAccess connection settings |
| Tls | TlsConfiguration | — | TLS/SSL settings |
| WebServer | WebServerConfiguration | — | Status web server settings |
### 1.2 ConnectionConfiguration
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| MonitorIntervalSeconds | int | 5 | Auto-reconnect check interval |
| ConnectionTimeoutSeconds | int | 30 | Initial connection timeout |
| ReadTimeoutSeconds | int | 5 | Per-read operation timeout |
| WriteTimeoutSeconds | int | 5 | Per-write operation timeout |
| MaxConcurrentOperations | int | 10 | Semaphore limit for concurrent MxAccess operations |
| AutoReconnect | bool | true | Enable auto-reconnect loop |
| NodeName | string? | null | MxAccess node name (optional) |
| GalaxyName | string? | null | MxAccess galaxy name (optional) |
### 1.3 SubscriptionConfiguration
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| ChannelCapacity | int | 1000 | Per-client subscription buffer size |
| ChannelFullMode | string | `DropOldest` | Backpressure strategy: `DropOldest`, `DropNewest`, `Wait` |
### 1.4 TlsConfiguration
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| Enabled | bool | false | Enable TLS on gRPC server |
| ServerCertificatePath | string | `certs/server.crt` | PEM server certificate |
| ServerKeyPath | string | `certs/server.key` | PEM server private key |
| ClientCaCertificatePath | string | `certs/ca.crt` | CA certificate for mTLS |
| RequireClientCertificate | bool | false | Require client certificates |
| CheckCertificateRevocation | bool | false | Enable CRL checking |
### 1.5 WebServerConfiguration
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| Enabled | bool | true | Enable status web server |
| Port | int | 8080 | HTTP listen port |
| Prefix | string? | null | Custom URL prefix (defaults to `http://+:{Port}/`) |
### 1.6 ServiceRecoveryConfiguration
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| FirstFailureDelayMinutes | int | 1 | Restart delay after first failure |
| SecondFailureDelayMinutes | int | 5 | Restart delay after second failure |
| SubsequentFailureDelayMinutes | int | 10 | Restart delay after subsequent failures |
| ResetPeriodDays | int | 1 | Days before failure count resets |
## 2. Validation
`ConfigurationValidator.ValidateAndLog()` runs at startup and checks:
- **GrpcPort**: Must be between 1 and 65535.
- **Connection**: All timeout values > 0. NodeName and GalaxyName ≤ 255 characters.
- **Subscription**: ChannelCapacity must be between 0 and 100000. ChannelFullMode must be one of `DropOldest`, `DropNewest`, `Wait`.
- **ServiceRecovery**: All failure delay values ≥ 0. ResetPeriodDays > 0.
- **TLS**: If enabled, validates certificate file paths exist.
Validation errors are logged and cause the service to throw `InvalidOperationException`, preventing startup.
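A Python sketch of the startup rules listed above, operating on a plain dict shaped like `appsettings.json`. The dict-based interface and the function names are hypothetical. Which settings count as "timeout values" (here the three `*TimeoutSeconds` keys) and which TLS files are checked (server certificate and key, plus the CA only when client certificates are required) are assumptions. `RuntimeError` stands in for .NET's `InvalidOperationException`.

```python
import os
from typing import Any, Dict, List

VALID_FULL_MODES = {"DropOldest", "DropNewest", "Wait"}


def validate(cfg: Dict[str, Any]) -> List[str]:
    """Return every rule violation; an empty list means the config is valid."""
    errors: List[str] = []

    port = cfg.get("GrpcPort", 50051)
    if not 1 <= port <= 65535:
        errors.append(f"GrpcPort {port} must be between 1 and 65535")

    conn = cfg.get("Connection", {})
    for key, default in (("ConnectionTimeoutSeconds", 30), ("ReadTimeoutSeconds", 5),
                         ("WriteTimeoutSeconds", 5)):
        if conn.get(key, default) <= 0:
            errors.append(f"Connection:{key} must be > 0")
    for key in ("NodeName", "GalaxyName"):
        value = conn.get(key)
        if value is not None and len(value) > 255:
            errors.append(f"Connection:{key} must be at most 255 characters")

    sub = cfg.get("Subscription", {})
    if not 0 <= sub.get("ChannelCapacity", 1000) <= 100_000:
        errors.append("Subscription:ChannelCapacity must be between 0 and 100000")
    if sub.get("ChannelFullMode", "DropOldest") not in VALID_FULL_MODES:
        errors.append("Subscription:ChannelFullMode must be DropOldest, DropNewest or Wait")

    rec = cfg.get("ServiceRecovery", {})
    for key, default in (("FirstFailureDelayMinutes", 1), ("SecondFailureDelayMinutes", 5),
                         ("SubsequentFailureDelayMinutes", 10)):
        if rec.get(key, default) < 0:
            errors.append(f"ServiceRecovery:{key} must be >= 0")
    if rec.get("ResetPeriodDays", 1) <= 0:
        errors.append("ServiceRecovery:ResetPeriodDays must be > 0")

    tls = cfg.get("Tls", {})
    if tls.get("Enabled", False):
        paths = [tls.get("ServerCertificatePath", "certs/server.crt"),
                 tls.get("ServerKeyPath", "certs/server.key")]
        if tls.get("RequireClientCertificate", False):
            paths.append(tls.get("ClientCaCertificatePath", "certs/ca.crt"))
        errors.extend(f"Tls: file not found: {p}" for p in paths if not os.path.isfile(p))
    return errors


def validate_or_raise(cfg: Dict[str, Any]) -> None:
    """Fail fast, as the Host does by throwing InvalidOperationException."""
    errors = validate(cfg)
    if errors:
        raise RuntimeError("Invalid configuration: " + "; ".join(errors))
```

Collecting all violations before failing lets a single startup log show every misconfigured setting rather than only the first one.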
## 3. Configuration Sources
Configuration is loaded via `Microsoft.Extensions.Configuration.ConfigurationBuilder`:
1. `appsettings.json` (required).
2. Environment variables (override any JSON setting).
## 4. Serilog Configuration
Logging is configured in the `Serilog` section of `appsettings.json`:
| Setting | Value |
|---------|-------|
| Console sink | ANSI theme, custom template with HH:mm:ss timestamp |
| File sink | `logs/lmxproxy-.txt`, daily rolling, 30 files retained |
| Default level | Information |
| Override: Microsoft | Warning |
| Override: System | Warning |
| Override: Grpc | Information |
| Enrichment | FromLogContext, WithMachineName, WithThreadId |
## Dependencies
- **Microsoft.Extensions.Configuration** — configuration binding.
- **Serilog.Settings.Configuration** — Serilog configuration from appsettings.
## Interactions
- **ServiceHost** (Program.cs) loads and validates configuration at startup.
- All other components receive their settings from the bound configuration objects.


@@ -0,0 +1,86 @@
# Component: GrpcServer
## Purpose
The gRPC service implementation that receives client RPCs, validates sessions, and delegates operations to the MxAccessClient. It is the network-facing entry point for all SCADA operations.
## Location
`src/ZB.MOM.WW.LmxProxy.Host/Grpc/ScadaGrpcService.cs` — inherits proto-generated `ScadaService.ScadaServiceBase`.
## Responsibilities
- Implement all 10 gRPC RPCs defined in `scada.proto`.
- Validate session IDs on all data operations before processing.
- Delegate read/write/subscribe operations to the MxAccessClient.
- Convert between gRPC message types and internal domain types (Vtq, Quality).
- Track operation timing and success/failure via PerformanceMetrics.
- Handle errors gracefully, returning structured error responses rather than throwing.
## 1. RPC Implementations
### 1.1 Connection Management
- **Connect**: Creates a new session via SessionManager if MxAccess is connected. Returns the session ID (32-character hex GUID). Rejects if MxAccess is disconnected.
- **Disconnect**: Terminates the session via SessionManager.
- **GetConnectionState**: Returns `IsConnected`, `ClientId`, and `ConnectedSinceUtcTicks` from the MxAccessClient.
### 1.2 Read Operations
- **Read**: Validates session, applies Polly retry policy, calls MxAccessClient.ReadAsync(), returns VtqMessage. On invalid session, returns a VtqMessage with `Quality.Bad`.
- **ReadBatch**: Validates session, reads all tags via MxAccessClient.ReadBatchAsync() with semaphore-controlled concurrency (max 10 concurrent). Returns results in request order. Batch reads are partially successful — individual tags may have Bad quality (with current UTC timestamp) while the overall response succeeds. If a tag read throws an exception, its VTQ is returned with Bad quality.
### 1.3 Write Operations
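- **Write**: Validates session, converts the request value to the tag's native type, and calls MxAccessClient.WriteAsync(). In v2 the value arrives as a `TypedValue` and is applied directly (see 2.1); the current v1 code parses the string value using the type heuristic (see 2.3).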
- **WriteBatch**: Validates session, writes all items in parallel via MxAccessClient with semaphore concurrency control. Returns per-item success/failure results. Overall `success` is `false` if any item fails (all-or-nothing at the reporting level).
- **WriteBatchAndWait**: Validates session, writes all items first. If any write fails, returns immediately with `success=false`. If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals using type-aware `TypedValueEquals()` comparison (same oneof case required, native type equality, case-sensitive strings, null equals null only). Default timeout: 5000ms, default poll interval: 100ms. If flag matches before timeout: `success=true`, `flag_reached=true`. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error). Returns `flag_reached` boolean and `elapsed_ms`.
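The polling contract and the `TypedValueEquals()` rules can be sketched in Python as follows. A `TypedValue` is modelled here as a `(oneof_case, value)` tuple, and `write_all`/`read_flag` are hypothetical callables standing in for the batch write and the flag read; the clock and sleep are injectable so the timing can be tested. Measuring `elapsed_ms` from the start of the call (including the writes) is an assumption.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# A TypedValue is modelled as (oneof_case, native_value); None means no value set.
TypedValue = Optional[Tuple[str, Any]]


def typed_value_equals(a: TypedValue, b: TypedValue) -> bool:
    """Same oneof case and equal native values; null equals only null."""
    if a is None or b is None:
        return a is None and b is None
    return a[0] == b[0] and a[1] == b[1]  # str comparison is case-sensitive


def write_batch_and_wait(
    write_all: Callable[[], bool],
    read_flag: Callable[[], TypedValue],
    flag_value: TypedValue,
    timeout_ms: int = 5000,
    poll_interval_ms: int = 100,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    start = clock()

    def elapsed_ms() -> int:
        return round((clock() - start) * 1000)

    if not write_all():  # any failed write ends the call before polling
        return {"success": False, "flag_reached": False, "elapsed_ms": elapsed_ms()}

    deadline = start + timeout_ms / 1000
    while True:
        if typed_value_equals(read_flag(), flag_value):
            return {"success": True, "flag_reached": True, "elapsed_ms": elapsed_ms()}
        if clock() >= deadline:
            # Timing out is not an error: success stays True.
            return {"success": True, "flag_reached": False, "elapsed_ms": elapsed_ms()}
        sleep(poll_interval_ms / 1000)
```

Because the oneof case is compared first, a Boolean `True` flag never matches an integer `1`, even though the two compare equal as plain Python values.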
### 1.4 Subscription
- **Subscribe**: Validates session (throws `RpcException(Unauthenticated)` on invalid). Creates a subscription handle via SubscriptionManager. Streams VtqMessage items from the subscription channel to the client. Cleans up the subscription on stream cancellation or error.
### 1.5 API Key Check
- **CheckApiKey**: Returns validity and role information from the interceptor context.
## 2. Value and Quality Handling
### 2.1 Values (TypedValue)
Read responses and subscription updates return values as `TypedValue` (protobuf oneof carrying native types). Write requests receive `TypedValue` and apply the value directly to MxAccess by its native type. If the `oneof` case doesn't match the tag's expected data type, the write returns `WriteResult` with `success=false` indicating type mismatch. No string serialization or parsing heuristics are used.
### 2.2 Quality (QualityCode)
Quality is returned as a `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. The server maps MxAccess quality codes to OPC UA status codes per the quality table in Component-Protocol. Specific error scenarios return specific quality codes (e.g., tag not found → `BadConfigurationError`, comms loss → `BadCommunicationFailure`).
### 2.3 Current Implementation (V1 Legacy)
The current codebase still uses v1 string-based encoding. During v2 migration, the following v1 behavior will be removed:
- `ConvertValueToString()` — serializes values to strings (bool → lowercase, DateTime → ISO-8601, arrays → JSON, others → `.ToString()`).
- `ParseValue()` — parses string values in order: bool → int → long → double → DateTime → raw string.
- Three-state string quality mapping: ≥192 → `"Good"`, 64 to 191 → `"Uncertain"`, <64 → `"Bad"`.
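For reference while migrating, the legacy v1 rules above reduce to the following Python approximation. Python's `int`, `float`, and `datetime.fromisoformat` parsing is not identical to .NET's `TryParse` methods (culture handling, accepted formats), so treat this as an illustration of the parse order, not a byte-compatible port; the function names are hypothetical.

```python
from datetime import datetime
from typing import Any

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def v1_quality(mx_quality: int) -> str:
    """Three-state v1 quality string derived from the numeric MxAccess quality."""
    if mx_quality >= 192:
        return "Good"
    if mx_quality >= 64:
        return "Uncertain"
    return "Bad"


def v1_parse_value(text: str) -> Any:
    """Approximate ParseValue(): bool -> int/long -> double -> DateTime -> raw string."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        number = int(text)
        if INT64_MIN <= number <= INT64_MAX:  # fits in int or long
            return number
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        pass
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return text  # fall back to the raw string
```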
## 3. Error Handling
- All RPC methods catch exceptions and return error responses with `success=false` and a descriptive message. Exceptions do not propagate as gRPC status codes (except Subscribe, which throws `RpcException` for invalid sessions).
- Each operation is wrapped in a PerformanceMetrics timing scope that records duration and success/failure.
## 4. Session Validation
- All data operations (Read, ReadBatch, Write, WriteBatch, WriteBatchAndWait, Subscribe) validate the session ID before processing.
- Invalid session on read/write operations returns a response with Bad quality VTQ.
- Invalid session on Subscribe throws `RpcException` with `StatusCode.Unauthenticated`.
## Dependencies
- **MxAccessClient** (IScadaClient) — all SCADA operations are delegated here.
- **SessionManager** — session creation, validation, and termination.
- **SubscriptionManager** — subscription lifecycle for the Subscribe RPC.
- **PerformanceMetrics** — operation timing and success/failure tracking.
## Interactions
- **ApiKeyInterceptor** intercepts all RPCs before they reach ScadaGrpcService, enforcing API key authentication and role-based write authorization.
- **SubscriptionManager** provides the channel that Subscribe streams from.
- **StatusReportService** reads PerformanceMetrics data that ScadaGrpcService populates.


@@ -0,0 +1,121 @@
# Component: HealthAndMetrics
## Purpose
Provides health checking, performance metrics collection, and an HTTP status dashboard for monitoring the LmxProxy service.
## Location
- `src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs` — basic health check.
- `src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs` — detailed health check with test tag read.
- `src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs` — operation metrics collection.
- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs` — status report generation.
- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs` — HTTP status endpoint.
## Responsibilities
- Evaluate service health based on connection state, operation success rates, and test tag reads.
- Track per-operation performance metrics (counts, latencies, percentiles).
- Serve an HTML status dashboard and JSON/health HTTP endpoints.
- Report metrics to logs on a periodic interval.
## 1. Health Checks
### 1.1 Basic Health Check (HealthCheckService)
`CheckHealthAsync()` evaluates:
| Check | Healthy | Degraded |
|-------|---------|----------|
| MxAccess connected | Yes | — |
| Success rate (if > 100 total ops) | ≥ 50% | < 50% |
| Client count | ≤ 100 | > 100 |
Returns health data dictionary: `scada_connected`, `scada_connection_state`, `total_clients`, `total_tags`, `total_operations`, `average_success_rate`.
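The threshold table translates into roughly the following Python sketch. The function signature is hypothetical, and reporting a disconnected MxAccess as `Unhealthy` is an assumption (the table only says a connection is required for Healthy); the returned dictionary uses the documented key names.

```python
from typing import Any, Dict, Tuple


def basic_health(
    scada_connected: bool,
    connection_state: str,
    total_operations: int,
    average_success_rate: float,  # percent, 0 to 100
    total_clients: int,
    total_tags: int = 0,
) -> Tuple[str, Dict[str, Any]]:
    data = {
        "scada_connected": scada_connected,
        "scada_connection_state": connection_state,
        "total_clients": total_clients,
        "total_tags": total_tags,
        "total_operations": total_operations,
        "average_success_rate": average_success_rate,
    }
    if not scada_connected:
        return "Unhealthy", data  # assumption: a lost connection is not merely Degraded
    if total_operations > 100 and average_success_rate < 50.0:
        return "Degraded", data  # success rate only counts once there is enough history
    if total_clients > 100:
        return "Degraded", data
    return "Healthy", data
```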
### 1.2 Detailed Health Check (DetailedHealthCheckService)
`CheckHealthAsync()` performs an active probe:
1. Checks `IsConnected` — returns **Unhealthy** if not connected.
2. Reads a test tag (default `System.Heartbeat`).
3. If test tag quality is not Good — returns **Degraded**.
4. If test tag timestamp is older than **5 minutes** — returns **Degraded** (stale data detection).
5. Otherwise returns **Healthy**.
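A Python sketch of the probe order above, assuming a hypothetical `read_tag` callable that returns `(quality_is_good, timestamp)` with timezone-aware UTC timestamps. How the real service handles an exception from the test-tag read is not described here, so the sketch lets it propagate.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

STALE_AFTER = timedelta(minutes=5)


def detailed_health(
    is_connected: bool,
    read_tag: Callable[[str], Tuple[bool, datetime]],
    test_tag: str = "System.Heartbeat",
    now: Optional[datetime] = None,
) -> str:
    """Active probe: connection, then test-tag quality, then test-tag freshness."""
    if not is_connected:
        return "Unhealthy"
    quality_is_good, timestamp = read_tag(test_tag)
    if not quality_is_good:
        return "Degraded"
    if now is None:
        now = datetime.now(timezone.utc)
    if now - timestamp > STALE_AFTER:  # stale data detection
        return "Degraded"
    return "Healthy"
```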
## 2. Performance Metrics
### 2.1 Tracking
`PerformanceMetrics` uses a `ConcurrentDictionary<string, OperationMetrics>` to track operations by name.
Operations tracked: `Read`, `ReadBatch`, `Write`, `WriteBatch` (recorded by ScadaGrpcService).
### 2.2 Recording
Two recording patterns:
- `RecordOperation(name, duration, success)` — explicit recording.
- `BeginOperation(name)` — returns an `ITimingScope` (disposable). On dispose, automatically records duration (via `Stopwatch`) and success flag (set via `SetSuccess(bool)`).
### 2.3 Per-Operation Statistics
`OperationMetrics` maintains:
- `_totalCount`, `_successCount` — running counters.
- `_totalMilliseconds`, `_minMilliseconds`, `_maxMilliseconds` — latency range.
- `_durations` — rolling buffer of up to **1000 latency samples** for percentile calculation.
`MetricsStatistics` snapshot:
- `TotalCount`, `SuccessCount`, `SuccessRate` (percentage).
- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`.
- `Percentile95Milliseconds` — calculated from sorted samples at the 95th percentile index.
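A Python sketch of this bookkeeping is below: a bounded `deque` plays the role of the 1000-sample rolling buffer, and a context manager plays the role of `ITimingScope`. The class and method names only loosely mirror the C# ones. The nearest-rank P95 index formula and treating an operation as failed unless `set_success(True)` is called are assumptions, since the text does not specify either.

```python
import threading
import time
from collections import deque
from typing import Dict


class OperationMetrics:
    """Counters plus a bounded buffer of recent latencies for one operation name."""

    def __init__(self, max_samples: int = 1000):
        self._lock = threading.Lock()
        self.total_count = 0
        self.success_count = 0
        self.total_ms = 0.0
        self.min_ms = float("inf")
        self.max_ms = 0.0
        self._durations: deque = deque(maxlen=max_samples)  # oldest sample evicted when full

    def record(self, duration_ms: float, success: bool) -> None:
        with self._lock:
            self.total_count += 1
            self.success_count += int(success)
            self.total_ms += duration_ms
            self.min_ms = min(self.min_ms, duration_ms)
            self.max_ms = max(self.max_ms, duration_ms)
            self._durations.append(duration_ms)

    def snapshot(self) -> Dict[str, float]:
        with self._lock:
            samples = sorted(self._durations)
            n = self.total_count
            # Nearest-rank P95: the ceil(0.95 * len)-th smallest sample (1-based).
            p95 = samples[(95 * len(samples) + 99) // 100 - 1] if samples else 0.0
            return {
                "total_count": n,
                "success_count": self.success_count,
                "success_rate": 100.0 * self.success_count / n if n else 0.0,
                "average_ms": self.total_ms / n if n else 0.0,
                "min_ms": self.min_ms if n else 0.0,
                "max_ms": self.max_ms,
                "p95_ms": p95,
                "sample_count": len(samples),
            }


class TimingScope:
    """Context-manager analogue of ITimingScope: records the duration on exit."""

    def __init__(self, metrics: "PerformanceMetrics", name: str):
        self._metrics, self._name = metrics, name
        self._success = False  # assumption: counts as failed unless marked successful
        self._start = 0.0

    def set_success(self, success: bool) -> None:
        self._success = success

    def __enter__(self) -> "TimingScope":
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc_info) -> bool:
        elapsed_ms = (time.perf_counter() - self._start) * 1000
        self._metrics.record_operation(self._name, elapsed_ms, self._success)
        return False  # never swallow exceptions


class PerformanceMetrics:
    def __init__(self):
        self._lock = threading.Lock()
        self._ops: Dict[str, OperationMetrics] = {}

    def _get(self, name: str) -> OperationMetrics:
        with self._lock:
            if name not in self._ops:
                self._ops[name] = OperationMetrics()
            return self._ops[name]

    def record_operation(self, name: str, duration_ms: float, success: bool) -> None:
        self._get(name).record(duration_ms, success)

    def begin_operation(self, name: str) -> TimingScope:
        return TimingScope(self, name)

    def statistics(self) -> Dict[str, Dict[str, float]]:
        with self._lock:
            ops = dict(self._ops)
        return {name: m.snapshot() for name, m in ops.items()}
```

The running counters cover every operation ever recorded, while percentiles are computed only from the most recent samples, which is what keeps memory bounded.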
### 2.4 Periodic Reporting
A timer fires every **60 seconds**, logging a summary of all operation metrics to Serilog.
## 3. Status Web Server
### 3.1 Server
`StatusWebServer` uses `HttpListener` on `http://+:{Port}/` (default port 8080).
- Starts an async request-handling loop, spawning a task per request.
- Graceful shutdown: cancels the listener, waits **5 seconds** for the listener task to exit.
- Returns HTTP 405 for non-GET methods, HTTP 500 on errors.
### 3.2 Endpoints
| Endpoint | Method | Response |
|----------|--------|----------|
| `/` | GET | HTML dashboard (auto-refresh every 30 seconds) |
| `/api/status` | GET | JSON status report (camelCase) |
| `/api/health` | GET | Plain text `OK` (200) or `UNHEALTHY` (503) |
### 3.3 HTML Dashboard
Generated by `StatusReportService`:
- Bootstrap-like CSS grid layout with status cards.
- Color-coded status: green = Healthy, yellow = Degraded, red = Unhealthy/Error.
- Operations table with columns: Count, SuccessRate, Avg/Min/Max/P95 milliseconds.
- Service metadata: ServiceName, Version (assembly version), connection state.
- Subscription stats: TotalClients, TotalTags, ActiveSubscriptions.
- Auto-refresh via `<meta http-equiv="refresh" content="30">`.
- Last updated timestamp.
### 3.4 JSON Status Report
Fully nested structure with camelCase property names:
- Service metadata, connection status, subscription stats, performance data, health check results.
## Dependencies
- **MxAccessClient** — `IsConnected`, `ConnectionState` for health checks; test tag read for detailed check.
- **SubscriptionManager** — subscription statistics.
- **PerformanceMetrics** — operation statistics for status report and health evaluation.
- **Configuration** — `WebServerConfiguration` for port and prefix.
## Interactions
- **GrpcServer** populates PerformanceMetrics via timing scopes on every RPC.
- **ServiceHost** creates all health/metrics/status components at startup and disposes them at shutdown.
- External monitoring systems can poll `/api/health` for availability checks.


@@ -0,0 +1,108 @@
# Component: MxAccessClient
## Purpose
The core component that wraps the ArchestrA MXAccess COM API, providing connection management, tag read/write operations, and subscription-based value change notifications. This is the bridge between the gRPC service layer and AVEVA System Platform.
## Location
`src/ZB.MOM.WW.LmxProxy.Host/MxAccess/MxAccessClient.cs` — partial class split across 6 files:
- `MxAccessClient.cs` — Main class, properties, disposal, factory.
- `MxAccessClient.Connection.cs` — Connection lifecycle (connect, disconnect, reconnect, cleanup).
- `MxAccessClient.ReadWrite.cs` — Read and write operations with retry and concurrency control.
- `MxAccessClient.Subscription.cs` — Subscription management and stored subscription state.
- `MxAccessClient.EventHandlers.cs` — COM event handlers (OnDataChange, OnWriteComplete, OperationComplete).
- `MxAccessClient.NestedTypes.cs` — Internal types and enums.
## Responsibilities
- Manage the MXAccess COM object lifecycle (create, register, unregister, release).
- Maintain connection state (Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting) and fire state change events.
- Execute read and write operations against MXAccess with concurrency control via semaphores.
- Manage tag subscriptions via MXAccess advise callbacks and store subscription state for reconnection.
- Handle COM threading constraints (STA thread context via `Task.Run`).
## 1. Connection Lifecycle
### 1.1 Connect
`ConnectAsync()` wraps `ConnectInternal()` in `Task.Run` for STA thread context:
1. Validates not disposed.
2. Returns early if already connected.
3. Sets state to `Connecting`.
4. `InitializeMxAccessConnection()` — creates new `LMXProxyServer` COM object, wires event handlers (OnDataChange, OnWriteComplete, OperationComplete).
5. `RegisterWithMxAccess()` — calls `_lmxProxy.Register("ZB.MOM.WW.LmxProxy.Host")`, stores the returned connection handle.
6. Sets state to `Connected`.
7. On error, calls `Cleanup()` and re-throws.
After successful connection, calls `RecreateStoredSubscriptionsAsync()` to restore any previously active subscriptions.
### 1.2 Disconnect
`DisconnectAsync()` wraps `DisconnectInternal()` in `Task.Run`:
1. Checks `IsConnected`.
2. Sets state to `Disconnecting`.
3. `RemoveAllSubscriptions()` — unsubscribes all tags from MXAccess but retains subscription state in `_storedSubscriptions` for reconnection.
4. `UnregisterFromMxAccess()` — calls `_lmxProxy.Unregister(_connectionHandle)`.
5. `Cleanup()` — removes event handlers, calls `Marshal.ReleaseComObject(_lmxProxy)` to force-release all COM references, nulls the proxy and resets the connection handle.
6. Sets state to `Disconnected`.
### 1.3 Connection State
- `IsConnected` property: `_lmxProxy != null && _connectionState == Connected && _connectionHandle > 0`.
- `ConnectionState` enum: Disconnected, Connecting, Connected, Disconnecting, Error, Reconnecting.
- `ConnectionStateChanged` event fires on all state transitions with previous state, current state, and optional message.
### 1.4 Auto-Reconnect
When `AutoReconnect` is enabled (default), the `MonitorConnectionAsync` loop runs continuously:
- Checks `IsConnected` every `MonitorIntervalSeconds` (default 5 seconds).
- On disconnect, attempts reconnect via semaphore-protected `ConnectAsync()`.
- On failure, logs warning and retries at the next interval.
- Reconnection restores stored subscriptions automatically.
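The reconnect behavior can be sketched as a monitor that checks connectivity on each tick and serializes reconnect attempts behind a semaphore. This is a minimal illustration, not the Host's code. `ReconnectMonitor`, `CheckOnceAsync`, and the two delegates (standing in for `IsConnected` and `ConnectAsync()`) are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the MonitorConnectionAsync loop.
public sealed class ReconnectMonitor
{
    private readonly Func<bool> _isConnected;                     // stand-in for IsConnected
    private readonly Func<CancellationToken, Task> _connectAsync; // stand-in for ConnectAsync()
    private readonly SemaphoreSlim _connectGate = new SemaphoreSlim(1, 1);

    public ReconnectMonitor(Func<bool> isConnected, Func<CancellationToken, Task> connectAsync)
    {
        _isConnected = isConnected;
        _connectAsync = connectAsync;
    }

    // One monitor tick. Returns true if the client is connected afterwards.
    public async Task<bool> CheckOnceAsync(CancellationToken ct)
    {
        if (_isConnected()) return true;
        await _connectGate.WaitAsync(ct);
        try
        {
            if (_isConnected()) return true;   // another caller reconnected while we waited
            await _connectAsync(ct);           // ConnectAsync also restores stored subscriptions
            return _isConnected();
        }
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            Console.WriteLine($"Reconnect failed, retrying next interval: {ex.Message}");
            return false;
        }
        finally
        {
            _connectGate.Release();
        }
    }

    // Runs until cancelled, checking every `interval` (MonitorIntervalSeconds, default 5 s).
    public async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (true)
            {
                await CheckOnceAsync(ct);
                await Task.Delay(interval, ct);
            }
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // normal shutdown of the monitor
        }
    }
}
```

Splitting a single tick out of the loop makes the retry logic testable without timers. Re-checking `isConnected` after acquiring the semaphore avoids a redundant reconnect when another caller has already restored the connection.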
## 2. Thread Safety & COM Constraints
- State mutations protected by `lock (_lock)`.
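- COM operations are dispatched via `Task.Run`. Thread-pool threads run in the multithreaded apartment (MTA), so this offloads blocking COM calls from the caller rather than providing an STA context. MXAccess is a 32-bit COM component, which is why the Host is x86-only.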
- Concurrency control: `_readSemaphore` and `_writeSemaphore` limit concurrent MXAccess operations to `MaxConcurrentOperations` (default 10, configurable).
- Default max concurrency constant: `DefaultMaxConcurrency = 10`.
## 3. Read Operations
- `ReadAsync(address, ct)` — Applies Polly retry policy, calls `ReadSingleValueAsync()`, returns `Vtq`.
- `ReadBatchAsync(addresses, ct)` — Creates parallel tasks per address via `ReadAddressWithSemaphoreAsync()`. Each task acquires `_readSemaphore` before reading. Returns `IReadOnlyDictionary<string, Vtq>` keyed by address.
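Here is a sketch of the semaphore-bounded batch read. It assumes a generic `readOne` delegate in place of `ReadSingleValueAsync()` and a caller-supplied `SemaphoreSlim` in place of `_readSemaphore`. Both are stand-ins, and `BatchReader` is hypothetical. Duplicate addresses are collapsed so the result can be a dictionary keyed by address.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of ReadBatchAsync: one task per address, concurrency capped by a semaphore.
public static class BatchReader
{
    public static async Task<IReadOnlyDictionary<string, T>> ReadBatchAsync<T>(
        IEnumerable<string> addresses,
        Func<string, CancellationToken, Task<T>> readOne, // stand-in for ReadSingleValueAsync
        SemaphoreSlim readSemaphore,                      // stand-in for _readSemaphore
        CancellationToken ct)
    {
        async Task<KeyValuePair<string, T>> ReadWithSemaphoreAsync(string address)
        {
            await readSemaphore.WaitAsync(ct);            // wait for a free MXAccess slot
            try
            {
                return new KeyValuePair<string, T>(address, await readOne(address, ct));
            }
            finally
            {
                readSemaphore.Release();
            }
        }

        // All tasks start at once; the semaphore, not the task count, bounds parallelism.
        var pairs = await Task.WhenAll(addresses.Distinct().Select(a => ReadWithSemaphoreAsync(a)));
        return pairs.ToDictionary(p => p.Key, p => p.Value);
    }
}
```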
## 4. Write Operations
- `WriteAsync(address, value, ct)` — Applies Polly retry policy, calls `WriteInternalAsync(address, value, ct)`.
- `WriteBatchAsync(values, ct)` — Parallel tasks via `WriteAddressWithSemaphoreAsync()`. Each task acquires `_writeSemaphore` before writing.
- `WriteBatchAndWaitAsync(values, flagAddress, flagValue, responseAddress, responseValue, ct)` — Writes batch, writes flag, polls response tag until match.
## 5. Subscription Management
- Subscriptions stored in `_storedSubscriptions` for reconnection persistence.
- `SubscribeInternalAsync(addresses, callback, storeSubscription)` — registers tags with MXAccess and stores subscription state.
- `RecreateStoredSubscriptionsAsync()` — called after reconnect to re-subscribe all previously active tags without re-storing.
- `RemoveAllSubscriptions()` — unsubscribes from MXAccess but retains `_storedSubscriptions`.
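The retain-for-reconnect behavior of `_storedSubscriptions` can be modeled as two collections. One holds what is currently advised with MXAccess; the other holds what should be restored after a reconnect. `StoredSubscriptionRegistry` and its two delegates are hypothetical stand-ins for the MXAccess advise/unadvise calls. The method names mirror the ones listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical model of the stored-subscription bookkeeping in MxAccessClient.
public sealed class StoredSubscriptionRegistry
{
    private readonly object _lock = new object();
    // Tag address -> callback to re-register after reconnect (survives RemoveAllSubscriptions).
    private readonly Dictionary<string, Action<string, object>> _storedSubscriptions =
        new Dictionary<string, Action<string, object>>(StringComparer.Ordinal);
    // Tags currently advised with MXAccess on the live connection.
    private readonly HashSet<string> _active = new HashSet<string>(StringComparer.Ordinal);
    private readonly Func<string, Task> _adviseAsync; // stand-in for the MXAccess advise call
    private readonly Action<string> _unadvise;        // stand-in for the MXAccess unadvise call

    public StoredSubscriptionRegistry(Func<string, Task> adviseAsync, Action<string> unadvise)
    {
        _adviseAsync = adviseAsync;
        _unadvise = unadvise;
    }

    public int ActiveCount { get { lock (_lock) return _active.Count; } }
    public int StoredCount { get { lock (_lock) return _storedSubscriptions.Count; } }

    public async Task SubscribeInternalAsync(
        IEnumerable<string> addresses, Action<string, object> callback, bool storeSubscription)
    {
        foreach (var address in addresses)
        {
            await _adviseAsync(address);
            lock (_lock)
            {
                _active.Add(address);
                if (storeSubscription) _storedSubscriptions[address] = callback;
            }
        }
    }

    // Re-advise everything that was stored, without storing it again.
    public async Task RecreateStoredSubscriptionsAsync()
    {
        List<KeyValuePair<string, Action<string, object>>> snapshot;
        lock (_lock) snapshot = _storedSubscriptions.ToList();
        foreach (var entry in snapshot)
            await SubscribeInternalAsync(new[] { entry.Key }, entry.Value, storeSubscription: false);
    }

    // Unadvise all live tags but keep _storedSubscriptions for the next reconnect.
    public void RemoveAllSubscriptions()
    {
        List<string> live;
        lock (_lock)
        {
            live = _active.ToList();
            _active.Clear();
        }
        foreach (var address in live) _unadvise(address);
    }
}
```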
## 6. Event Handlers
- **OnDataChange** — Fired by MXAccess when a subscribed tag value changes. Routes the update to the SubscriptionManager.
- **OnWriteComplete** — Fired when an async write operation completes.
- **OperationComplete** — General operation completion callback.
## Dependencies
- **ArchestrA.MXAccess** COM interop assembly (`lib/ArchestrA.MXAccess.dll`).
- **Polly** — retry policies for read/write operations.
- **Configuration** — `ConnectionConfiguration` for timeouts, concurrency limits, and auto-reconnect settings.
## Interactions
- **GrpcServer** (ScadaGrpcService) delegates all SCADA operations to MxAccessClient via the `IScadaClient` interface.
- **SubscriptionManager** receives value change callbacks originating from MxAccessClient's COM event handlers.
- **HealthAndMetrics** queries `IsConnected` and `ConnectionState` for health checks.
- **ServiceHost** manages the MxAccessClient lifecycle (create at startup, dispose at shutdown).

# Component: Protocol
## Purpose
Defines the gRPC protocol specification for communication between the LmxProxy Client and Host, including the proto file definition, code-first contracts, message schemas, value type system, and quality codes. The authoritative specification is `docs/lmxproxy_updates.md`.
## Location
- `src/ZB.MOM.WW.LmxProxy.Host/Grpc/Protos/scada.proto` — proto file (Host, proto-generated).
- `src/ZB.MOM.WW.LmxProxy.Client/Domain/ScadaContracts.cs` — code-first contracts (Client, protobuf-net.Grpc).
- `docs/lmxproxy_updates.md` — authoritative protocol specification.
- `docs/lmxproxy_protocol.md` — legacy v1 protocol documentation (superseded).
## Responsibilities
- Define the gRPC service interface (`scada.ScadaService`) and all message types.
- Ensure wire compatibility between the Host's proto-generated code and the Client's code-first contracts.
- Specify the VTQ data model: `TypedValue` for values, `QualityCode` for quality.
- Document OPC UA-aligned quality codes filtered to AVEVA System Platform usage.
## 1. Service Definition
Service: `scada.ScadaService` (gRPC package: `scada`)
| RPC | Request | Response | Type |
|-----|---------|----------|------|
| Connect | ConnectRequest | ConnectResponse | Unary |
| Disconnect | DisconnectRequest | DisconnectResponse | Unary |
| GetConnectionState | GetConnectionStateRequest | GetConnectionStateResponse | Unary |
| Read | ReadRequest | ReadResponse | Unary |
| ReadBatch | ReadBatchRequest | ReadBatchResponse | Unary |
| Write | WriteRequest | WriteResponse | Unary |
| WriteBatch | WriteBatchRequest | WriteBatchResponse | Unary |
| WriteBatchAndWait | WriteBatchAndWaitRequest | WriteBatchAndWaitResponse | Unary |
| Subscribe | SubscribeRequest | stream VtqMessage | Server streaming |
| CheckApiKey | CheckApiKeyRequest | CheckApiKeyResponse | Unary |
## 2. Value Type System (TypedValue)
Values are transmitted in their native protobuf types via a `TypedValue` oneof. No string serialization or parsing heuristics are used.
```
TypedValue {
oneof value {
bool bool_value = 1
int32 int32_value = 2
int64 int64_value = 3
float float_value = 4
double double_value = 5
string string_value = 6
bytes bytes_value = 7
int64 datetime_value = 8 // UTC DateTime.Ticks (100ns intervals since 0001-01-01)
ArrayValue array_value = 9 // typed arrays
}
}
```
`ArrayValue` contains typed repeated fields via oneof: `BoolArray`, `Int32Array`, `Int64Array`, `FloatArray`, `DoubleArray`, `StringArray`. Each contains a `repeated` field of the corresponding primitive.
### 2.1 Null Handling
- Null is represented by an unset `oneof` (no field selected in `TypedValue`).
- A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp.
### 2.2 Type Mapping from Internal Tag Model
| Tag Data Type | TypedValue Field |
|---------------|-----------------|
| `bool` | `bool_value` |
| `int32` | `int32_value` |
| `int64` | `int64_value` |
| `float` | `float_value` |
| `double` | `double_value` |
| `string` | `string_value` |
| `byte[]` | `bytes_value` |
| `DateTime` | `datetime_value` (UTC Ticks as int64) |
| `float[]` | `array_value.float_values` |
| `int32[]` | `array_value.int32_values` |
| Other arrays | Corresponding `ArrayValue` field |
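To illustrate the mapping table, the helper below picks the `TypedValue` field for a .NET value and converts `DateTime` to UTC ticks. `TypedValueMapper` is hypothetical. The array field names other than `float_values` and `int32_values` are assumed by analogy with the table. Unsupported types are rejected rather than stringified.

```csharp
using System;

// Hypothetical helper: choose the TypedValue oneof field for a tag value.
// Returns the proto field name and the value that would be encoded in it.
public static class TypedValueMapper
{
    public static (string Field, object Wire) ToTypedValue(object value)
    {
        switch (value)
        {
            case null: return ("<unset>", null);   // null => no oneof case selected
            case bool b: return ("bool_value", b);
            case int i: return ("int32_value", i);
            case long l: return ("int64_value", l);
            case float f: return ("float_value", f);
            case double d: return ("double_value", d);
            case string s: return ("string_value", s);
            case byte[] bytes: return ("bytes_value", bytes);
            case DateTime dt:
                // UTC DateTime.Ticks; ToUniversalTime treats Kind=Unspecified as local time.
                return ("datetime_value", dt.ToUniversalTime().Ticks);
            case float[] fa: return ("array_value.float_values", fa);
            case int[] ia: return ("array_value.int32_values", ia);
            // Field names below are assumed by analogy with the table.
            case bool[] ba: return ("array_value.bool_values", ba);
            case long[] la: return ("array_value.int64_values", la);
            case double[] da: return ("array_value.double_values", da);
            case string[] sa: return ("array_value.string_values", sa);
            default:
                throw new NotSupportedException($"No TypedValue mapping for {value.GetType()}");
        }
    }
}
```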
## 3. Quality System (QualityCode)
Quality is a structured message with an OPC UA-compatible numeric status code and a human-readable symbolic name:
```
QualityCode {
uint32 status_code = 1 // OPC UA-compatible numeric status code
string symbolic_name = 2 // Human-readable name (e.g., "Good", "BadSensorFailure")
}
```
### 3.1 Category Extraction
The category is derived from the two most significant bits, `(statusCode & 0xC0000000)`:
- `0x00000000` = Good
- `0x40000000` = Uncertain
- `0x80000000` = Bad
```csharp
public static bool IsGood(uint statusCode) => (statusCode & 0xC0000000) == 0x00000000;
public static bool IsBad(uint statusCode) => (statusCode & 0xC0000000) == 0x80000000;
```
### 3.2 Supported Quality Codes
Filtered to codes actively used by AVEVA System Platform, InTouch, and OI Server/DAServer (per AVEVA Tech Note TN1305):
**Good Quality:**
| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description |
|--------------|-------------------|------------------|-------------|
| `Good` | `0x00000000` | `0x00C0` | Value is reliable, non-specific |
| `GoodLocalOverride` | `0x00D80000` | `0x00D8` | Manually overridden; input disconnected |
**Uncertain Quality:**
| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description |
|--------------|-------------------|------------------|-------------|
| `UncertainLastUsableValue` | `0x40900000` | `0x0044` | External source stopped writing; value is stale |
| `UncertainSensorNotAccurate` | `0x42390000` | `0x0050` | Sensor out of calibration or clamped |
| `UncertainEngineeringUnitsExceeded` | `0x40540000` | `0x0054` | Outside defined engineering limits |
| `UncertainSubNormal` | `0x40580000` | `0x0058` | Derived from insufficient good sources |
**Bad Quality:**
| Symbolic Name | OPC UA Status Code | AVEVA OPC DA Hex | Description |
|--------------|-------------------|------------------|-------------|
| `Bad` | `0x80000000` | `0x0000` | Non-specific bad; value not useful |
| `BadConfigurationError` | `0x80040000` | `0x0004` | Server config problem (e.g., item deleted) |
| `BadNotConnected` | `0x808A0000` | `0x0008` | Input not logically connected to source |
| `BadDeviceFailure` | `0x806B0000` | `0x000C` | Device failure detected |
| `BadSensorFailure` | `0x806D0000` | `0x0010` | Sensor failure detected |
| `BadLastKnownValue` | `0x80050000` | `0x0014` | Comm failed; last known value available |
| `BadCommunicationFailure` | `0x80050000` | `0x0018` | Comm failed; no last known value |
| `BadOutOfService` | `0x808F0000` | `0x001C` | Block off-scan/locked; item inactive |
| `BadWaitingForInitialData` | `0x80320000` | — | Initializing; OI Server establishing communication |
**Notes:**
- AVEVA OPC DA quality codes use a 16-bit word whose low byte is laid out as `QQSSSSLL`: 2 bits major quality (Good/Bad/Uncertain), 4 bits substatus, and 2 bits limit (Not Limited, Low, High, Constant). The high byte is vendor-specific. The OPC UA status codes above are the UA equivalents carried in `QualityCode.status_code`.
- The limit bits can be combined with any quality code. For example, `Good` + High Limited = `0x00C0 | 0x0002` = `0x00C2` in OPC DA. In OPC UA, limits are carried in separate limit bits of the status code, so the base code stays the same.
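The OPC DA quality byte described in the notes above (2 bits major, 4 bits substatus, 2 bits limit) can be decoded with a few shifts and masks. This sketch is illustrative only. The enum and class names are hypothetical, and it looks only at the low byte of the 16-bit word.

```csharp
using System;

// Hypothetical helpers for the low byte of an OPC DA quality word (QQSSSSLL).
public enum DaMajor { Bad = 0, Uncertain = 1, NotUsed = 2, Good = 3 }
public enum DaLimit { NotLimited = 0, Low = 1, High = 2, Constant = 3 }

public static class OpcDaQuality
{
    public static (DaMajor Major, int SubStatus, DaLimit Limit) Decode(ushort quality)
    {
        int low = quality & 0xFF;            // the high byte is vendor-specific; ignored here
        return ((DaMajor)((low >> 6) & 0x3), // bits 7-6: major quality
                (low >> 2) & 0xF,            // bits 5-2: substatus
                (DaLimit)(low & 0x3));       // bits 1-0: limit
    }

    // Replace the limit bits, e.g. Good (0x00C0) + High -> 0x00C2.
    public static ushort WithLimit(ushort quality, DaLimit limit) =>
        (ushort)((quality & ~0x3) | (int)limit);
}
```

For example, `0x00D8` decodes to major Good with substatus 6, which is the local-override code in the Good table.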
### 3.3 Error Condition Mapping
| Scenario | Quality |
|----------|---------|
| Normal read | `Good` (`0x00000000`) |
| Tag not found | `BadConfigurationError` (`0x80040000`) |
| Tag read exception / comms loss | `BadCommunicationFailure` (`0x80050000`) |
| Sensor failure | `BadSensorFailure` (`0x806D0000`) |
| Device failure | `BadDeviceFailure` (`0x806B0000`) |
| Stale value | `UncertainLastUsableValue` (`0x40900000`) |
| Block off-scan / disabled | `BadOutOfService` (`0x808F0000`) |
| Local override active | `GoodLocalOverride` (`0x00D80000`) |
| Initializing / waiting for first value | `BadWaitingForInitialData` (`0x80320000`) |
| Write to read-only tag | `WriteResult.success=false`, message indicates read-only |
| Type mismatch on write | `WriteResult.success=false`, message indicates type mismatch |
## 4. Message Schemas
### 4.1 VtqMessage
The core data type for tag value transport:
| Field | Proto Type | Order | Description |
|-------|-----------|-------|-------------|
| tag | string | 1 | Tag address |
| value | TypedValue | 2 | Typed value (native protobuf types) |
| timestamp_utc_ticks | int64 | 3 | UTC DateTime.Ticks (100ns intervals since 0001-01-01) |
| quality | QualityCode | 4 | Structured quality with status code and symbolic name |
A null or missing VTQ message is treated as Bad quality with null value and current UTC timestamp.
### 4.2 Connection Messages
**ConnectRequest**: `client_id` (string), `api_key` (string)
**ConnectResponse**: `success` (bool), `message` (string), `session_id` (string — 32-char hex GUID)
**DisconnectRequest**: `session_id` (string)
**DisconnectResponse**: `success` (bool), `message` (string)
**GetConnectionStateRequest**: `session_id` (string)
**GetConnectionStateResponse**: `is_connected` (bool), `client_id` (string), `connected_since_utc_ticks` (int64)
### 4.3 Read Messages
**ReadRequest**: `session_id` (string), `tag` (string)
**ReadResponse**: `success` (bool), `message` (string), `vtq` (VtqMessage)
**ReadBatchRequest**: `session_id` (string), `tags` (repeated string)
**ReadBatchResponse**: `success` (bool), `message` (string), `vtqs` (repeated VtqMessage)
### 4.4 Write Messages
**WriteRequest**: `session_id` (string), `tag` (string), `value` (TypedValue)
**WriteResponse**: `success` (bool), `message` (string)
**WriteItem**: `tag` (string), `value` (TypedValue)
**WriteResult**: `tag` (string), `success` (bool), `message` (string)
**WriteBatchRequest**: `session_id` (string), `items` (repeated WriteItem)
**WriteBatchResponse**: `success` (bool), `message` (string), `results` (repeated WriteResult)
### 4.5 WriteBatchAndWait Messages
**WriteBatchAndWaitRequest**:
- `session_id` (string)
- `items` (repeated WriteItem) — values to write
- `flag_tag` (string) — tag to poll after writes
- `flag_value` (TypedValue) — expected value (type-aware comparison)
- `timeout_ms` (int32) — max wait time (default 5000ms if ≤ 0)
- `poll_interval_ms` (int32) — polling interval (default 100ms if ≤ 0)
**WriteBatchAndWaitResponse**:
- `success` (bool)
- `message` (string)
- `write_results` (repeated WriteResult)
- `flag_reached` (bool) — whether the flag value was matched
- `elapsed_ms` (int32) — total elapsed time
**Behavior:**
1. All writes execute first. If any write fails, returns immediately with `success=false`.
2. If writes succeed, polls `flag_tag` at `poll_interval_ms` intervals.
3. Uses type-aware `TypedValueEquals()` comparison (see Section 4.5.1).
4. If flag matches before timeout: `success=true`, `flag_reached=true`.
5. If timeout expires: `success=true`, `flag_reached=false` (timeout is not an error).
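The write-then-poll behavior can be sketched as follows. The delegates are stand-ins: `writeAllAsync` represents executing every `WriteItem`, and `flagMatchesAsync` represents reading `flag_tag` and comparing it with `TypedValueEquals()`. `WriteBatchAndWaitSketch` and `WriteAndWaitResult` are hypothetical. The defaults follow the request field descriptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result shape mirroring WriteBatchAndWaitResponse (write_results omitted).
public sealed class WriteAndWaitResult
{
    public bool Success { get; set; }
    public string Message { get; set; } = "";
    public bool FlagReached { get; set; }
    public int ElapsedMs { get; set; }
}

public static class WriteBatchAndWaitSketch
{
    public static async Task<WriteAndWaitResult> ExecuteAsync(
        Func<CancellationToken, Task<bool>> writeAllAsync,    // true if every write succeeded
        Func<CancellationToken, Task<bool>> flagMatchesAsync, // reads flag_tag, compares to flag_value
        int timeoutMs, int pollIntervalMs, CancellationToken ct)
    {
        if (timeoutMs <= 0) timeoutMs = 5000;          // documented default
        if (pollIntervalMs <= 0) pollIntervalMs = 100; // documented default
        var sw = Stopwatch.StartNew();

        // 1. All writes first; any failure ends the call with success=false.
        if (!await writeAllAsync(ct))
            return new WriteAndWaitResult { Success = false, Message = "One or more writes failed", ElapsedMs = (int)sw.ElapsedMilliseconds };

        // 2. Poll the flag until it matches or the timeout expires.
        while (true)
        {
            if (await flagMatchesAsync(ct))
                return new WriteAndWaitResult { Success = true, FlagReached = true, ElapsedMs = (int)sw.ElapsedMilliseconds };

            long remaining = timeoutMs - sw.ElapsedMilliseconds;
            if (remaining <= 0) // timeout is not an error: success=true, flag_reached=false
                return new WriteAndWaitResult { Success = true, FlagReached = false, Message = "Flag not reached before timeout", ElapsedMs = (int)sw.ElapsedMilliseconds };

            await Task.Delay((int)Math.Min(pollIntervalMs, remaining), ct);
        }
    }
}
```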
#### 4.5.1 Flag Comparison Rules
Type-aware comparison via `TypedValueEquals()`:
- Both values must have the same `oneof` case (same type). Mismatched types are never equal.
- Numeric comparison uses the native type's equality (no floating-point string round-trip issues).
- String comparison is case-sensitive.
- Bool comparison is direct equality.
- Null (unset `oneof`) equals null. Null does not equal any set value.
- Array comparison: element-by-element equality, same length required.
- `datetime_value` compared as `int64` equality (tick-level precision).
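Here is a minimal model of the comparison rules. It assumes an in-memory `TypedValue` with a `ValueCase` discriminator; both names are hypothetical, and the generated proto classes expose the oneof differently. `bytes_value` is compared element by element here as an assumption, since the rules above do not mention it.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical in-memory stand-in for the TypedValue oneof.
public enum ValueCase { None, Bool, Int32, Int64, Float, Double, String, Bytes, DateTime, Array }

public sealed class TypedValue
{
    public static readonly TypedValue Null = new TypedValue(ValueCase.None, null);
    public ValueCase Case { get; }
    public object Value { get; }   // boxed primitive, string, array, or DateTime ticks (long)
    public TypedValue(ValueCase valueCase, object value) { Case = valueCase; Value = value; }
}

public static class FlagComparison
{
    public static bool TypedValueEquals(TypedValue a, TypedValue b)
    {
        if (a.Case != b.Case) return false;   // different oneof case: never equal
        switch (a.Case)
        {
            case ValueCase.None:
                return true;                   // null equals null
            case ValueCase.String:
                return string.Equals((string)a.Value, (string)b.Value, StringComparison.Ordinal);
            case ValueCase.Array:
            case ValueCase.Bytes:              // assumption: bytes compared like arrays
                // Element by element; SequenceEqual also requires equal length.
                return ((IEnumerable)a.Value).Cast<object>()
                    .SequenceEqual(((IEnumerable)b.Value).Cast<object>());
            default:
                // bool, int32, int64, float, double, datetime ticks: boxed Equals on the
                // native type (note: Equals treats NaN as equal to NaN, unlike ==).
                return a.Value.Equals(b.Value);
        }
    }
}
```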
### 4.6 Subscription Messages
**SubscribeRequest**: `session_id` (string), `tags` (repeated string), `sampling_ms` (int32)
Response: streamed `VtqMessage` items.
### 4.7 API Key Messages
**CheckApiKeyRequest**: `api_key` (string)
**CheckApiKeyResponse**: `is_valid` (bool), `message` (string)
## 5. Dual gRPC Stack Compatibility
The Host and Client use different gRPC implementations:
| Aspect | Host | Client |
|--------|------|--------|
| Stack | Grpc.Core (C-core) | Grpc.Net.Client |
| Contract | Proto file (`scada.proto`) + Grpc.Tools codegen | Code-first (`[ServiceContract]`, `[DataContract]`) via protobuf-net.Grpc |
| Runtime | .NET Framework 4.8 | .NET 10 |
Both target `scada.ScadaService` and produce identical wire format. Field ordering in `[DataMember(Order = N)]` matches proto field numbers.
## 6. V1 Legacy Protocol
The current codebase implements the v1 protocol. The following describes v1 behavior that will be replaced during migration to v2.
### 6.1 V1 Value Encoding
All values transmitted as strings:
- Write direction: server parses string values in order: bool → int → long → double → DateTime → raw string.
- Read direction: server serializes via `.ToString()` (bool → lowercase, DateTime → ISO-8601, arrays → JSON).
- Client parses: double → bool → null (empty string) → raw string.
### 6.2 V1 Quality
Three-state string quality (`"Good"`, `"Uncertain"`, `"Bad"`, case-insensitive). Numeric quality values use OPC DA-style thresholds: ≥ 192 = Good, 64 to 191 = Uncertain, < 64 = Bad.
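For contrast with v2, the v1 client-side decoding can be sketched as below. It assumes invariant-culture number parsing and treats unrecognized quality strings as Bad; both are assumptions. `V1Decoding` is hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of legacy v1 decoding: values and quality arrive as strings.
public enum V1Quality { Bad, Uncertain, Good }

public static class V1Decoding
{
    // Parse order from the v1 description: double -> bool -> null (empty) -> raw string.
    public static object ParseValue(string text)
    {
        if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out var d)) return d;
        if (bool.TryParse(text, out var b)) return b;
        if (string.IsNullOrEmpty(text)) return null;
        return text;
    }

    public static V1Quality ParseQuality(string text)
    {
        if (string.Equals(text, "Good", StringComparison.OrdinalIgnoreCase)) return V1Quality.Good;
        if (string.Equals(text, "Uncertain", StringComparison.OrdinalIgnoreCase)) return V1Quality.Uncertain;
        return V1Quality.Bad;   // "Bad" and anything unrecognized (assumption)
    }

    public static V1Quality FromNumeric(int quality) =>
        quality >= 192 ? V1Quality.Good : quality >= 64 ? V1Quality.Uncertain : V1Quality.Bad;
}
```

Because double is tried first, an integer tag value such as `"42"` comes back as `42.0`. This kind of heuristic type loss is what `TypedValue` removes.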
### 6.3 V1 → V2 Field Changes
| Message | Field | V1 Type | V2 Type |
|---------|-------|---------|---------|
| VtqMessage | value | string | TypedValue |
| VtqMessage | quality | string | QualityCode |
| WriteRequest | value | string | TypedValue |
| WriteItem | value | string | TypedValue |
| WriteBatchAndWaitRequest | flag_value | string | TypedValue |
All RPC signatures remain unchanged. Only value and quality fields change type.
### 6.4 Migration Strategy
Clean break — no backward compatibility layer. All clients and servers updated simultaneously. This is appropriate because LmxProxy is an internal protocol with a small, controlled client count. Dual-format support adds complexity with no long-term benefit.
## Dependencies
- **Grpc.Core** + **Grpc.Tools** — proto compilation and server hosting (Host).
- **protobuf-net.Grpc** — code-first contracts (Client).
- **Grpc.Net.Client** — HTTP/2 transport (Client).
## Interactions
- **GrpcServer** implements the service defined by this protocol.
- **Client** consumes the service defined by this protocol.
- **MxAccessClient** is the backend that executes the operations requested via the protocol.

# Component: Security
## Purpose
Provides API key-based authentication and role-based authorization for the gRPC service, along with TLS certificate management for transport security.
## Location
- `src/ZB.MOM.WW.LmxProxy.Host/Security/ApiKeyService.cs` — API key storage and validation.
- `src/ZB.MOM.WW.LmxProxy.Host/Security/ApiKeyInterceptor.cs` — gRPC server interceptor for authentication/authorization.
- `src/ZB.MOM.WW.LmxProxy.Client/Security/GrpcChannelFactory.cs` — Client-side TLS channel factory.
## Responsibilities
- Load and hot-reload API keys from a JSON configuration file.
- Validate API keys on every gRPC request via a server interceptor.
- Enforce role-based access control (ReadOnly vs ReadWrite).
- Manage TLS certificates for server and optional mutual TLS.
## 1. API Key Service
### 1.1 Key Storage
- Keys are stored in a JSON file (default `apikeys.json`).
- File format: `{ "ApiKeys": [{ "Key": "...", "Description": "...", "Role": "ReadOnly|ReadWrite", "Enabled": true|false }] }`.
- If the file does not exist at startup, the service auto-generates a default file with two random keys: one ReadOnly and one ReadWrite.
### 1.2 Hot Reload
- A `FileSystemWatcher` monitors the API key file for changes.
- Rapid changes are debounced (1-second minimum between reloads).
- `ReloadConfigurationAsync` uses a `SemaphoreSlim` to serialize reload operations.
- New and modified keys take effect on the next request after the reload completes. Requests presenting a removed or disabled key are rejected from that point on.
- Active sessions are not affected by key changes — sessions are tracked independently by SessionManager.
### 1.3 Validation
- `ValidateApiKey(apiKey)` — Returns the `ApiKey` object if the key exists and `Enabled` is true, otherwise null.
- `HasRole(apiKey, requiredRole)` — Returns true if the key has the required role. Role hierarchy: ReadWrite implies ReadOnly.
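The validation and role rules can be sketched as an immutable lookup snapshot that a reload replaces as a whole. `ApiKeyValidator` and `Replace` are hypothetical. The `ApiKey` fields mirror the model in Section 3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ApiKeyRole { ReadOnly, ReadWrite }

public sealed class ApiKey
{
    public string Key { get; set; } = "";
    public string Description { get; set; } = "";
    public ApiKeyRole Role { get; set; }
    public bool Enabled { get; set; }
}

// Hypothetical sketch of ApiKeyService's lookup and role checks (file loading omitted).
public sealed class ApiKeyValidator
{
    // Replaced as a whole on reload, so readers never observe a half-built map.
    private volatile IReadOnlyDictionary<string, ApiKey> _keys;

    public ApiKeyValidator(IEnumerable<ApiKey> keys) => Replace(keys);

    public void Replace(IEnumerable<ApiKey> keys) =>
        _keys = keys.ToDictionary(k => k.Key, StringComparer.Ordinal);

    // The key object if it exists and is enabled, otherwise null.
    public ApiKey ValidateApiKey(string apiKey) =>
        apiKey != null && _keys.TryGetValue(apiKey, out var key) && key.Enabled ? key : null;

    // ReadWrite implies ReadOnly.
    public static bool HasRole(ApiKey apiKey, ApiKeyRole requiredRole) =>
        apiKey != null && (apiKey.Role == ApiKeyRole.ReadWrite || apiKey.Role == requiredRole);
}
```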
## 2. API Key Interceptor
### 2.1 Authentication Flow
The `ApiKeyInterceptor` intercepts every unary and server-streaming RPC:
1. Extracts the `x-api-key` header from gRPC request metadata.
2. Calls `ApiKeyService.ValidateApiKey()`.
3. If the key is invalid or missing, returns `StatusCode.Unauthenticated`.
4. For write-protected methods (`Write`, `WriteBatch`, `WriteBatchAndWait`), checks that the key has the `ReadWrite` role. Returns `StatusCode.PermissionDenied` if the key is `ReadOnly`.
5. Adds the validated `ApiKey` to `context.UserState["ApiKey"]` for downstream use.
6. Continues to the service method.
### 2.2 Write-Protected Methods
These RPCs require the `ReadWrite` role:
- `Write`
- `WriteBatch`
- `WriteBatchAndWait`
All other RPCs (`Connect`, `Disconnect`, `GetConnectionState`, `Read`, `ReadBatch`, `Subscribe`, `CheckApiKey`) are allowed for `ReadOnly` keys.
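The interceptor's decision steps reduce to a small function. This dependency-free sketch returns a hypothetical `AuthOutcome` enum named after the gRPC status codes instead of throwing `RpcException`. It assumes `methodName` is the short RPC name; a real interceptor would derive it from the full method path such as `/scada.ScadaService/Write`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins: AuthOutcome names mirror the gRPC status codes the interceptor returns.
public enum AuthOutcome { Ok, Unauthenticated, PermissionDenied }
public enum Role { ReadOnly, ReadWrite }

public static class ApiKeyAuthorization
{
    private static readonly HashSet<string> WriteProtected =
        new HashSet<string>(StringComparer.Ordinal) { "Write", "WriteBatch", "WriteBatchAndWait" };

    // lookupRole returns the role of an existing, enabled key, or null.
    public static AuthOutcome Authorize(
        string methodName, IReadOnlyDictionary<string, string> headers, Func<string, Role?> lookupRole)
    {
        if (!headers.TryGetValue("x-api-key", out var apiKey) || string.IsNullOrEmpty(apiKey))
            return AuthOutcome.Unauthenticated;      // missing header

        Role? role = lookupRole(apiKey);
        if (role == null)
            return AuthOutcome.Unauthenticated;      // unknown or disabled key

        if (WriteProtected.Contains(methodName) && role != Role.ReadWrite)
            return AuthOutcome.PermissionDenied;     // ReadOnly key calling a write RPC

        return AuthOutcome.Ok;                       // caller stores the key in UserState and continues
    }
}
```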
## 3. API Key Model
| Field | Type | Description |
|-------|------|-------------|
| Key | string | The secret API key value |
| Description | string | Human-readable name for the key |
| Role | ApiKeyRole | `ReadOnly` or `ReadWrite` |
| Enabled | bool | Whether the key is active |
`ApiKeyRole` enum: `ReadOnly` (read and subscribe only), `ReadWrite` (full access including writes).
## 4. TLS Configuration
### 4.1 Server-Side (Host)
Configured via `TlsConfiguration` in `appsettings.json`:
| Setting | Default | Description |
|---------|---------|-------------|
| Enabled | false | Enable TLS on the gRPC server |
| ServerCertificatePath | `certs/server.crt` | PEM server certificate |
| ServerKeyPath | `certs/server.key` | PEM server private key |
| ClientCaCertificatePath | `certs/ca.crt` | CA certificate for mTLS client validation |
| RequireClientCertificate | false | Require client certificates (mutual TLS) |
| CheckCertificateRevocation | false | Check certificate revocation lists |
If TLS is enabled but certificates are missing, the service generates self-signed certificates at startup.
### 4.2 Client-Side
`ClientTlsConfiguration` in the client library:
| Setting | Default | Description |
|---------|---------|-------------|
| UseTls | false | Enable TLS on the client connection |
| ClientCertificatePath | null | Client certificate for mTLS |
| ClientKeyPath | null | Client private key for mTLS |
| ServerCaCertificatePath | null | Custom CA for server validation |
| ServerNameOverride | null | SNI/hostname override |
| ValidateServerCertificate | true | Validate the server certificate chain |
| AllowSelfSignedCertificates | false | Accept self-signed server certificates |
| IgnoreAllCertificateErrors | false | Skip all certificate validation (dangerous) |
- SSL protocols: TLS 1.2 and TLS 1.3.
- Client certificates loaded from PEM files and converted to PKCS12.
- Custom CA trust store support via chain building.
## Dependencies
- **Configuration** — TLS settings and API key file path from `appsettings.json`.
- **System.IO.FileSystemWatcher** — API key file change detection.
## Interactions
- **GrpcServer** — the ApiKeyInterceptor runs before every RPC in ScadaGrpcService.
- **ServiceHost** — creates ApiKeyService and ApiKeyInterceptor at startup, configures gRPC server credentials.
- **Client** — GrpcChannelFactory creates TLS-configured gRPC channels in LmxProxyClient.

# Component: ServiceHost
## Purpose
The entry point and lifecycle manager for the LmxProxy Windows service. Handles Topshelf service hosting, Serilog logging setup, component initialization/teardown ordering, and Windows SCM service recovery configuration.
## Location
- `src/ZB.MOM.WW.LmxProxy.Host/Program.cs` — entry point, Serilog setup, Topshelf configuration.
- `src/ZB.MOM.WW.LmxProxy.Host/LmxProxyService.cs` — service lifecycle (Start, Stop, Pause, Continue, Shutdown).
## Responsibilities
- Configure and launch the Topshelf Windows service.
- Load and validate configuration from `appsettings.json`.
- Initialize Serilog logging.
- Orchestrate service startup: create all components in dependency order, connect to MxAccess, start servers.
- Orchestrate service shutdown: stop servers, dispose all components in reverse order.
- Configure Windows SCM service recovery policies.
## 1. Entry Point (Program.cs)
1. Builds configuration from `appsettings.json` + environment variables via `ConfigurationBuilder`.
2. Configures Serilog from the `Serilog` section of appsettings (console + file sinks).
3. Validates configuration using `ConfigurationValidator.ValidateAndLog()`.
4. Configures Topshelf `HostFactory`:
- Service name: `ZB.MOM.WW.LmxProxy.Host`
- Display name: `SCADA Bridge LMX Proxy`
- Start automatically on boot.
- Service recovery: first failure 1 min, second 5 min, subsequent 10 min, reset period 1 day.
5. Runs the Topshelf host (blocks until service stops).
## 2. Service Lifecycle (LmxProxyService)
### 2.1 Startup Sequence (Start)
Components are created and started in dependency order:
1. Validate configuration.
2. Check/generate TLS certificates (if TLS enabled).
3. Create `PerformanceMetrics`.
4. Create `ApiKeyService` — loads API keys from file.
5. Create `MxAccessClient` via factory.
6. Subscribe to connection state changes.
7. Connect to MxAccess synchronously — times out at `ConnectionTimeoutSeconds` (default 30s).
8. Start `MonitorConnectionAsync` (if `AutoReconnect` enabled).
9. Create `SubscriptionManager`.
10. Create `SessionManager`.
11. Create `HealthCheckService` + `DetailedHealthCheckService`.
12. Create `StatusReportService` + `StatusWebServer`.
13. Create `ScadaGrpcService`.
14. Create `ApiKeyInterceptor`.
15. Configure gRPC `Server` with TLS or insecure credentials.
16. Start gRPC server on `0.0.0.0:{GrpcPort}`.
17. Start `StatusWebServer`.
### 2.2 Shutdown Sequence (Stop)
Components are stopped and disposed in reverse order:
1. Cancel reconnect monitor — wait **5 seconds** for exit.
2. Graceful gRPC server shutdown — **10-second** timeout, then kill.
3. Stop StatusWebServer — **5-second** wait.
4. Dispose all components in reverse creation order.
5. Disconnect from MxAccess — **10-second** timeout.
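Each shutdown step with a deadline can share one helper. It waits for the step or the timeout, whichever comes first, and lets the sequence continue either way. `ShutdownStep.RunWithTimeoutAsync` is a hypothetical sketch, not the service's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ShutdownStep
{
    // Waits for `step` up to `timeout`. Returns false on timeout or failure so the caller
    // can fall back (e.g. kill the gRPC server) and still continue the shutdown sequence.
    public static async Task<bool> RunWithTimeoutAsync(string name, Func<Task> step, TimeSpan timeout)
    {
        Task work = step();
        using (var delayCts = new CancellationTokenSource())
        {
            Task finished = await Task.WhenAny(work, Task.Delay(timeout, delayCts.Token));
            delayCts.Cancel();   // release the timer if the step won the race
            if (finished != work)
            {
                Console.WriteLine($"{name}: not finished after {timeout.TotalSeconds}s");
                return false;
            }
        }
        try
        {
            await work;          // observe the step's own exception, if any
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"{name}: failed: {ex.Message}");
            return false;
        }
    }
}
```

With this helper, the sequence above becomes a series of awaited calls with 5-second and 10-second timeouts. A `false` result from the gRPC step is the cue to kill the server.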
### 2.3 Other Lifecycle Events
- **Pause**: Supported by Topshelf but behavior is a no-op beyond logging.
- **Continue**: Resume from pause, no-op beyond logging.
- **Shutdown**: System shutdown signal, triggers the same shutdown sequence as Stop.
## 3. Service Recovery (Windows SCM)
Configured via Topshelf's `EnableServiceRecovery`:
| Failure | Action | Delay |
|---------|--------|-------|
| First | Restart service | 1 minute |
| Second | Restart service | 5 minutes |
| Subsequent | Restart service | 10 minutes |
| Reset period | — | 1 day |
All values are configurable via `ServiceRecoveryConfiguration`.
## 4. Service Identity
| Property | Value |
|----------|-------|
| Service name | `ZB.MOM.WW.LmxProxy.Host` |
| Display name | `SCADA Bridge LMX Proxy` |
| Start mode | Automatic |
| Platform | x86 (.NET Framework 4.8) |
| Framework | Topshelf |
## Dependencies
- **Topshelf** — Windows service framework.
- **Serilog** — structured logging (console + file sinks).
- **Microsoft.Extensions.Configuration** — configuration loading.
- **Configuration** — validated configuration objects.
- All other components are created and managed by LmxProxyService.
## Interactions
- **Configuration** is loaded and validated first; all other components receive their settings from it.
- **MxAccessClient** is connected synchronously during startup. If connection fails within the timeout, the service fails to start.
- **GrpcServer** and **StatusWebServer** are started last, after all dependencies are ready.

# Component: SessionManager
## Purpose
Tracks active client sessions, mapping session IDs to client metadata. Provides session creation, validation, and termination for the gRPC service layer.
## Location
`src/ZB.MOM.WW.LmxProxy.Host/Sessions/SessionManager.cs`
## Responsibilities
- Create new sessions with unique identifiers when clients connect.
- Validate session IDs on every data operation.
- Track session metadata (client ID, API key, connection time, last activity).
- Terminate sessions on client disconnect.
- Provide session listing for monitoring and status reporting.
## 1. Session Storage
- Sessions are stored in a `ConcurrentDictionary<string, SessionInfo>` (lock-free, thread-safe).
- Session state is in-memory only — all sessions are lost on service restart.
- `ActiveSessionCount` property returns the current count of tracked sessions.
## 2. Session Lifecycle
### 2.1 Creation
`CreateSession(clientId, apiKey)`:
- Generates a unique session ID: `Guid.NewGuid().ToString("N")` (32-character lowercase hex string, no hyphens).
- Creates a `SessionInfo` record with `ConnectedAt` and `LastActivity` set to `DateTime.UtcNow`.
- Stores the session in the dictionary.
- Returns the session ID to the client.
### 2.2 Validation
`ValidateSession(sessionId)`:
- Looks up the session ID in the dictionary.
- If found, updates `LastActivity` to `DateTime.UtcNow` and returns `true`.
- If not found, returns `false`.
### 2.3 Termination
`TerminateSession(sessionId)`:
- Removes the session from the dictionary.
- Returns `true` if the session existed, `false` otherwise.
### 2.4 Query
- `GetSession(sessionId)` — Returns `SessionInfo` or `null` if not found.
- `GetAllSessions()` — Returns `IReadOnlyList<SessionInfo>` snapshot of all active sessions.
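An in-memory session store with the documented behavior fits in a few lines over `ConcurrentDictionary`. `SessionStore` is a hypothetical name. Null session IDs are treated as invalid because `ConcurrentDictionary` rejects null keys.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public sealed class SessionInfo
{
    public string SessionId { get; set; }
    public string ClientId { get; set; }
    public string ApiKey { get; set; }
    public DateTime ConnectedAt { get; set; }
    public DateTime LastActivity { get; set; }
    public long ConnectedSinceUtcTicks => ConnectedAt.Ticks;   // for the gRPC response
}

// Hypothetical sketch of SessionManager's create/validate/terminate/query operations.
public sealed class SessionStore
{
    private readonly ConcurrentDictionary<string, SessionInfo> _sessions =
        new ConcurrentDictionary<string, SessionInfo>();

    public int ActiveSessionCount => _sessions.Count;

    public string CreateSession(string clientId, string apiKey)
    {
        var now = DateTime.UtcNow;
        var info = new SessionInfo
        {
            SessionId = Guid.NewGuid().ToString("N"),   // 32 hex chars, no hyphens
            ClientId = clientId, ApiKey = apiKey, ConnectedAt = now, LastActivity = now,
        };
        _sessions[info.SessionId] = info;
        return info.SessionId;
    }

    public bool ValidateSession(string sessionId)
    {
        if (sessionId == null || !_sessions.TryGetValue(sessionId, out var info)) return false;
        info.LastActivity = DateTime.UtcNow;   // refresh activity on every validated call
        return true;
    }

    public bool TerminateSession(string sessionId) =>
        sessionId != null && _sessions.TryRemove(sessionId, out _);

    public SessionInfo GetSession(string sessionId) =>
        sessionId != null && _sessions.TryGetValue(sessionId, out var info) ? info : null;

    public IReadOnlyList<SessionInfo> GetAllSessions() => _sessions.Values.ToList();
}
```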
## 3. SessionInfo
| Field | Type | Description |
|-------|------|-------------|
| SessionId | string | 32-character hex GUID |
| ClientId | string | Client-provided identifier |
| ApiKey | string | API key used for authentication |
| ConnectedAt | DateTime | UTC time of session creation |
| LastActivity | DateTime | UTC time of last operation (updated on each validation) |
| ConnectedSinceUtcTicks | long | `ConnectedAt.Ticks` for gRPC response serialization |
## 4. Disposal
`Dispose()` clears all sessions from the dictionary. No notifications are sent to connected clients.
## Dependencies
None. SessionManager is a standalone in-memory store with no external dependencies.
## Interactions
- **GrpcServer** calls `CreateSession` on Connect, `ValidateSession` on every data operation, and `TerminateSession` on Disconnect.
- **HealthAndMetrics** reads `ActiveSessionCount` for health check data.
- **StatusReportService** reads session information for the status dashboard.

# Component: SubscriptionManager
## Purpose
Manages the lifecycle of tag value subscriptions, multiplexing multiple client subscriptions onto shared MXAccess tag subscriptions and delivering updates via per-client bounded channels with configurable backpressure.
## Location
`src/ZB.MOM.WW.LmxProxy.Host/Subscriptions/SubscriptionManager.cs`
## Responsibilities
- Create per-client subscription channels with bounded capacity.
- Share underlying MXAccess tag subscriptions across multiple clients subscribing to the same tags.
- Deliver tag value updates from MXAccess callbacks to all subscribed clients.
- Handle backpressure when client channels are full (DropOldest, DropNewest, or Wait).
- Clean up subscriptions on client disconnect.
- Notify all subscribed clients with bad quality when MXAccess disconnects.
## 1. Architecture
### 1.1 Per-Client Channels
Each subscribing client gets a bounded `System.Threading.Channels.Channel<(string address, Vtq vtq)>`:
- Capacity: configurable (default 1000 messages).
- Full mode: configurable (default `DropOldest`).
- `SingleReader = true`, `SingleWriter = false`.
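Creating the per-client channel with the documented defaults looks roughly like this. It uses `System.Threading.Channels`, which is part of the runtime libraries on current .NET and available as a NuGet package for .NET Framework. The element type `(string, object)` stands in for `(string address, Vtq vtq)`, and `ClientChannelFactory` is hypothetical.

```csharp
using System;
using System.Threading.Channels;

// Hypothetical factory for a per-client subscription channel.
public static class ClientChannelFactory
{
    public static Channel<(string Address, object Vtq)> Create(
        int capacity = 1000, BoundedChannelFullMode fullMode = BoundedChannelFullMode.DropOldest) =>
        Channel.CreateBounded<(string Address, object Vtq)>(new BoundedChannelOptions(capacity)
        {
            FullMode = fullMode,
            SingleReader = true,   // one gRPC stream drains each client's channel
            SingleWriter = false,  // value updates may arrive from several threads
        });
}
```

With `DropOldest`, a third write into a two-slot channel still succeeds and the first item is evicted. The reader therefore sees only the two most recent updates.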
### 1.2 Shared Tag Subscriptions
Tag subscriptions to MXAccess are shared across clients:
- When the first client subscribes to a tag, a new MXAccess subscription is created.
- When additional clients subscribe to the same tag, they are added to the existing tag subscription's client set.
- When the last client unsubscribes from a tag, the MXAccess subscription is disposed.
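The first-in/last-out rule for shared MXAccess subscriptions is essentially reference counting by client ID. In this hypothetical sketch, the add and remove calls report when the caller should subscribe or dispose the MXAccess subscription, so that MXAccess work can happen outside the lock. A plain `lock` is used for brevity where the component uses `ReaderWriterLockSlim`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the tag -> subscribed-clients table.
public sealed class TagSubscriptionTable
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, HashSet<string>> _clientsByTag =
        new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);

    // True if this call added the first subscriber: caller should subscribe with MXAccess.
    public bool AddClient(string tag, string clientId)
    {
        lock (_gate)
        {
            if (!_clientsByTag.TryGetValue(tag, out var clients))
            {
                _clientsByTag[tag] = new HashSet<string>(StringComparer.Ordinal) { clientId };
                return true;
            }
            clients.Add(clientId);
            return false;
        }
    }

    // True if the last subscriber left: caller should dispose the MXAccess subscription.
    public bool RemoveClient(string tag, string clientId)
    {
        lock (_gate)
        {
            if (!_clientsByTag.TryGetValue(tag, out var clients) || !clients.Remove(clientId)) return false;
            if (clients.Count > 0) return false;
            _clientsByTag.Remove(tag);
            return true;
        }
    }

    // Snapshot of the clients to fan a value update out to.
    public IReadOnlyCollection<string> ClientsFor(string tag)
    {
        lock (_gate)
            return _clientsByTag.TryGetValue(tag, out var c) ? new List<string>(c) : new List<string>();
    }
}
```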
### 1.3 Thread Safety
- `ReaderWriterLockSlim` protects tag subscription updates.
- `ConcurrentDictionary` for client subscription tracking.
## 2. Subscription Flow
### 2.1 Subscribe
`SubscribeAsync(clientId, addresses, ct)`:
1. Creates a bounded channel with configured capacity and full mode.
2. Creates a `ClientSubscription` record (clientId, channel, address set, CancellationTokenSource, counters).
3. For each tag address:
- If the tag already has a subscription, adds the client to the existing `TagSubscription.clientIds` set.
- Otherwise, creates a new `TagSubscription` and calls `_scadaClient.SubscribeAsync()` to register with MXAccess (outside the lock to avoid blocking).
4. Registers a cancellation token callback to automatically call `UnsubscribeClient` on disconnect.
5. Returns the channel reader for the GrpcServer to stream from.
### 2.2 Value Updates
`OnTagValueChanged(address, Vtq)` — called from MxAccessClient's COM event handler:
1. Looks up the tag subscription to find all subscribed clients.
2. For each client, calls `channel.Writer.TryWrite((address, vtq))`.
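3. If the channel is full, the configured full mode applies:
   - **DropOldest**: The channel discards its oldest queued message to make room. A warning is logged and `DroppedMessageCount` is incremented.
   - **DropNewest**: The incoming message is dropped. (In .NET's `BoundedChannelFullMode`, discarding the incoming item is called `DropWrite`; `DropNewest` instead evicts the most recently queued item.)
   - **Wait**: The writer must wait until space is available (not recommended for gRPC streaming). `TryWrite` itself never blocks; in `Wait` mode it returns `false` when the channel is full.
4. If the channel has been closed (client disconnected), schedules `UnsubscribeClient` cleanup.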
### 2.3 Unsubscribe
`UnsubscribeClient(clientId)`:
1. Removes the client from the client dictionary.
2. For each tag the client was subscribed to, removes the client from the tag's subscriber set.
3. If a tag has no remaining subscribers, disposes the MXAccess subscription handle.
4. Completes the client's channel writer (signals end of stream).
## 3. Backpressure
| Mode | Behavior | Use Case |
|------|----------|----------|
| DropOldest | Discards the oldest queued message when the channel is full (logged as a warning) | Default. Fire-and-forget semantics. No client blocking. |
| DropNewest | Drops the incoming message when channel is full | Preserves history, drops latest updates. |
| Wait | Blocks the writer until space is available | Not recommended for gRPC streaming (blocks callback thread). |
Per-client statistics track `DeliveredMessageCount` and `DroppedMessageCount` for monitoring via the status dashboard.
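The three modes can be illustrated with a small bounded buffer. This Python sketch is hypothetical: `BoundedClientChannel` and `FullMode` are not LmxProxy types, and `try_write` returns whether the incoming message was queued. `DropNewest` follows this document's description, which discards the incoming message.

```python
from collections import deque
from enum import Enum


class FullMode(Enum):
    DROP_OLDEST = "DropOldest"
    DROP_NEWEST = "DropNewest"  # as described here: discard the incoming message
    WAIT = "Wait"


class BoundedClientChannel:
    """Hypothetical per-client buffer illustrating the three full modes."""

    def __init__(self, capacity: int = 1000, mode: FullMode = FullMode.DROP_OLDEST) -> None:
        self.capacity = capacity
        self.mode = mode
        self.items: deque = deque()
        self.dropped = 0  # mirrors DroppedMessageCount

    def try_write(self, item) -> bool:
        """Non-blocking write; returns True if the incoming item was queued."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.mode is FullMode.DROP_OLDEST:
            self.items.popleft()  # evict the oldest queued message to make room
            self.items.append(item)
            self.dropped += 1
            return True
        if self.mode is FullMode.DROP_NEWEST:
            self.dropped += 1  # the incoming message is discarded
            return False
        # WAIT: a non-blocking write cannot wait, so it reports failure and
        # nothing is dropped; a blocking writer would wait for the reader here.
        return False

    def read(self):
        """The single reader (the gRPC stream) takes the oldest queued message."""
        return self.items.popleft()
```

The MXAccess callback thread does the writing. Only the two drop modes keep that thread from stalling when a slow client's buffer fills.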
## 4. Disconnection Handling
### 4.1 Client Disconnect
When a client's gRPC stream ends (cancellation or error), the cancellation token callback triggers `UnsubscribeClient`, which cleans up all tag subscriptions for that client.
### 4.2 MxAccess Disconnect
`OnConnectionStateChanged` — when the MxAccess connection drops:
- Writes a bad-quality Vtq to every subscribed client's channel, so each client is notified of the connection loss asynchronously through its stream.
- Tag subscriptions are retained in memory for reconnection (via MxAccessClient's `_storedSubscriptions`).
## 5. Statistics
`GetSubscriptionStats()` returns:
- `TotalClients` — number of active client subscriptions.
- `TotalTags` — number of unique tags with active MXAccess subscriptions.
- `ActiveSubscriptions` — total client-tag subscription count.
## Dependencies
- **MxAccessClient** (IScadaClient) — creates and disposes MXAccess tag subscriptions.
- **Configuration** — `SubscriptionConfiguration` for channel capacity and full mode.
## Interactions
- **GrpcServer** calls `SubscribeAsync` on Subscribe RPC and reads from the returned channel.
- **MxAccessClient** delivers value updates via the `OnTagValueChanged` callback.
- **HealthAndMetrics** reads subscription statistics for health checks and status reports.
- **ServiceHost** disposes the SubscriptionManager at shutdown.

# LmxProxy - High Level Requirements
## 1. System Purpose
LmxProxy is a gRPC proxy service that bridges SCADA clients to AVEVA System Platform (Wonderware) via the ArchestrA MXAccess COM API. It exists because MXAccess is a 32-bit COM component that must run in an x86 process co-located with System Platform on Windows, and the Host that wraps it targets .NET Framework 4.8. LmxProxy isolates this constraint behind a gRPC interface, allowing modern .NET clients to access System Platform data remotely over HTTP/2.
## 2. Architecture
### 2.1 Two-Project Structure
- **ZB.MOM.WW.LmxProxy.Host** — .NET Framework 4.8, x86-only Windows service. Hosts a gRPC server (Grpc.Core) fronting the MXAccess COM API. Runs on the same machine as AVEVA System Platform.
- **ZB.MOM.WW.LmxProxy.Client** — .NET 10, AnyCPU class library. Code-first gRPC client (protobuf-net.Grpc) consumed by ScadaLink's Data Connection Layer. Packaged as a NuGet library.
### 2.2 Dual gRPC Stacks
The two projects use different gRPC implementations that are wire-compatible:
- **Host**: Proto-file-generated code via `Grpc.Core` + `Grpc.Tools`. Uses the deprecated C-core gRPC library because the managed grpc-dotnet server (`Grpc.AspNetCore.Server`) is not supported on .NET Framework 4.8.
- **Client**: Code-first contracts via `protobuf-net.Grpc` with `[DataContract]`/`[ServiceContract]` attributes over `Grpc.Net.Client`.
Wire compatibility holds because both target the same `scada.ScadaService` gRPC service definition.
### 2.3 Deployment Model
- The Host service runs on the AVEVA System Platform machine (or any machine with MXAccess access).
- Clients connect remotely over gRPC (HTTP/2) on a configurable port (default 50051).
- The Host runs as a Windows service managed by Topshelf.
## 3. Communication Protocol
### 3.1 Transport
- gRPC over HTTP/2.
- Default server port: 50051.
- Optional TLS with mutual TLS (mTLS) support.
### 3.2 RPCs
The service exposes 10 RPCs:
| RPC | Type | Description |
|-----|------|-------------|
| Connect | Unary | Establish session, returns session ID |
| Disconnect | Unary | Terminate session |
| GetConnectionState | Unary | Query MxAccess connection status |
| Read | Unary | Read single tag value |
| ReadBatch | Unary | Read multiple tag values |
| Write | Unary | Write single tag value |
| WriteBatch | Unary | Write multiple tag values |
| WriteBatchAndWait | Unary | Write values, poll flag tag until match or timeout |
| Subscribe | Server streaming | Stream tag value updates to client |
| CheckApiKey | Unary | Validate API key and return role |
### 3.3 Data Model (VTQ)
All tag values are represented as VTQ (Value, Timestamp, Quality) tuples:
- **Value**: `TypedValue` — a protobuf `oneof` carrying the value in its native type (bool, int32, int64, float, double, string, bytes, datetime, typed arrays). An unset `oneof` represents null.
- **Timestamp**: UTC `DateTime.Ticks` as `int64` (100-nanosecond intervals since 0001-01-01 00:00:00 UTC).
- **Quality**: `QualityCode`, a structured message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. The category is derived from the two most significant bits: `0x00xxxxxx` = Good, `0x40xxxxxx` = Uncertain, `0x80xxxxxx` = Bad.
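Here is a minimal Python sketch of the two encodings above. It assumes only what this section states: timestamps are .NET `DateTime.Ticks` in UTC, and the quality category comes from the top two bits of `status_code`. The helper names are hypothetical. Python's `datetime` resolves only to microseconds, so converting from ticks truncates the last digit.

```python
from datetime import datetime, timedelta, timezone

# .NET DateTime.Ticks count 100 ns intervals since 0001-01-01 00:00:00 (UTC here).
_TICKS_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def ticks_to_datetime(ticks: int) -> datetime:
    # 10 ticks per microsecond; sub-microsecond precision is truncated.
    return _TICKS_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_ticks(dt: datetime) -> int:
    delta = dt - _TICKS_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def quality_category(status_code: int) -> str:
    # The two most significant bits of the 32-bit status code select the category.
    top = (status_code >> 30) & 0b11
    return {0: "Good", 1: "Uncertain", 2: "Bad"}.get(top, "Reserved")
```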
## 4. Session Lifecycle
- Clients call `Connect` with a client ID and optional API key to establish a session.
- The server returns a 32-character hex GUID as the session ID.
- All subsequent operations require the session ID for validation.
- Sessions persist until explicit `Disconnect` or server restart. There is no idle timeout.
- Session state is tracked in memory (not persisted). All sessions are lost on service restart.
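The following is a minimal Python illustration of the in-memory session store described above. `SessionManager` here is hypothetical and records only a client ID and connect time. `uuid.uuid4().hex` produces the same 32-character lowercase hex form as .NET's `Guid.ToString("N")`.

```python
import threading
import uuid
from datetime import datetime, timezone
from typing import Dict


class SessionManager:
    """Hypothetical in-memory session store: no persistence, no idle timeout."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, dict] = {}

    def connect(self, client_id: str) -> str:
        session_id = uuid.uuid4().hex  # 32 lowercase hex characters
        with self._lock:
            self._sessions[session_id] = {
                "client_id": client_id,
                "connected_at": datetime.now(timezone.utc),
            }
        return session_id

    def validate(self, session_id: str) -> bool:
        with self._lock:
            return session_id in self._sessions

    def disconnect(self, session_id: str) -> bool:
        # Returns False if the session was unknown (e.g. lost on restart).
        with self._lock:
            return self._sessions.pop(session_id, None) is not None
```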
## 5. Authentication & Authorization
### 5.1 API Key Authentication
- API keys are validated via the `x-api-key` gRPC metadata header.
- Keys are stored in a JSON file (`apikeys.json` by default) with hot-reload via FileSystemWatcher (1-second debounce).
- If no API key file exists, the service auto-generates a default file with two random keys (one ReadOnly, one ReadWrite).
- Authentication is enforced at the gRPC interceptor level before any service method executes.
### 5.2 Role-Based Authorization
Two roles with hierarchical permissions:
| Role | Read | Subscribe | Write |
|------|------|-----------|-------|
| ReadOnly | Yes | Yes | No |
| ReadWrite | Yes | Yes | Yes |
Write-protected methods: `Write`, `WriteBatch`, `WriteBatchAndWait`. A ReadOnly key attempting a write receives `StatusCode.PermissionDenied`.
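The interceptor's decision can be sketched as a pure function. This Python version assumes the key file has already been loaded into a `{key: role}` mapping. It also assumes a missing or unknown key maps to `UNAUTHENTICATED`, since the document specifies only `PermissionDenied` for ReadOnly writes. All names are hypothetical.

```python
from enum import Enum
from typing import Mapping, Optional


class Role(Enum):
    READ_ONLY = "ReadOnly"
    READ_WRITE = "ReadWrite"


# Methods that require the ReadWrite role, per the list above.
WRITE_METHODS = frozenset({"Write", "WriteBatch", "WriteBatchAndWait"})


def authorize(method: str, api_key: Optional[str], keys: Mapping[str, Role]) -> str:
    """Return the gRPC status name the interceptor would produce for a call."""
    role = keys.get(api_key) if api_key else None
    if role is None:
        # Assumption: a missing or unknown x-api-key is rejected as UNAUTHENTICATED.
        return "UNAUTHENTICATED"
    if method in WRITE_METHODS and role is not Role.READ_WRITE:
        return "PERMISSION_DENIED"
    return "OK"
```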
### 5.3 TLS/Security
- TLS is optional. The shipped configuration disables it (`Tls.Enabled: false`), but the configuration class defaults `Tls.Enabled` to `true`, so the class default applies if the setting is absent.
- Supports server TLS and mutual TLS (client certificate validation).
- Client CA certificate path configurable for mTLS.
- Certificate revocation checking is optional.
- Client library supports TLS 1.2 and TLS 1.3, custom CA trust stores, self-signed certificate allowance, and server name override.
## 6. Operations
### 6.1 Read
- Single tag read with configurable retry policy.
- Batch read with semaphore-controlled concurrency (default max 10 concurrent operations).
- Read timeout: 5 seconds (configurable).
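Here is a sketch of semaphore-bounded batch reads, using Python's `asyncio.Semaphore` in place of the Host's C# implementation. `read_one` is a hypothetical async single-tag read. `max_concurrent` and `timeout_s` stand in for `MaxConcurrentOperations` and `ReadTimeoutSeconds`. Collecting per-tag exceptions with `return_exceptions=True` is a design choice of this sketch, not documented Host behavior.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def read_batch(
    addresses: List[str],
    read_one: Callable[[str], Awaitable[object]],
    max_concurrent: int = 10,
    timeout_s: float = 5.0,
) -> Dict[str, object]:
    """Read every tag with at most `max_concurrent` reads in flight."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(address: str) -> object:
        async with gate:
            # Each read gets its own timeout once it holds a slot.
            return await asyncio.wait_for(read_one(address), timeout_s)

    # gather preserves input order; failures come back as exception objects.
    results = await asyncio.gather(
        *(guarded(a) for a in addresses), return_exceptions=True
    )
    return dict(zip(addresses, results))
```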
### 6.2 Write
- Single tag write with retry policy. Values are sent as `TypedValue` (native protobuf types). Type mismatches between the value and the tag's expected type return a write failure.
- Batch write with semaphore-controlled concurrency.
- Write timeout: 5 seconds (configurable).
- WriteBatchAndWait: writes a batch, then polls the flag tag at a configurable interval until its value matches the expected flag value (type-aware comparison via `TypedValueEquals`) or a timeout expires. Default timeout: 5000ms, default poll interval: 100ms. Timeout is not an error — returns `flag_reached=false`.
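The polling loop can be sketched as follows. This is Python with hypothetical `write`/`read` callables. Exact-type equality is an assumption about what `TypedValueEquals` means (for example, an `int32` 1 should not match a `bool` true), not its documented definition.

```python
import time
from typing import Callable, Mapping


def write_batch_and_wait(
    write: Callable[[str, object], None],
    read: Callable[[str], object],
    values: Mapping[str, object],
    flag_tag: str,
    flag_value: object,
    timeout_ms: int = 5000,
    poll_interval_ms: int = 100,
) -> bool:
    """Write a batch, then poll flag_tag; returns flag_reached (timeout is not an error)."""
    for address, value in values.items():
        write(address, value)
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        current = read(flag_tag)
        # Stand-in for TypedValueEquals: same type and same value,
        # so 1 (int) matches neither True (bool) nor 1.0 (float).
        if type(current) is type(flag_value) and current == flag_value:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_ms / 1000)
```

Returning `False` instead of raising keeps a timeout distinguishable from a write failure, which matches the `flag_reached=false` behavior described above.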
### 6.3 Subscribe
- Server-streaming RPC. Client sends a list of tags and a sampling interval (in milliseconds).
- Server maintains a per-client bounded channel (default capacity 1000 messages).
- Updates are pushed as `VtqMessage` items on the stream.
- When the MxAccess connection drops, all subscribed clients receive a bad-quality notification.
- Subscriptions are cleaned up on client disconnect. When the last client unsubscribes from a tag, the underlying MxAccess subscription is disposed.
## 7. Connection Resilience
### 7.1 Host Auto-Reconnect
- If the MxAccess connection is lost, the Host automatically attempts reconnection at a fixed interval (default 5 seconds).
- Stored subscriptions are recreated after a successful reconnect.
- Auto-reconnect is configurable (`Connection.AutoReconnect`, default true).
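The monitor loop reduces to a periodic check. In this hypothetical Python sketch, the callables stand in for MxAccessClient's connection check, its reconnect, and its replay of `_storedSubscriptions`. `threading.Event.wait` provides both the fixed interval and prompt cancellation at shutdown.

```python
import threading
from typing import Callable, Iterable


def reconnect_monitor(
    is_connected: Callable[[], bool],
    connect: Callable[[], bool],
    resubscribe: Callable[[Iterable[str]], None],
    stored_subscriptions: Callable[[], Iterable[str]],
    stop: threading.Event,
    interval_s: float = 5.0,
) -> None:
    """Check the connection every `interval_s` seconds until `stop` is set."""
    # Event.wait returns True as soon as stop is set, ending the loop promptly.
    while not stop.wait(interval_s):
        if is_connected():
            continue
        try:
            if connect():
                # Recreate the subscriptions remembered before the drop.
                resubscribe(stored_subscriptions())
        except Exception:
            # A failed attempt is retried on the next tick.
            continue
```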
### 7.2 Client Keep-Alive
- The client sends a lightweight `GetConnectionState` ping every 30 seconds.
- On keep-alive failure, the client marks the connection as disconnected and cleans up subscriptions.
### 7.3 Client Retry Policy
- Polly-based exponential backoff retry.
- Default: 3 retries with exponential backoff starting at a 1-second delay (1s → 2s → 4s).
- Transient errors retried: Unavailable, DeadlineExceeded, ResourceExhausted, Aborted.
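Here is a sketch of the retry loop in Python. `RpcError`, `call_with_retry`, and `backoff_delays` are hypothetical (Polly is a .NET library and is not used here), and status codes are represented by their gRPC names. The default is read as three retries after the first call, which matches the listed 1s, 2s, 4s delays. `sleep` is injectable so the delays can be checked without waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

TRANSIENT = frozenset({"UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED", "ABORTED"})


class RpcError(Exception):
    """Hypothetical stand-in for a gRPC error carrying a status code name."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def backoff_delays(retries: int = 3, initial_s: float = 1.0) -> List[float]:
    # Doubling delays: 1s, 2s, 4s for the defaults.
    return [initial_s * 2 ** i for i in range(retries)]


def call_with_retry(
    op: Callable[[], T],
    retries: int = 3,
    initial_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    delays = backoff_delays(retries, initial_s)
    for attempt in range(retries + 1):  # first call plus `retries` retries
        try:
            return op()
        except RpcError as err:
            # Non-transient errors and the final failure propagate unchanged.
            if err.code not in TRANSIENT or attempt == retries:
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```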
## 8. Health Monitoring & Metrics
### 8.1 Health Checks
Two health check implementations:
- **Basic** (`HealthCheckService`): Checks MxAccess connection state, subscription stats, and operation success rate. Returns Degraded if the success rate is below 50% (evaluated only after more than 100 operations) or the client count exceeds 100.
- **Detailed** (`DetailedHealthCheckService`): Reads a test tag (`System.Heartbeat`). Returns Unhealthy if not connected, Degraded if test tag quality is not Good or timestamp is older than 5 minutes.
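Both health rules can be written as small pure functions. In this hypothetical Python sketch, the basic check returning Unhealthy when disconnected is an assumption, because the document states only that connection state is checked. The other thresholds come from the list above, and a quality is treated as Good when the top two bits of its status code are zero.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def basic_health(connected: bool, total_ops: int, successful_ops: int, client_count: int) -> str:
    if not connected:
        return "Unhealthy"  # assumption, see lead-in
    # The success-rate rule only applies once more than 100 operations exist.
    low_success = total_ops > 100 and successful_ops / total_ops < 0.5
    if low_success or client_count > 100:
        return "Degraded"
    return "Healthy"


def detailed_health(
    connected: bool,
    status_code: int,
    timestamp: datetime,
    now: Optional[datetime] = None,
) -> str:
    if not connected:
        return "Unhealthy"
    now = now or datetime.now(timezone.utc)
    is_good = (status_code >> 30) == 0  # top two bits 00 means Good
    if not is_good or now - timestamp > timedelta(minutes=5):
        return "Degraded"
    return "Healthy"
```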
### 8.2 Performance Metrics
- Per-operation tracking: Read, ReadBatch, Write, WriteBatch.
- Metrics: total count, success count, success rate, average/min/max latency, 95th percentile latency.
- Rolling buffer of 1000 latency samples per operation for percentile calculation.
- Metrics reported to logs every 60 seconds.
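A rolling window like the one described can be built on a bounded `deque`. This Python sketch is illustrative. The nearest-rank percentile method is an assumption, since the document does not state how the 95th percentile is computed.

```python
import math
from collections import deque


class OperationMetrics:
    """Hypothetical per-operation tracker with a rolling latency window."""

    def __init__(self, window: int = 1000) -> None:
        self.samples: deque = deque(maxlen=window)  # oldest sample evicted automatically
        self.total = 0
        self.successes = 0

    def record(self, latency_ms: float, success: bool) -> None:
        self.total += 1
        self.successes += int(success)
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        # Nearest-rank: smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def summary(self) -> dict:
        n = len(self.samples)
        return {
            "count": self.total,
            "success_rate": self.successes / self.total if self.total else 1.0,
            "avg_ms": sum(self.samples) / n if n else 0.0,
            "min_ms": min(self.samples, default=0.0),
            "max_ms": max(self.samples, default=0.0),
            "p95_ms": self.percentile(95),
        }
```

Counts and success rate cover every recorded operation, while the latency statistics cover only the retained window.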
### 8.3 Status Web Server
- HTTP status server on port 8080 (configurable).
- Endpoints:
- `GET /` — HTML dashboard with auto-refresh (30 seconds), color-coded status cards, operations table.
- `GET /api/status` — JSON status report.
- `GET /api/health` — Plain text `OK` (200) or `UNHEALTHY` (503).
### 8.4 Client Metrics
- Per-operation counts, error counts, and latency tracking (average, p95, p99).
- Rolling buffer of 1000 latency samples.
- Exposed via `ILmxProxyClient.GetMetrics()`.
## 9. Service Hosting
### 9.1 Topshelf Windows Service
- Service name: `ZB.MOM.WW.LmxProxy.Host`
- Display name: `SCADA Bridge LMX Proxy`
- Starts automatically on boot.
### 9.2 Service Recovery (Windows SCM)
| Failure | Restart Delay |
|---------|--------------|
| First | 1 minute |
| Second | 5 minutes |
| Subsequent | 10 minutes |
| Reset period | 1 day |
### 9.3 Startup Sequence
1. Load configuration from `appsettings.json` + environment variables.
2. Configure Serilog (console + file sinks).
3. Validate configuration.
4. Check/generate TLS certificates (if TLS enabled).
5. Initialize services: PerformanceMetrics, ApiKeyService, MxAccessClient, SubscriptionManager, SessionManager, HealthCheckService, StatusReportService.
6. Connect to MxAccess synchronously (timeout: 30 seconds).
7. Start auto-reconnect monitor loop (if enabled).
8. Start gRPC server on configured port.
9. Start HTTP status web server.
### 9.4 Shutdown Sequence
1. Cancel reconnect monitor (5-second wait).
2. Graceful gRPC server shutdown (10-second timeout, then kill).
3. Stop status web server (5-second wait).
4. Dispose all components in reverse order.
5. Disconnect from MxAccess (10-second timeout).
## 10. Configuration
All configuration is via `appsettings.json` bound to `LmxProxyConfiguration`. Key settings:
| Section | Setting | Default |
|---------|---------|---------|
| Root | GrpcPort | 50051 |
| Root | ApiKeyConfigFile | `apikeys.json` |
| Connection | MonitorIntervalSeconds | 5 |
| Connection | ConnectionTimeoutSeconds | 30 |
| Connection | ReadTimeoutSeconds | 5 |
| Connection | WriteTimeoutSeconds | 5 |
| Connection | MaxConcurrentOperations | 10 |
| Connection | AutoReconnect | true |
| Subscription | ChannelCapacity | 1000 |
| Subscription | ChannelFullMode | DropOldest |
| Tls | Enabled | false |
| Tls | RequireClientCertificate | false |
| WebServer | Enabled | true |
| WebServer | Port | 8080 |
Configuration is validated at startup. Invalid values cause the service to fail to start.
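Startup validation can be sketched as a set of defaults plus explicit checks. This Python sketch mirrors the defaults in the table above. Its rules (port range, positive timeouts and capacity, a known full mode, no port clash) are illustrative assumptions, not the Host's documented validation logic. `Root` stands for top-level keys.

```python
import copy
from typing import Any, Dict, List

Config = Dict[str, Dict[str, Any]]

# Defaults from the table above.
DEFAULTS: Config = {
    "Root": {"GrpcPort": 50051, "ApiKeyConfigFile": "apikeys.json"},
    "Connection": {
        "MonitorIntervalSeconds": 5,
        "ConnectionTimeoutSeconds": 30,
        "ReadTimeoutSeconds": 5,
        "WriteTimeoutSeconds": 5,
        "MaxConcurrentOperations": 10,
        "AutoReconnect": True,
    },
    "Subscription": {"ChannelCapacity": 1000, "ChannelFullMode": "DropOldest"},
    "Tls": {"Enabled": False, "RequireClientCertificate": False},
    "WebServer": {"Enabled": True, "Port": 8080},
}

POSITIVE_CONNECTION_KEYS = (
    "MonitorIntervalSeconds", "ConnectionTimeoutSeconds",
    "ReadTimeoutSeconds", "WriteTimeoutSeconds", "MaxConcurrentOperations",
)


def load(overrides: Config) -> Config:
    """Overlay per-section overrides on a copy of the defaults."""
    cfg = copy.deepcopy(DEFAULTS)
    for section, values in overrides.items():
        cfg.setdefault(section, {}).update(values)
    return cfg


def validate(cfg: Config) -> List[str]:
    """Return error messages; an empty list means the service may start."""
    errors = []
    for section, key in (("Root", "GrpcPort"), ("WebServer", "Port")):
        port = cfg[section][key]
        if not isinstance(port, int) or not 1 <= port <= 65535:
            errors.append(f"{section}.{key} must be in 1..65535, got {port!r}")
    for key in POSITIVE_CONNECTION_KEYS:
        if cfg["Connection"][key] <= 0:
            errors.append(f"Connection.{key} must be positive")
    if cfg["Subscription"]["ChannelCapacity"] <= 0:
        errors.append("Subscription.ChannelCapacity must be positive")
    if cfg["Subscription"]["ChannelFullMode"] not in ("DropOldest", "DropNewest", "Wait"):
        errors.append("Subscription.ChannelFullMode is not a recognized mode")
    if cfg["WebServer"]["Enabled"] and cfg["WebServer"]["Port"] == cfg["Root"]["GrpcPort"]:
        errors.append("WebServer.Port must differ from GrpcPort")
    return errors
```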
## 11. Logging
- Serilog with console and file sinks.
- File sink: `logs/lmxproxy-.txt`, daily rolling, 30 files retained.
- Default level: Information. Overrides: Microsoft=Warning, System=Warning, Grpc=Information.
- Enrichment: FromLogContext, WithMachineName, WithThreadId.
## 12. Constraints
- Host **must** target x86 and .NET Framework 4.8 (MXAccess is 32-bit COM).
- Host uses `Grpc.Core` (deprecated C-core library), which is required because the managed grpc-dotnet server (`Grpc.AspNetCore.Server`) is not supported on .NET Framework 4.8.
- Client targets .NET 10 and runs in ScadaLink central/site clusters.
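- MxAccess COM operations require an STA thread context. Calls are wrapped in `Task.Run`, which on its own schedules work on thread-pool (MTA) threads rather than providing an STA context.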
- The solution file uses `.slnx` format.
## 13. Protocol
The protocol specification is defined in `lmxproxy_updates.md`, which is the authoritative source of truth. All RPC signatures, message schemas, and behavioral specifications are per that document.
### 13.1 Value System (TypedValue)
Values are transmitted in their native protobuf types via a `TypedValue` oneof: bool, int32, int64, float, double, string, bytes, datetime (int64 UTC Ticks), and typed arrays. An unset oneof represents null. No string serialization or parsing heuristics are used.
### 13.2 Quality System (QualityCode)
Quality is a structured `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. Supports AVEVA-aligned quality sub-codes (e.g., `BadSensorFailure` = `0x806D0000`, `GoodLocalOverride` = `0x00D80000`, `BadWaitingForInitialData` = `0x80320000`). See Component-Protocol for the full quality code table.
### 13.3 Migration from V1
The current codebase implements the v1 protocol (string-encoded values, three-state string quality). The v2 protocol is a clean break — all clients and servers will be updated simultaneously. No backward compatibility layer. This is appropriate because LmxProxy is an internal protocol with a small, controlled client count.
## 14. Component List (10 Components)
| # | Component | Description |
|---|-----------|-------------|
| 1 | GrpcServer | gRPC service implementation, session validation, request routing |
| 2 | MxAccessClient | MXAccess COM interop wrapper, connection lifecycle, read/write/subscribe |
| 3 | SessionManager | Client session tracking and lifecycle |
| 4 | Security | API key authentication, role-based authorization, TLS management |
| 5 | SubscriptionManager | Tag subscription lifecycle, channel-based update delivery, backpressure |
| 6 | Configuration | appsettings.json structure, validation, options binding |
| 7 | HealthAndMetrics | Health checks, performance metrics, status web server |
| 8 | ServiceHost | Topshelf hosting, startup/shutdown, logging setup, service recovery |
| 9 | Client | LmxProxyClient library, builder, retry, streaming, DI integration |
| 10 | Protocol | gRPC protocol specification, proto definition, code-first contracts |