docs: add LmxProxy requirements documentation with v2 protocol as authoritative design

Generate high-level requirements and 10 component documents derived from source code and protocol specs. Uses lmxproxy_updates.md (v2 TypedValue/QualityCode) as the source of truth, with v1 string-based encoding documented as legacy context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lmxproxy/docs/requirements/HighLevelReqs.md
@@ -0,0 +1,274 @@
# LmxProxy - High Level Requirements

## 1. System Purpose

LmxProxy is a gRPC proxy service that bridges SCADA clients to AVEVA System Platform (Wonderware) via the ArchestrA MXAccess COM API. It exists because MXAccess is a 32-bit COM component that requires co-location with System Platform on a Windows machine running .NET Framework 4.8. LmxProxy isolates this constraint behind a gRPC interface, allowing modern .NET clients to access System Platform data remotely over HTTP/2.

## 2. Architecture

### 2.1 Two-Project Structure

- **ZB.MOM.WW.LmxProxy.Host**: .NET Framework 4.8, x86-only Windows service. Hosts a gRPC server (Grpc.Core) fronting the MXAccess COM API. Runs on the same machine as AVEVA System Platform.
- **ZB.MOM.WW.LmxProxy.Client**: .NET 10, AnyCPU class library. Code-first gRPC client (protobuf-net.Grpc) consumed by ScadaLink's Data Connection Layer. Packaged as a NuGet library.

### 2.2 Dual gRPC Stacks

The two projects use different gRPC implementations that are wire-compatible:

- **Host**: Proto-file-generated code via `Grpc.Core` + `Grpc.Tools`. Uses the deprecated C-core gRPC library because the ASP.NET Core gRPC server (`Grpc.AspNetCore.Server`) is not available on .NET Framework 4.8.
- **Client**: Code-first contracts via `protobuf-net.Grpc` with `[DataContract]`/`[ServiceContract]` attributes over `Grpc.Net.Client`.

Both target the same `scada.ScadaService` gRPC service definition.

### 2.3 Deployment Model

- The Host service runs on the AVEVA System Platform machine (or any machine with MXAccess access).
- Clients connect remotely over gRPC (HTTP/2) on a configurable port (default 50051).
- The Host runs as a Windows service managed by Topshelf.

## 3. Communication Protocol

### 3.1 Transport

- gRPC over HTTP/2.
- Default server port: 50051.
- Optional TLS with mutual TLS (mTLS) support.

### 3.2 RPCs

The service exposes 10 RPCs:

| RPC | Type | Description |
|-----|------|-------------|
| Connect | Unary | Establish session, returns session ID |
| Disconnect | Unary | Terminate session |
| GetConnectionState | Unary | Query MxAccess connection status |
| Read | Unary | Read single tag value |
| ReadBatch | Unary | Read multiple tag values |
| Write | Unary | Write single tag value |
| WriteBatch | Unary | Write multiple tag values |
| WriteBatchAndWait | Unary | Write values, poll flag tag until match or timeout |
| Subscribe | Server streaming | Stream tag value updates to client |
| CheckApiKey | Unary | Validate API key and return role |

### 3.3 Data Model (VTQ)

All tag values are represented as VTQ (Value, Timestamp, Quality) tuples:

- **Value**: `TypedValue`, a protobuf `oneof` carrying the value in its native type (bool, int32, int64, float, double, string, bytes, datetime, typed arrays). An unset `oneof` represents null.
- **Timestamp**: UTC `DateTime.Ticks` as `int64` (100-nanosecond intervals since 0001-01-01 00:00:00 UTC).
- **Quality**: `QualityCode`, a structured message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. The category is derived from the high bits of `status_code`: `0x00xxxxxx` = Good, `0x40xxxxxx` = Uncertain, `0x80xxxxxx` = Bad.

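
The timestamp and quality encodings above can be illustrated with a small, language-neutral sketch (Python here, not the project's C#). The helper names are hypothetical. The category check assumes the two most significant bits of `status_code` select the category, which matches the `0x00`/`0x40`/`0x80` prefixes listed above.

```python
from datetime import datetime, timedelta, timezone

# DateTime.Ticks counts 100 ns intervals since 0001-01-01 00:00:00 UTC.
TICKS_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
TICKS_PER_SECOND = 10_000_000


def ticks_to_datetime(ticks: int) -> datetime:
    """Convert UTC ticks to an aware datetime (digits below 1 microsecond are dropped)."""
    return TICKS_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_ticks(value: datetime) -> int:
    """Convert an aware datetime to UTC ticks using integer arithmetic only."""
    delta = value.astimezone(timezone.utc) - TICKS_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


def quality_category(status_code: int) -> str:
    """Map a 32-bit status code to its category (assumption: top two bits decide)."""
    top_bits = (status_code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(top_bits, "Unknown")
```

For example, `quality_category(0x80320000)` yields `"Bad"`, and the Unix epoch corresponds to 621355968000000000 ticks.
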
## 4. Session Lifecycle

- Clients call `Connect` with a client ID and optional API key to establish a session.
- The server returns a 32-character hex GUID as the session ID.
- All subsequent operations require the session ID for validation.
- Sessions persist until explicit `Disconnect` or server restart. There is no idle timeout.
- Session state is tracked in memory (not persisted). All sessions are lost on service restart.

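
A minimal sketch of the lifecycle described above, written in Python for illustration only. The class `InMemorySessionStore` and its method names are hypothetical and do not reflect the Host's actual types; the sketch only shows a dash-free 32-character hex ID, in-memory tracking, and the absence of an idle timeout.

```python
import uuid


class InMemorySessionStore:
    """Tracks sessions in a plain dict, so everything is lost on restart."""

    def __init__(self) -> None:
        self._sessions = {}  # session_id -> client_id

    def connect(self, client_id: str) -> str:
        # uuid4().hex is 32 lowercase hex characters with no dashes.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = client_id
        return session_id

    def validate(self, session_id: str) -> bool:
        # No idle timeout: a session stays valid until disconnect() is called.
        return session_id in self._sessions

    def disconnect(self, session_id: str) -> bool:
        # Returns False when the session was unknown or already removed.
        return self._sessions.pop(session_id, None) is not None
```
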
## 5. Authentication & Authorization

### 5.1 API Key Authentication

- API keys are validated via the `x-api-key` gRPC metadata header.
- Keys are stored in a JSON file (`apikeys.json` by default) with hot-reload via `FileSystemWatcher` (1-second debounce).
- If no API key file exists, the service auto-generates a default file with two random keys (one ReadOnly, one ReadWrite).
- Authentication is enforced at the gRPC interceptor level before any service method executes.

### 5.2 Role-Based Authorization

Two roles with hierarchical permissions:

| Role | Read | Subscribe | Write |
|------|------|-----------|-------|
| ReadOnly | Yes | Yes | No |
| ReadWrite | Yes | Yes | Yes |

Write-protected methods: `Write`, `WriteBatch`, `WriteBatchAndWait`. A ReadOnly key attempting a write receives `StatusCode.PermissionDenied`.

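
The permission table reduces to a single check per call. The sketch below is illustrative Python, not the Host's interceptor code; the function name `is_permitted` is hypothetical, and rejecting unknown roles outright is an assumption.

```python
# The three write-protected RPCs named in the requirements.
WRITE_METHODS = frozenset({"Write", "WriteBatch", "WriteBatchAndWait"})
KNOWN_ROLES = frozenset({"ReadOnly", "ReadWrite"})


def is_permitted(role: str, method: str) -> bool:
    """Return True when the role may call the method."""
    if role not in KNOWN_ROLES:
        return False  # assumption: unknown roles are rejected outright
    if method in WRITE_METHODS:
        return role == "ReadWrite"
    return True  # reads and subscriptions are open to both roles
```

A caller that gets `False` for a ReadOnly key on a write method would map it to `StatusCode.PermissionDenied`.
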
### 5.3 TLS/Security

- TLS is optional. It is disabled in the default configuration, although the `Tls.Enabled` property defaults to `true` in the config class.
- Supports server TLS and mutual TLS (client certificate validation).
- Client CA certificate path configurable for mTLS.
- Certificate revocation checking is optional.
- Client library supports TLS 1.2 and TLS 1.3, custom CA trust stores, self-signed certificate allowance, and server name override.

## 6. Operations

### 6.1 Read

- Single tag read with configurable retry policy.
- Batch read with semaphore-controlled concurrency (default max 10 concurrent operations).
- Read timeout: 5 seconds (configurable).

### 6.2 Write

- Single tag write with retry policy. Values are sent as `TypedValue` (native protobuf types). Type mismatches between the value and the tag's expected type return a write failure.
- Batch write with semaphore-controlled concurrency.
- Write timeout: 5 seconds (configurable).
- WriteBatchAndWait: writes a batch, then polls the flag tag at a configurable interval until its value matches the expected flag value (type-aware comparison via `TypedValueEquals`) or a timeout expires. Default timeout: 5000 ms, default poll interval: 100 ms. A timeout is not an error; the call returns `flag_reached=false`.

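
The write-then-poll behaviour of WriteBatchAndWait can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the Host implementation: `write_batch` and `read_flag` are hypothetical callables supplied by the caller, and `typed_value_equals` is a simplified stand-in for `TypedValueEquals` that requires both the type and the value to match.

```python
import time


def typed_value_equals(actual, expected) -> bool:
    """Type-aware comparison: the int 1 does not match the bool True."""
    return type(actual) is type(expected) and actual == expected


def write_batch_and_wait(write_batch, read_flag, values, expected_flag,
                         timeout_ms=5000, poll_interval_ms=100) -> bool:
    """Write the batch, then poll the flag tag until it matches or time runs out.

    Returns the flag_reached result; a timeout yields False rather than an error.
    """
    write_batch(values)
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if typed_value_equals(read_flag(), expected_flag):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_ms / 1000)
```

Checking the flag before testing the deadline means the flag is read at least once, even with a very short timeout.
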
### 6.3 Subscribe

- Server-streaming RPC. Client sends a list of tags and a sampling interval (in milliseconds).
- Server maintains a per-client bounded channel (default capacity 1000 messages).
- Updates are pushed as `VtqMessage` items on the stream.
- When the MxAccess connection drops, all subscribed clients receive a bad-quality notification.
- Subscriptions are cleaned up on client disconnect. When the last client unsubscribes from a tag, the underlying MxAccess subscription is disposed.

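
The per-client bounded channel can be pictured with the sketch below. It is illustrative Python rather than the Host's channel code, and it assumes the `DropOldest` full mode listed as the default in the Configuration section. The class name `BoundedUpdateBuffer` and its `dropped` counter are hypothetical.

```python
from collections import deque


class BoundedUpdateBuffer:
    """Fixed-capacity buffer that discards the oldest update when full."""

    def __init__(self, capacity: int = 1000) -> None:
        # A deque with maxlen evicts from the left when appending to a full buffer.
        self._items = deque(maxlen=capacity)
        self.dropped = 0

    def publish(self, update) -> None:
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest entry is about to be evicted
        self._items.append(update)

    def drain(self) -> list:
        """Return all buffered updates, oldest first, and empty the buffer."""
        items = list(self._items)
        self._items.clear()
        return items
```

With this policy a slow client never blocks the producer; it only loses the oldest pending updates.
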
## 7. Connection Resilience

### 7.1 Host Auto-Reconnect

- If the MxAccess connection is lost, the Host automatically attempts reconnection at a fixed interval (default 5 seconds).
- Stored subscriptions are recreated after a successful reconnect.
- Auto-reconnect is configurable (`Connection.AutoReconnect`, default true).

### 7.2 Client Keep-Alive

- The client sends a lightweight `GetConnectionState` ping every 30 seconds.
- On keep-alive failure, the client marks the connection as disconnected and cleans up subscriptions.

### 7.3 Client Retry Policy

- Polly-based exponential backoff retry.
- Default: 3 attempts with 1-second initial delay (1s → 2s → 4s).
- Transient errors retried: Unavailable, DeadlineExceeded, ResourceExhausted, Aborted.

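
The delay sequence and the retry filter can be sketched as below. This is plain Python for illustration and does not use Polly. The function names are hypothetical, and the doubling factor is inferred from the 1s → 2s → 4s example; whether the default of 3 counts attempts or retries is not specified here, so the sketch simply produces one delay per count.

```python
# gRPC status names treated as transient, per the list above.
RETRYABLE_STATUSES = frozenset(
    {"Unavailable", "DeadlineExceeded", "ResourceExhausted", "Aborted"}
)


def backoff_delays(count: int = 3, initial_delay_s: float = 1.0,
                   factor: float = 2.0) -> list:
    """Return the exponential delay schedule in seconds."""
    return [initial_delay_s * factor ** i for i in range(count)]


def should_retry(status: str, retries_so_far: int, max_retries: int = 3) -> bool:
    """Retry only transient statuses, and only while the budget lasts."""
    return status in RETRYABLE_STATUSES and retries_so_far < max_retries
```
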
## 8. Health Monitoring & Metrics

### 8.1 Health Checks

Two health check implementations:

- **Basic** (`HealthCheckService`): Checks MxAccess connection state, subscription stats, and operation success rate. Returns Degraded if success rate < 50% (with > 100 operations) or client count > 100.
- **Detailed** (`DetailedHealthCheckService`): Reads a test tag (`System.Heartbeat`). Returns Unhealthy if not connected, Degraded if test tag quality is not Good or timestamp is older than 5 minutes.

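
The thresholds above can be expressed as two small decision functions. This is an illustrative Python sketch with hypothetical names and parameters, not the code of `HealthCheckService` or `DetailedHealthCheckService`; it covers only the rules stated in the list.

```python
def basic_is_degraded(total_ops: int, successful_ops: int, client_count: int) -> bool:
    """Degraded when the success rate is under 50% (only after more than 100 operations)
    or when more than 100 clients are connected."""
    low_success = total_ops > 100 and successful_ops / total_ops < 0.5
    return low_success or client_count > 100


def detailed_status(connected: bool, quality_is_good: bool,
                    tag_age_seconds: float) -> str:
    """Classify the result of reading the test tag."""
    if not connected:
        return "Unhealthy"
    if not quality_is_good or tag_age_seconds > 5 * 60:
        return "Degraded"
    return "Healthy"
```

The `total_ops > 100` guard also keeps the success-rate division from running when no operations have been recorded.
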
### 8.2 Performance Metrics

- Per-operation tracking: Read, ReadBatch, Write, WriteBatch.
- Metrics: total count, success count, success rate, average/min/max latency, 95th percentile latency.
- Rolling buffer of 1000 latency samples per operation for percentile calculation.
- Metrics reported to logs every 60 seconds.

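
A rolling latency window with a percentile lookup might look like the sketch below. It is illustrative Python; the class name `LatencyWindow` is hypothetical, and the nearest-rank percentile method is an assumption, since the requirements do not say which method the Host uses.

```python
import math
from collections import deque


class LatencyWindow:
    """Keeps the most recent latency samples for one operation type."""

    def __init__(self, size: int = 1000) -> None:
        self._samples = deque(maxlen=size)  # older samples fall off automatically

    def record(self, latency_ms: float) -> None:
        self._samples.append(latency_ms)

    def percentile(self, pct: int) -> float:
        """Nearest-rank percentile over the current window (0.0 when empty)."""
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]
```

Sorting on each query is acceptable for a 1000-sample window that is read once per reporting interval.
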
### 8.3 Status Web Server

- HTTP status server on port 8080 (configurable).
- Endpoints:
  - `GET /`: HTML dashboard with auto-refresh (30 seconds), color-coded status cards, operations table.
  - `GET /api/status`: JSON status report.
  - `GET /api/health`: Plain text `OK` (200) or `UNHEALTHY` (503).

### 8.4 Client Metrics

- Per-operation counts, error counts, and latency tracking (average, p95, p99).
- Rolling buffer of 1000 latency samples.
- Exposed via `ILmxProxyClient.GetMetrics()`.

## 9. Service Hosting

### 9.1 Topshelf Windows Service

- Service name: `ZB.MOM.WW.LmxProxy.Host`
- Display name: `SCADA Bridge LMX Proxy`
- Starts automatically on boot.

### 9.2 Service Recovery (Windows SCM)

| Failure | Restart Delay |
|---------|--------------|
| First | 1 minute |
| Second | 5 minutes |
| Subsequent | 10 minutes |
| Reset period | 1 day |

### 9.3 Startup Sequence

1. Load configuration from `appsettings.json` + environment variables.
2. Configure Serilog (console + file sinks).
3. Validate configuration.
4. Check/generate TLS certificates (if TLS enabled).
5. Initialize services: PerformanceMetrics, ApiKeyService, MxAccessClient, SubscriptionManager, SessionManager, HealthCheckService, StatusReportService.
6. Connect to MxAccess synchronously (timeout: 30 seconds).
7. Start auto-reconnect monitor loop (if enabled).
8. Start gRPC server on configured port.
9. Start HTTP status web server.

### 9.4 Shutdown Sequence

1. Cancel reconnect monitor (5-second wait).
2. Graceful gRPC server shutdown (10-second timeout, then kill).
3. Stop status web server (5-second wait).
4. Dispose all components in reverse order.
5. Disconnect from MxAccess (10-second timeout).

## 10. Configuration

All configuration is via `appsettings.json` bound to `LmxProxyConfiguration`. Key settings:

| Section | Setting | Default |
|---------|---------|---------|
| Root | GrpcPort | 50051 |
| Root | ApiKeyConfigFile | `apikeys.json` |
| Connection | MonitorIntervalSeconds | 5 |
| Connection | ConnectionTimeoutSeconds | 30 |
| Connection | ReadTimeoutSeconds | 5 |
| Connection | WriteTimeoutSeconds | 5 |
| Connection | MaxConcurrentOperations | 10 |
| Connection | AutoReconnect | true |
| Subscription | ChannelCapacity | 1000 |
| Subscription | ChannelFullMode | DropOldest |
| Tls | Enabled | false |
| Tls | RequireClientCertificate | false |
| WebServer | Enabled | true |
| WebServer | Port | 8080 |

Configuration is validated at startup. Invalid values cause the service to fail to start.

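
The defaults table and the fail-fast validation can be sketched as follows. This is illustrative Python, not the `LmxProxyConfiguration` binding code. The nesting (Root settings at the top level, other sections as nested objects) and every validation rule shown are assumptions; the requirements only state that invalid values prevent startup.

```python
import copy

# Defaults taken from the table above.
DEFAULTS = {
    "GrpcPort": 50051,
    "ApiKeyConfigFile": "apikeys.json",
    "Connection": {
        "MonitorIntervalSeconds": 5,
        "ConnectionTimeoutSeconds": 30,
        "ReadTimeoutSeconds": 5,
        "WriteTimeoutSeconds": 5,
        "MaxConcurrentOperations": 10,
        "AutoReconnect": True,
    },
    "Subscription": {"ChannelCapacity": 1000, "ChannelFullMode": "DropOldest"},
    "Tls": {"Enabled": False, "RequireClientCertificate": False},
    "WebServer": {"Enabled": True, "Port": 8080},
}


def load_config(overrides: dict) -> dict:
    """Overlay user settings on the defaults, one section at a time."""
    config = copy.deepcopy(DEFAULTS)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)
        else:
            config[key] = value
    return config


def validate(config: dict) -> list:
    """Return a list of problems; an empty list means the service may start."""
    errors = []
    ports = {"GrpcPort": config["GrpcPort"], "WebServer.Port": config["WebServer"]["Port"]}
    for name, port in ports.items():
        if not 1 <= port <= 65535:  # assumed rule: valid TCP port range
            errors.append(f"{name} must be between 1 and 65535")
    for name, value in config["Connection"].items():
        if name != "AutoReconnect" and value <= 0:  # assumed rule: positive numbers
            errors.append(f"Connection.{name} must be positive")
    if config["Subscription"]["ChannelCapacity"] <= 0:
        errors.append("Subscription.ChannelCapacity must be positive")
    return errors
```

A host following this pattern would log the returned problems and exit before opening any ports.
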
## 11. Logging

- Serilog with console and file sinks.
- File sink: `logs/lmxproxy-.txt`, daily rolling, 30 files retained.
- Default level: Information. Overrides: Microsoft=Warning, System=Warning, Grpc=Information.
- Enrichment: FromLogContext, WithMachineName, WithThreadId.

## 12. Constraints

- Host **must** target x86 and .NET Framework 4.8 (MXAccess is 32-bit COM).
- Host uses `Grpc.Core` (deprecated C-core library), required because the ASP.NET Core gRPC server (`Grpc.AspNetCore.Server`) is not available on .NET Framework 4.8.
- Client targets .NET 10 and runs in ScadaLink central/site clusters.
- MxAccess COM operations require STA thread context (wrapped in `Task.Run`).
- The solution file uses `.slnx` format.

## 13. Protocol

The protocol specification is defined in `lmxproxy_updates.md`, which is the authoritative source of truth. All RPC signatures, message schemas, and behavioral specifications are per that document.

### 13.1 Value System (TypedValue)

Values are transmitted in their native protobuf types via a `TypedValue` oneof: bool, int32, int64, float, double, string, bytes, datetime (int64 UTC Ticks), and typed arrays. An unset oneof represents null. No string serialization or parsing heuristics are used.

### 13.2 Quality System (QualityCode)

Quality is a structured `QualityCode` message with `uint32 status_code` (OPC UA-compatible) and `string symbolic_name`. Supports AVEVA-aligned quality sub-codes (e.g., `BadSensorFailure` = `0x806D0000`, `GoodLocalOverride` = `0x00D80000`, `BadWaitingForInitialData` = `0x80320000`). See Component-Protocol for the full quality code table.

### 13.3 Migration from V1

The current codebase implements the v1 protocol (string-encoded values, three-state string quality). The v2 protocol is a clean break: all clients and servers will be updated simultaneously, with no backward compatibility layer. This is appropriate because LmxProxy is an internal protocol with a small, controlled set of clients.

## 14. Component List (10 Components)

| # | Component | Description |
|---|-----------|-------------|
| 1 | GrpcServer | gRPC service implementation, session validation, request routing |
| 2 | MxAccessClient | MXAccess COM interop wrapper, connection lifecycle, read/write/subscribe |
| 3 | SessionManager | Client session tracking and lifecycle |
| 4 | Security | API key authentication, role-based authorization, TLS management |
| 5 | SubscriptionManager | Tag subscription lifecycle, channel-based update delivery, backpressure |
| 6 | Configuration | appsettings.json structure, validation, options binding |
| 7 | HealthAndMetrics | Health checks, performance metrics, status web server |
| 8 | ServiceHost | Topshelf hosting, startup/shutdown, logging setup, service recovery |
| 9 | Client | LmxProxyClient library, builder, retry, streaming, DI integration |
| 10 | Protocol | gRPC protocol specification, proto definition, code-first contracts |