deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL adapter files, and related docs to deprecated/. Removed LmxProxy registration from DataConnectionFactory, project reference from DCL, protocol option from UI, and cleaned up all requirement docs.
@@ -0,0 +1,121 @@
# Component: HealthAndMetrics

## Purpose

Provides health checking, performance metrics collection, and an HTTP status dashboard for monitoring the LmxProxy service.

## Location

- `src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs` — basic health check.
- `src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs` — detailed health check with test tag read.
- `src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs` — operation metrics collection.
- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs` — status report generation.
- `src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs` — HTTP status endpoint.

## Responsibilities

- Evaluate service health based on connection state, operation success rates, and test tag reads.
- Track per-operation performance metrics (counts, latencies, percentiles).
- Serve an HTML status dashboard plus JSON status and health-check HTTP endpoints.
- Report metrics to logs on a periodic interval.

## 1. Health Checks

### 1.1 Basic Health Check (HealthCheckService)

`CheckHealthAsync()` evaluates:

| Check | Healthy | Degraded |
|-------|---------|----------|
| MxAccess connected | Yes | — |
| Success rate (if > 100 total ops) | ≥ 50% | < 50% |
| Client count | ≤ 100 | > 100 |

Returns a health data dictionary with the keys `scada_connected`, `scada_connection_state`, `total_clients`, `total_tags`, `total_operations`, and `average_success_rate`.

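The table above can be read as a small decision function. The following is a minimal Python sketch of that reading, not the service's C# code. The `Snapshot` record and the `check_health` name are hypothetical, and because the table gives no Degraded state for a lost connection, the sketch assumes that case is reported as Unhealthy.

```python
from dataclasses import dataclass

# Thresholds taken from the table above.
MIN_OPS_FOR_RATE_CHECK = 100  # success rate is only judged above this many operations
MIN_SUCCESS_RATE = 50.0       # percent
MAX_CLIENTS = 100


@dataclass
class Snapshot:
    """Hypothetical input record; the real service reads these from its own components."""
    connected: bool
    total_operations: int
    average_success_rate: float  # percent, 0-100
    total_clients: int


def check_health(s: Snapshot) -> str:
    # Assumption: a lost MxAccess connection is reported as "Unhealthy".
    if not s.connected:
        return "Unhealthy"
    # The success rate only counts once more than 100 operations were recorded.
    if s.total_operations > MIN_OPS_FOR_RATE_CHECK and s.average_success_rate < MIN_SUCCESS_RATE:
        return "Degraded"
    if s.total_clients > MAX_CLIENTS:
        return "Degraded"
    return "Healthy"
```

Note that a low success rate is ignored until more than 100 operations exist, which keeps a handful of early failures from flagging a freshly started service.
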
### 1.2 Detailed Health Check (DetailedHealthCheckService)

`CheckHealthAsync()` performs an active probe:

1. Checks `IsConnected` — returns **Unhealthy** if not connected.
2. Reads a test tag (default `System.Heartbeat`).
3. If test tag quality is not Good — returns **Degraded**.
4. If test tag timestamp is older than **5 minutes** — returns **Degraded** (stale data detection).
5. Otherwise returns **Healthy**.

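The five steps above can be sketched as one function. This is an illustrative Python sketch rather than the C# implementation. `TagValue`, `detailed_check`, and the `read_tag` callback are hypothetical names, and representing quality as the string `"Good"` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

STALE_AFTER = timedelta(minutes=5)  # staleness limit from step 4


@dataclass
class TagValue:
    quality: str         # assumption: quality is exposed as a string such as "Good"
    timestamp: datetime  # timezone-aware UTC timestamp of the sample


def detailed_check(is_connected: bool,
                   read_tag: Callable[[str], TagValue],
                   test_tag: str = "System.Heartbeat",
                   now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    if not is_connected:                      # step 1
        return "Unhealthy"
    value = read_tag(test_tag)                # step 2
    if value.quality != "Good":               # step 3
        return "Degraded"
    if now - value.timestamp > STALE_AFTER:   # step 4: stale data detection
        return "Degraded"
    return "Healthy"                          # step 5
```

Passing `now` explicitly keeps the staleness rule testable without waiting on a real clock.
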
## 2. Performance Metrics

### 2.1 Tracking

`PerformanceMetrics` uses a `ConcurrentDictionary<string, OperationMetrics>` to track operations by name.

Operations tracked: `Read`, `ReadBatch`, `Write`, `WriteBatch` (recorded by ScadaGrpcService).

### 2.2 Recording

Two recording patterns:

- `RecordOperation(name, duration, success)` — explicit recording.
- `BeginOperation(name)` — returns an `ITimingScope` (disposable). On dispose, automatically records duration (via `Stopwatch`) and success flag (set via `SetSuccess(bool)`).

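The disposable timing scope maps naturally onto a context manager. The sketch below is a Python analogue of the two recording patterns, not the actual `ITimingScope` code. The class names are hypothetical, and defaulting the success flag to `False` until it is set is an assumption the source does not confirm.

```python
import time
from collections import defaultdict


class Metrics:
    """Hypothetical stand-in for PerformanceMetrics."""

    def __init__(self):
        self.records = defaultdict(list)  # name -> list of (duration_ms, success)

    def record_operation(self, name, duration_ms, success):
        # Explicit recording pattern.
        self.records[name].append((duration_ms, success))

    def begin_operation(self, name):
        # Scoped recording pattern: the scope records itself when it exits.
        return TimingScope(self, name)


class TimingScope:
    def __init__(self, metrics, name):
        self._metrics = metrics
        self._name = name
        self._success = False  # assumption: failure unless set_success(True) is called
        self._start = time.perf_counter()

    def set_success(self, success):
        self._success = success

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        self._metrics.record_operation(self._name, elapsed_ms, self._success)
        return False  # never swallow exceptions raised inside the scope


metrics = Metrics()
with metrics.begin_operation("Read") as scope:
    scope.set_success(True)  # mark success once the work has completed
```

With this shape, an exception inside the scope still produces a record, and it is counted as a failure because `set_success(True)` was never reached.
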
### 2.3 Per-Operation Statistics

`OperationMetrics` maintains:

- `_totalCount`, `_successCount` — running counters.
- `_totalMilliseconds`, `_minMilliseconds`, `_maxMilliseconds` — latency range.
- `_durations` — rolling buffer of up to **1000 latency samples** for percentile calculation.

`MetricsStatistics` snapshot:

- `TotalCount`, `SuccessCount`, `SuccessRate` (percentage).
- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`.
- `Percentile95Milliseconds` — calculated from sorted samples at the 95th percentile index.

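To make the counters and the percentile concrete, here is a minimal Python sketch of the bookkeeping described above. It is not the C# class. Two details are assumptions: the rolling buffer drops its oldest sample when full, and the 95th percentile index uses the nearest-rank rule (the source only says "the 95th percentile index").

```python
from collections import deque

MAX_SAMPLES = 1000  # rolling buffer size stated above


class OperationMetrics:
    def __init__(self):
        self.total_count = 0
        self.success_count = 0
        self.total_ms = 0.0
        self.min_ms = None
        self.max_ms = None
        # Assumption: once full, the oldest sample is discarded first.
        self.durations = deque(maxlen=MAX_SAMPLES)

    def record(self, duration_ms, success):
        self.total_count += 1
        if success:
            self.success_count += 1
        self.total_ms += duration_ms
        self.min_ms = duration_ms if self.min_ms is None else min(self.min_ms, duration_ms)
        self.max_ms = duration_ms if self.max_ms is None else max(self.max_ms, duration_ms)
        self.durations.append(duration_ms)

    def statistics(self):
        n = self.total_count
        ordered = sorted(self.durations)
        p95 = 0.0
        if ordered:
            # Nearest-rank index: ceil(0.95 * len) - 1, in integer arithmetic.
            index = (95 * len(ordered) + 99) // 100 - 1
            p95 = ordered[index]
        return {
            "TotalCount": n,
            "SuccessCount": self.success_count,
            "SuccessRate": 100.0 * self.success_count / n if n else 0.0,
            "AverageMilliseconds": self.total_ms / n if n else 0.0,
            "MinMilliseconds": self.min_ms or 0.0,
            "MaxMilliseconds": self.max_ms or 0.0,
            "Percentile95Milliseconds": p95,
        }
```

The counters, average, minimum, and maximum cover every recorded operation, while the percentile only reflects the samples still in the buffer, so the two can diverge on long-running services.
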
### 2.4 Periodic Reporting

A timer fires every **60 seconds**, logging a summary of all operation metrics to Serilog.

## 3. Status Web Server

### 3.1 Server

`StatusWebServer` uses `HttpListener` on `http://+:{Port}/` (default port 8080).

- Starts an async request-handling loop, spawning a task per request.
- Graceful shutdown: cancels the listener, waits **5 seconds** for the listener task to exit.
- Returns HTTP 405 for non-GET methods, HTTP 500 on errors.

### 3.2 Endpoints

| Endpoint | Method | Response |
|----------|--------|----------|
| `/` | GET | HTML dashboard (auto-refresh every 30 seconds) |
| `/api/status` | GET | JSON status report (camelCase) |
| `/api/health` | GET | Plain text `OK` (200) or `UNHEALTHY` (503) |

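The routing rules in the table, together with the 405 rule from section 3.1, can be sketched as a pure function. This is an illustrative Python sketch and not the `HttpListener` code. The `route` function, its parameters, and the placeholder HTML are hypothetical, and returning 404 for unknown paths is an assumption the source does not state.

```python
import json


def route(method, path, healthy, report):
    """Return (status_code, content_type, body) for one request."""
    if method != "GET":
        return 405, "text/plain", "Method Not Allowed"
    if path == "/":
        # Placeholder page; the real dashboard is generated elsewhere.
        html = ('<html><head><meta http-equiv="refresh" content="30"></head>'
                "<body>status dashboard</body></html>")
        return 200, "text/html", html
    if path == "/api/status":
        return 200, "application/json", json.dumps(report)
    if path == "/api/health":
        if healthy:
            return 200, "text/plain", "OK"
        return 503, "text/plain", "UNHEALTHY"
    # Assumption: unknown paths get 404; the source does not cover this case.
    return 404, "text/plain", "Not Found"
```

Because `/api/health` signals state through the status code as well as the body, a load balancer or monitoring probe can rely on the 200/503 distinction without parsing the response.
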
### 3.3 HTML Dashboard

Generated by `StatusReportService`:

- Bootstrap-like CSS grid layout with status cards.
- Color-coded status: green = Healthy, yellow = Degraded, red = Unhealthy/Error.
- Operations table with columns: Count, SuccessRate, Avg/Min/Max/P95 milliseconds.
- Service metadata: ServiceName, Version (assembly version), connection state.
- Subscription stats: TotalClients, TotalTags, ActiveSubscriptions.
- Auto-refresh via `<meta http-equiv="refresh" content="30">`.
- Last updated timestamp.

### 3.4 JSON Status Report

Fully nested structure with camelCase property names:

- Service metadata, connection status, subscription stats, performance data, health check results.

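As an illustration of the camelCase convention, the Python sketch below lower-cases the first letter of every property name in a nested structure. The property names `ServiceName`, `Version`, `TotalClients`, `TotalTags`, and `ActiveSubscriptions` come from section 3.3; the grouping key `Subscriptions`, the sample values, and both helper functions are hypothetical, and the real report layout may differ.

```python
import json


def to_camel(name):
    # "ServiceName" -> "serviceName"
    return name[:1].lower() + name[1:]


def camelize(value):
    """Recursively rename dictionary keys to camelCase."""
    if isinstance(value, dict):
        return {to_camel(k): camelize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize(v) for v in value]
    return value


# Hypothetical report fragment; the values are placeholders.
report = {
    "ServiceName": "LmxProxy",
    "Version": "0.0.0",
    "Subscriptions": {"TotalClients": 0, "TotalTags": 0, "ActiveSubscriptions": 0},
}
print(json.dumps(camelize(report), indent=2))
```
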
## Dependencies

- **MxAccessClient** — `IsConnected`, `ConnectionState` for health checks; test tag read for detailed check.
- **SubscriptionManager** — subscription statistics.
- **PerformanceMetrics** — operation statistics for status report and health evaluation.
- **Configuration** — `WebServerConfiguration` for port and prefix.

## Interactions

- **GrpcServer** populates PerformanceMetrics via timing scopes on every RPC.
- **ServiceHost** creates all health/metrics/status components at startup and disposes them at shutdown.
- External monitoring systems can poll `/api/health` for availability checks.