Generate high-level requirements and 10 component documents derived from source code and protocol specs. Uses lmxproxy_updates.md (v2 TypedValue/QualityCode) as the source of truth, with v1 string-based encoding documented as legacy context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Component: HealthAndMetrics
Purpose
Provides health checking, performance metrics collection, and an HTTP status dashboard for monitoring the LmxProxy service.
Location
- src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs: basic health check.
- src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs: detailed health check with test tag read.
- src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs: operation metrics collection.
- src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs: status report generation.
- src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs: HTTP status endpoint.
Responsibilities
- Evaluate service health based on connection state, operation success rates, and test tag reads.
- Track per-operation performance metrics (counts, latencies, percentiles).
- Serve an HTML status dashboard, a JSON status endpoint, and a plain-text health endpoint over HTTP.
- Report metrics to logs on a periodic interval.
1. Health Checks
1.1 Basic Health Check (HealthCheckService)
CheckHealthAsync() evaluates:
| Check | Healthy | Degraded |
|---|---|---|
| MxAccess connected | Yes | — |
| Success rate (if > 100 total ops) | ≥ 50% | < 50% |
| Client count | ≤ 100 | > 100 |
Returns a health data dictionary with the keys scada_connected, scada_connection_state, total_clients, total_tags, total_operations, and average_success_rate.
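The table above can be read as a small decision function. The following is a minimal sketch in Python (the service itself is C#), not the actual HealthCheckService code. The function name and parameters are hypothetical, and the Unhealthy result for a disconnected MxAccess client is an assumption, since the table does not state that outcome.

```python
from enum import Enum


class HealthStatus(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNHEALTHY = "Unhealthy"


def evaluate_basic_health(connected: bool, total_operations: int,
                          success_rate: float, client_count: int) -> HealthStatus:
    """Apply the three checks from the table; success_rate is a percentage (0-100)."""
    if not connected:
        # Assumption: the table does not say what a lost connection maps to.
        return HealthStatus.UNHEALTHY
    # The success-rate check only applies once more than 100 operations exist.
    if total_operations > 100 and success_rate < 50.0:
        return HealthStatus.DEGRADED
    if client_count > 100:
        return HealthStatus.DEGRADED
    return HealthStatus.HEALTHY
```

Note that a low success rate is ignored until more than 100 operations have been recorded, which keeps a handful of early failures from marking a freshly started service as Degraded.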
1.2 Detailed Health Check (DetailedHealthCheckService)
CheckHealthAsync() performs an active probe:
- Checks IsConnected; returns Unhealthy if not connected.
- Reads a test tag (default System.Heartbeat).
- If the test tag quality is not Good, returns Degraded.
- If the test tag timestamp is older than 5 minutes, returns Degraded (stale data detection).
- Otherwise returns Healthy.
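The probe sequence above can be sketched as a short function. This is an illustrative Python sketch under stated assumptions, not the DetailedHealthCheckService implementation: TagSample, read_test_tag, and the explicit now parameter are hypothetical names introduced here so the staleness rule can be exercised deterministically.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

# Staleness threshold taken from the text: 5 minutes.
STALE_AFTER = timedelta(minutes=5)


@dataclass
class TagSample:
    """Hypothetical stand-in for the result of reading the test tag."""
    quality_good: bool
    timestamp: datetime


def detailed_health(is_connected: bool,
                    read_test_tag: Callable[[], TagSample],
                    now: datetime) -> str:
    """Return "Healthy", "Degraded", or "Unhealthy" following the listed steps."""
    if not is_connected:
        return "Unhealthy"
    sample = read_test_tag()  # e.g. a read of System.Heartbeat
    if not sample.quality_good:
        return "Degraded"
    if now - sample.timestamp > STALE_AFTER:
        return "Degraded"  # stale data detection
    return "Healthy"
```

Passing the current time in as an argument, rather than reading the clock inside the function, is a choice made for this sketch only; it makes the 5-minute boundary easy to test.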
2. Performance Metrics
2.1 Tracking
PerformanceMetrics uses a ConcurrentDictionary<string, OperationMetrics> to track operations by name.
Operations tracked: Read, ReadBatch, Write, WriteBatch (recorded by ScadaGrpcService).
2.2 Recording
Two recording patterns:
- RecordOperation(name, duration, success): explicit recording.
- BeginOperation(name): returns an ITimingScope (disposable). On dispose, it automatically records the duration (via Stopwatch) and the success flag (set via SetSuccess(bool)).
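The disposable timing-scope pattern maps naturally onto a context manager. The Python sketch below is an analogy to ITimingScope, not the C# code: the recorded list stands in for PerformanceMetrics, the names are hypothetical, and defaulting the success flag to True is an assumption, since the text does not say what is recorded when SetSuccess is never called.

```python
import time

# Stand-in for PerformanceMetrics: a list of (name, duration_ms, success) tuples.
recorded = []


def record_operation(name, duration_ms, success):
    """Explicit recording, analogous to RecordOperation."""
    recorded.append((name, duration_ms, success))


class TimingScope:
    """Records elapsed time and the success flag when the scope exits."""

    def __init__(self, name):
        self._name = name
        self._success = True  # assumption: default used if set_success is never called
        self._start = time.perf_counter()

    def set_success(self, success):
        self._success = success

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        record_operation(self._name, elapsed_ms, self._success)
        return False  # never swallow exceptions raised inside the scope
```

A caller would wrap the operation in `with TimingScope("Read") as scope:` and call `scope.set_success(False)` on failure; the measurement is recorded even if the body raises.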
2.3 Per-Operation Statistics
OperationMetrics maintains:
- _totalCount, _successCount: running counters.
- _totalMilliseconds, _minMilliseconds, _maxMilliseconds: latency range.
- _durations: rolling buffer of up to 1000 latency samples for percentile calculation.
MetricsStatistics snapshot:
- TotalCount, SuccessCount, SuccessRate (percentage).
- AverageMilliseconds, MinMilliseconds, MaxMilliseconds.
- Percentile95Milliseconds: calculated from sorted samples at the 95th percentile index.
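To make the relationship between the running counters, the rolling buffer, and the snapshot concrete, here is a minimal Python sketch. It is not the OperationMetrics source: the snake_case names are hypothetical, and the percentile index rule (floor of 0.95 times the sample count, clamped to the last element) is an assumption, because the text only says the value is taken "at the 95th percentile index".

```python
from collections import deque
from threading import Lock


class OperationMetrics:
    """Counters plus a bounded sample buffer for one operation name."""

    def __init__(self, max_samples: int = 1000):
        self._lock = Lock()
        self._total_count = 0
        self._success_count = 0
        self._total_ms = 0.0
        self._min_ms = float("inf")
        self._max_ms = 0.0
        # deque(maxlen=...) drops the oldest sample once the buffer is full.
        self._durations = deque(maxlen=max_samples)

    def record(self, duration_ms: float, success: bool) -> None:
        with self._lock:
            self._total_count += 1
            if success:
                self._success_count += 1
            self._total_ms += duration_ms
            self._min_ms = min(self._min_ms, duration_ms)
            self._max_ms = max(self._max_ms, duration_ms)
            self._durations.append(duration_ms)

    def snapshot(self) -> dict:
        with self._lock:
            if self._total_count == 0:
                return {"total_count": 0}
            ordered = sorted(self._durations)
            # Assumed index rule: floor(0.95 * n), clamped to the last sample.
            index = min(len(ordered) - 1, (len(ordered) * 95) // 100)
            return {
                "total_count": self._total_count,
                "success_count": self._success_count,
                "success_rate": 100.0 * self._success_count / self._total_count,
                "average_ms": self._total_ms / self._total_count,
                "min_ms": self._min_ms,
                "max_ms": self._max_ms,
                "p95_ms": ordered[index],
            }
```

One consequence of this design is that the count, average, minimum, and maximum cover the full lifetime of the process, while the percentile reflects only the most recent samples held in the buffer.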
2.4 Periodic Reporting
A timer fires every 60 seconds, logging a summary of all operation metrics to Serilog.
3. Status Web Server
3.1 Server
StatusWebServer uses HttpListener on http://+:{Port}/ (default port 8080).
- Starts an async request-handling loop, spawning a task per request.
- Graceful shutdown: cancels the listener, waits 5 seconds for the listener task to exit.
- Returns HTTP 405 for non-GET methods, HTTP 500 on errors.
3.2 Endpoints
| Endpoint | Method | Response |
|---|---|---|
| / | GET | HTML dashboard (auto-refresh every 30 seconds) |
| /api/status | GET | JSON status report (camelCase) |
| /api/health | GET | Plain text OK (200) or UNHEALTHY (503) |
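The routing rules from sections 3.1 and 3.2 can be summarized as a pure function, which is easier to reason about than a live listener. This Python sketch is illustrative only and does not reproduce StatusWebServer: the route function, the placeholder bodies, and the 404 response for unknown paths are assumptions, and the boolean healthy flag sidesteps how a Degraded state is mapped, which the text does not specify.

```python
def route(method: str, path: str, healthy: bool):
    """Return (status_code, content_type, body) for one request."""
    if method != "GET":
        return 405, "text/plain", "Method Not Allowed"
    if path == "/":
        # Placeholder body; the real dashboard comes from StatusReportService.
        return 200, "text/html", "<html><!-- dashboard --></html>"
    if path == "/api/status":
        return 200, "application/json", "{}"  # placeholder JSON report
    if path == "/api/health":
        if healthy:
            return 200, "text/plain", "OK"
        return 503, "text/plain", "UNHEALTHY"
    # Assumption: the handling of unknown paths is not described in the text.
    return 404, "text/plain", "Not Found"
```

A monitoring probe therefore only needs the status code from /api/health: 200 means available, 503 means not.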
3.3 HTML Dashboard
Generated by StatusReportService:
- Bootstrap-like CSS grid layout with status cards.
- Color-coded status: green = Healthy, yellow = Degraded, red = Unhealthy/Error.
- Operations table with columns: Count, SuccessRate, Avg/Min/Max/P95 milliseconds.
- Service metadata: ServiceName, Version (assembly version), connection state.
- Subscription stats: TotalClients, TotalTags, ActiveSubscriptions.
- Auto-refresh via <meta http-equiv="refresh" content="30">.
- Last updated timestamp.
3.4 JSON Status Report
Fully nested structure with camelCase property names:
- Service metadata, connection status, subscription stats, performance data, health check results.
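As an illustration of what "nested with camelCase property names" means in practice, the Python sketch below lowercases the first letter of each PascalCase key at every nesting level. The leaf names (ServiceName, Version, TotalClients, TotalTags, ActiveSubscriptions) come from this document; the grouping keys Service and Subscriptions and the sample values are hypothetical, and the actual report is produced by the C# serializer, whose exact output shape is not shown here.

```python
import json


def to_camel(name: str) -> str:
    """Lowercase only the first character: ServiceName -> serviceName."""
    return name[:1].lower() + name[1:]


def camelize(value):
    """Recursively rename dictionary keys; lists and scalars pass through."""
    if isinstance(value, dict):
        return {to_camel(key): camelize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [camelize(item) for item in value]
    return value


# Hypothetical report fragment; grouping keys and values are illustrative.
report = {
    "Service": {"ServiceName": "LmxProxy", "Version": "0.0.0"},
    "Subscriptions": {"TotalClients": 2, "TotalTags": 10, "ActiveSubscriptions": 3},
}

status_json = json.dumps(camelize(report), indent=2)
```

Here status_json contains keys such as service.serviceName and subscriptions.totalClients, which is the naming style a client of /api/status should expect.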
Dependencies
- MxAccessClient: IsConnected and ConnectionState for health checks; test tag read for the detailed check.
- SubscriptionManager: subscription statistics.
- PerformanceMetrics: operation statistics for the status report and health evaluation.
- Configuration: WebServerConfiguration for port and prefix.
Interactions
- GrpcServer populates PerformanceMetrics via timing scopes on every RPC.
- ServiceHost creates all health/metrics/status components at startup and disposes them at shutdown.
- External monitoring systems can poll /api/health for availability checks.