scadalink-design/deprecated/lmxproxy/docs/requirements/Component-HealthAndMetrics.md
Joseph Doherty 9dccf8e72f deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL
adapter files, and related docs to deprecated/. Removed LmxProxy registration
from DataConnectionFactory, project reference from DCL, protocol option from
UI, and cleaned up all requirement docs.
2026-04-08 15:56:23 -04:00

Component: HealthAndMetrics

Purpose

Provides health checking, performance metrics collection, and an HTTP status dashboard for monitoring the LmxProxy service.

Location

  • src/ZB.MOM.WW.LmxProxy.Host/Health/HealthCheckService.cs — basic health check.
  • src/ZB.MOM.WW.LmxProxy.Host/Health/DetailedHealthCheckService.cs — detailed health check with test tag read.
  • src/ZB.MOM.WW.LmxProxy.Host/Metrics/PerformanceMetrics.cs — operation metrics collection.
  • src/ZB.MOM.WW.LmxProxy.Host/Status/StatusReportService.cs — status report generation.
  • src/ZB.MOM.WW.LmxProxy.Host/Status/StatusWebServer.cs — HTTP status endpoint.

Responsibilities

  • Evaluate service health based on connection state, operation success rates, and test tag reads.
  • Track per-operation performance metrics (counts, latencies, percentiles).
  • Serve an HTML status dashboard and JSON/health HTTP endpoints.
  • Report metrics to logs on a periodic interval.

1. Health Checks

1.1 Basic Health Check (HealthCheckService)

CheckHealthAsync() evaluates:

Check                                 Healthy    Degraded
MxAccess connected                    Yes
Success rate (if > 100 total ops)     ≥ 50%      < 50%
Client count                          ≤ 100      > 100

Returns a health data dictionary with the keys scada_connected, scada_connection_state, total_clients, total_tags, total_operations, and average_success_rate.
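A minimal Python sketch of the basic evaluation, assuming the success rate is a percentage (0-100) and that a disconnected MxAccess client yields Unhealthy. The table gives no Degraded value for that row, so the disconnected outcome is an assumption. The function name check_basic_health is hypothetical; the real service is C#.

```python
HEALTHY, DEGRADED, UNHEALTHY = "Healthy", "Degraded", "Unhealthy"


def check_basic_health(scada_connected, connection_state, total_clients,
                       total_tags, total_operations, average_success_rate):
    """Return (status, data) following the basic health check table.

    average_success_rate is a percentage (0-100). Returning Unhealthy when
    MxAccess is disconnected is an assumption: the table only gives the
    healthy case for that row.
    """
    data = {
        "scada_connected": scada_connected,
        "scada_connection_state": connection_state,
        "total_clients": total_clients,
        "total_tags": total_tags,
        "total_operations": total_operations,
        "average_success_rate": average_success_rate,
    }
    if not scada_connected:
        return UNHEALTHY, data  # assumed outcome, see docstring
    # The success-rate rule only applies once more than 100 operations exist,
    # so a few early failures cannot flip the status.
    if total_operations > 100 and average_success_rate < 50.0:
        return DEGRADED, data
    if total_clients > 100:
        return DEGRADED, data
    return HEALTHY, data
```

The 100-operation threshold keeps a freshly started service from reporting Degraded after its first few failed calls.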

1.2 Detailed Health Check (DetailedHealthCheckService)

CheckHealthAsync() performs an active probe:

  1. Checks IsConnected — returns Unhealthy if not connected.
  2. Reads a test tag (default System.Heartbeat).
  3. If test tag quality is not Good — returns Degraded.
  4. If test tag timestamp is older than 5 minutes — returns Degraded (stale data detection).
  5. Otherwise returns Healthy.
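The probe sequence can be sketched in Python as follows. This is a synchronous stand-in for the async C# method. It assumes quality is reported as a string such as "Good" and that tag timestamps are timezone-aware; TagValue, read_tag and the injectable clock are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, NamedTuple


class TagValue(NamedTuple):
    value: object
    quality: str          # assumed representation, e.g. "Good", "Bad"
    timestamp: datetime   # assumed timezone-aware


STALE_AFTER = timedelta(minutes=5)


def check_detailed_health(is_connected: Callable[[], bool],
                          read_tag: Callable[[str], TagValue],
                          test_tag: str = "System.Heartbeat",
                          now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> str:
    # Step 1: no connection means no probe is possible.
    if not is_connected():
        return "Unhealthy"
    # Step 2: actively read the test tag.
    tag = read_tag(test_tag)
    # Step 3: a bad or uncertain quality degrades the service.
    if tag.quality != "Good":
        return "Degraded"
    # Step 4: stale data detection, timestamp older than 5 minutes.
    if now() - tag.timestamp > STALE_AFTER:
        return "Degraded"
    # Step 5: all checks passed.
    return "Healthy"
```

Injecting the clock keeps the staleness rule testable without waiting five minutes.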

2. Performance Metrics

2.1 Tracking

PerformanceMetrics uses a ConcurrentDictionary<string, OperationMetrics> to track operations by name.

Operations tracked: Read, ReadBatch, Write, WriteBatch (recorded by ScadaGrpcService).

2.2 Recording

Two recording patterns:

  • RecordOperation(name, duration, success) — explicit recording.
  • BeginOperation(name) — returns an ITimingScope (disposable). On dispose, automatically records duration (via Stopwatch) and success flag (set via SetSuccess(bool)).
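Both recording patterns can be sketched in Python, with a context manager standing in for the disposable ITimingScope. This is a minimal sketch, assuming a scope counts as failed unless set_success(True) is called before it closes. The source does not state the default, and every name here is a hypothetical Python counterpart of the C# members.

```python
import threading
import time
from collections import defaultdict


class PerformanceMetrics:
    """Minimal stand-in: keeps raw (duration_ms, success) records per operation name."""

    def __init__(self):
        self._lock = threading.Lock()            # plays the role of ConcurrentDictionary
        self._records = defaultdict(list)

    def record_operation(self, name, duration_ms, success):
        """Explicit recording pattern."""
        with self._lock:
            self._records[name].append((duration_ms, success))

    def begin_operation(self, name):
        """Scoped recording pattern: timing starts now, recording happens on exit."""
        return TimingScope(self, name)

    def records(self, name):
        with self._lock:
            return list(self._records[name])


class TimingScope:
    def __init__(self, metrics, name):
        self._metrics = metrics
        self._name = name
        self._success = False                    # assumed default until set_success(True)
        self._start = time.perf_counter()        # Stopwatch equivalent

    def set_success(self, success):
        self._success = success

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        self._metrics.record_operation(self._name, elapsed_ms, self._success)
        return False                             # never swallow the caller's exception
```

Usage mirrors a C# `using` block: `with metrics.begin_operation("Read") as scope:` wraps the call, and `scope.set_success(True)` runs only after it returns normally, so an exception is recorded as a failure.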

2.3 Per-Operation Statistics

OperationMetrics maintains:

  • _totalCount, _successCount — running counters.
  • _totalMilliseconds, _minMilliseconds, _maxMilliseconds — latency range.
  • _durations — rolling buffer of up to 1000 latency samples for percentile calculation.

MetricsStatistics snapshot:

  • TotalCount, SuccessCount, SuccessRate (percentage).
  • AverageMilliseconds, MinMilliseconds, MaxMilliseconds.
  • Percentile95Milliseconds — calculated from sorted samples at the 95th percentile index.
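A Python sketch of the per-operation counters and snapshot follows. It assumes a nearest-rank percentile, where the index is ceil(0.95 × n) − 1 over the sorted samples. The source only says "the 95th percentile index", so the exact rounding is an assumption. Note that count, average, min and max are all-time values, while P95 covers only the last 1000 samples.

```python
import math
import threading
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsStatistics:
    total_count: int
    success_count: int
    success_rate: float              # percentage, 0-100
    average_milliseconds: float
    min_milliseconds: float
    max_milliseconds: float
    percentile95_milliseconds: float


class OperationMetrics:
    MAX_SAMPLES = 1000

    def __init__(self):
        self._lock = threading.Lock()
        self._total_count = 0
        self._success_count = 0
        self._total_ms = 0.0
        self._min_ms = math.inf
        self._max_ms = 0.0
        # deque(maxlen=...) drops the oldest sample once full: a rolling window.
        self._durations = deque(maxlen=self.MAX_SAMPLES)

    def record(self, duration_ms, success):
        with self._lock:
            self._total_count += 1
            if success:
                self._success_count += 1
            self._total_ms += duration_ms
            self._min_ms = min(self._min_ms, duration_ms)
            self._max_ms = max(self._max_ms, duration_ms)
            self._durations.append(duration_ms)

    def get_statistics(self):
        with self._lock:
            n = self._total_count
            if n == 0:
                return MetricsStatistics(0, 0, 0.0, 0.0, 0.0, 0.0, 0.0)
            ordered = sorted(self._durations)
            # Nearest-rank index ceil(0.95 * len) - 1, computed in integers
            # to avoid floating-point rounding at exact multiples.
            p95_index = max(0, (95 * len(ordered) + 99) // 100 - 1)
            return MetricsStatistics(
                total_count=n,
                success_count=self._success_count,
                success_rate=100.0 * self._success_count / n,
                average_milliseconds=self._total_ms / n,
                min_milliseconds=self._min_ms,
                max_milliseconds=self._max_ms,
                percentile95_milliseconds=ordered[p95_index],
            )
```

Capping the buffer bounds both memory and the sort cost of each snapshot at 1000 elements, regardless of how many operations have run.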

2.4 Periodic Reporting

A timer fires every 60 seconds, logging a summary of all operation metrics to Serilog.
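The periodic report can be sketched in Python with a background thread and the standard logging module standing in for Serilog. The snapshot callable, the per-operation dictionary shape and the summary wording are hypothetical; the source fixes only the 60-second interval.

```python
import logging
import threading

log = logging.getLogger("metrics")


def format_summary(stats_by_operation):
    """stats_by_operation: name -> {"count", "success_rate", "avg_ms", "p95_ms"} (assumed shape)."""
    parts = []
    for name in sorted(stats_by_operation):
        s = stats_by_operation[name]
        parts.append(f"{name}: count={s['count']} success={s['success_rate']:.1f}% "
                     f"avg={s['avg_ms']:.1f}ms p95={s['p95_ms']:.1f}ms")
    return "; ".join(parts) if parts else "no operations recorded"


class PeriodicReporter:
    def __init__(self, snapshot, interval_seconds=60.0, logger=log):
        self._snapshot = snapshot            # callable returning the current statistics
        self._interval = interval_seconds
        self._logger = logger
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout (time to report) and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._logger.info("Performance metrics: %s", format_summary(self._snapshot()))
```

Waiting on an Event rather than sleeping lets stop() end the loop immediately instead of after up to a full interval.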

3. Status Web Server

3.1 Server

StatusWebServer uses HttpListener on http://+:{Port}/ (default port 8080).

  • Starts an async request-handling loop, spawning a task per request.
  • Graceful shutdown: cancels the listener and waits up to 5 seconds for the listener task to exit.
  • Returns HTTP 405 for non-GET methods, HTTP 500 on errors.
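The server lifecycle can be sketched in Python with http.server. Here ThreadingHTTPServer (a thread per request) stands in for HttpListener's task-per-request loop, and binding to host "" approximates the http://+:{Port}/ prefix. The render(path) hook and class names are hypothetical. Only the common non-GET methods are mapped to 405; http.server answers any other method with 501.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(render):
    """render(path) -> (status, content_type, body_bytes); hypothetical hook."""
    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                status, content_type, body = render(self.path)
            except Exception:
                # Any failure while building the response becomes HTTP 500.
                status, content_type, body = 500, "text/plain", b"Internal Server Error"
            self._send(status, content_type, body)

        def _reject(self):
            self._send(405, "text/plain", b"Method Not Allowed")

        do_POST = do_PUT = do_DELETE = do_PATCH = do_OPTIONS = _reject

        def _send(self, status, content_type, body):
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return StatusHandler


class StatusWebServer:
    def __init__(self, render, port=8080, host=""):
        self._httpd = ThreadingHTTPServer((host, port), make_handler(render))
        self._thread = threading.Thread(target=self._httpd.serve_forever, daemon=True)

    @property
    def port(self):
        return self._httpd.server_address[1]

    def start(self):
        self._thread.start()

    def stop(self, timeout=5.0):
        self._httpd.shutdown()        # asks serve_forever to return
        self._thread.join(timeout)    # waits up to 5 seconds for the loop thread
        self._httpd.server_close()
```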

3.2 Endpoints

Endpoint       Method   Response
/              GET      HTML dashboard (auto-refresh every 30 seconds)
/api/status    GET      JSON status report (camelCase)
/api/health    GET      Plain text OK (200) or UNHEALTHY (503)
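The endpoint table maps onto a small dispatch function. The following is a Python sketch in which build_report, render_html and is_healthy are hypothetical hooks. The 404 for unknown paths is an assumption, since the source lists only the three routes.

```python
import json
from urllib.parse import urlsplit


def route(raw_path, build_report, render_html, is_healthy):
    """Map a GET request path to (status, content_type, body)."""
    path = urlsplit(raw_path).path          # ignore any query string
    if path == "/":
        return 200, "text/html; charset=utf-8", render_html().encode("utf-8")
    if path == "/api/status":
        return 200, "application/json", json.dumps(build_report()).encode("utf-8")
    if path == "/api/health":
        if is_healthy():
            return 200, "text/plain", b"OK"
        return 503, "text/plain", b"UNHEALTHY"
    return 404, "text/plain", b"Not Found"  # assumption: unknown paths
```

Returning 503 rather than 200 with an error body lets load balancers and uptime checkers act on the status code alone.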

3.3 HTML Dashboard

Generated by StatusReportService:

  • Bootstrap-like CSS grid layout with status cards.
  • Color-coded status: green = Healthy, yellow = Degraded, red = Unhealthy/Error.
  • Operations table with columns: Count, SuccessRate, Avg/Min/Max/P95 milliseconds.
  • Service metadata: ServiceName, Version (assembly version), connection state.
  • Subscription stats: TotalClients, TotalTags, ActiveSubscriptions.
  • Auto-refresh via <meta http-equiv="refresh" content="30">.
  • Last updated timestamp.
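A heavily simplified Python sketch of the dashboard markup shows the meta refresh, the color mapping and the operations table. The layout, parameter names and tuple shape of operations are hypothetical. The status-to-color mapping follows the list above. Values taken from outside are HTML-escaped.

```python
import html
from datetime import datetime, timezone

STATUS_COLORS = {"Healthy": "green", "Degraded": "yellow", "Unhealthy": "red", "Error": "red"}


def render_dashboard(service_name, version, connection_state, status,
                     operations, total_clients, total_tags, active_subscriptions,
                     now=None):
    """operations: name -> (count, success_rate, avg, min, max, p95); assumed shape."""
    now = now or datetime.now(timezone.utc)
    color = STATUS_COLORS.get(status, "red")   # unknown states are shown as errors
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td>" + "".join(f"<td>{v}</td>" for v in stats) + "</tr>"
        for name, stats in sorted(operations.items())
    )
    return f"""<!DOCTYPE html>
<html><head>
<meta http-equiv="refresh" content="30">
<title>{html.escape(service_name)} Status</title>
</head><body>
<div class="card" style="border-left: 6px solid {color}">
  <h1>{html.escape(service_name)} {html.escape(version)}</h1>
  <p>Status: <strong style="color: {color}">{html.escape(status)}</strong></p>
  <p>Connection: {html.escape(connection_state)}</p>
  <p>Clients: {total_clients} | Tags: {total_tags} | Subscriptions: {active_subscriptions}</p>
</div>
<table>
<tr><th>Operation</th><th>Count</th><th>SuccessRate</th><th>Avg ms</th><th>Min ms</th><th>Max ms</th><th>P95 ms</th></tr>
{rows}
</table>
<p>Last updated: {now.isoformat()}</p>
</body></html>"""
```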

3.4 JSON Status Report

Fully nested structure with camelCase property names:

  • Service metadata, connection status, subscription stats, performance data, health check results.
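The camelCase output can be illustrated with a recursive key conversion in Python. The C# service presumably relies on its JSON serializer's naming policy. The to_camel function below is a simplified stand-in that lowercases only the first character of PascalCase names. The report shape in the usage note is hypothetical.

```python
import json


def to_camel(name):
    """PascalCase or snake_case -> camelCase, e.g. 'ServiceName' -> 'serviceName'."""
    if "_" in name:
        head, *rest = name.split("_")
        return head.lower() + "".join(p[:1].upper() + p[1:] for p in rest)
    return name[:1].lower() + name[1:]


def camelize(obj):
    """Recursively rename dictionary keys, descending into nested dicts and lists."""
    if isinstance(obj, dict):
        return {to_camel(k): camelize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [camelize(v) for v in obj]
    return obj


def build_status_json(report):
    return json.dumps(camelize(report), indent=2)
```

For example, a hypothetical report `{"ServiceName": "LmxProxy", "Connection": {"IsConnected": True}}` serializes with the keys `serviceName`, `connection` and `isConnected`.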

Dependencies

  • MxAccessClient: IsConnected and ConnectionState for health checks; test tag read for the detailed check.
  • SubscriptionManager: subscription statistics.
  • PerformanceMetrics: operation statistics for the status report and health evaluation.
  • Configuration: WebServerConfiguration for port and prefix.

Interactions

  • GrpcServer populates PerformanceMetrics via timing scopes on every RPC.
  • ServiceHost creates all health/metrics/status components at startup and disposes them at shutdown.
  • External monitoring systems can poll /api/health for availability checks.
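An external availability check against /api/health could be as small as the following Python sketch. The poll_health name and the 5-second timeout are hypothetical. The sketch counts only a 200 response with body OK as available. A 503, a connection failure or a timeout all count as unavailable.

```python
import urllib.error
import urllib.request


def poll_health(base_url, timeout=5.0):
    """Return True only if GET {base_url}/api/health answers 200 with body 'OK'."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except urllib.error.HTTPError as err:
        err.close()
        return False          # e.g. 503 UNHEALTHY
    except (urllib.error.URLError, OSError):
        return False          # unreachable or timed out: treat as unavailable
```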