scadaproj/components/health/current-state/mxaccessgw/CURRENT-STATE.md
Joseph Doherty 3d25ee5090 docs(health): current-state x3 + GAPS + README
Code-verified current-state docs for OtOpcUa (three-tier full), ScadaBridge
(two-tier, no /healthz), and MxAccessGateway (bare liveness only / no probes).
GAPS backlog with P1 for MxGateway and convergence items for Akka status policy,
DB probe technique, and response writer. README with per-project status table.
2026-06-01 06:23:53 -04:00

# Health — current state: MxAccessGateway
Repo: `~/Desktop/MxAccessGateway`. Stack: .NET 10 gateway (x64) + .NET 4.8 worker (x86), gRPC;
solution `src/MxGateway.sln`.
Health code lives in `src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`. All paths relative
to repo root.
Verified 2026-06-01.
**Summary: bare liveness only.** MxAccessGateway has a single `/health/live` endpoint that returns
a hardcoded `GatewayHealthReply` JSON object. `AddHealthChecks()` is called at startup but is
entirely unused — no `IHealthCheck` implementations are registered, `MapHealthChecks` is never
called, and there is no readiness or active-node tier. The net48 x86 worker process has no HTTP
server and therefore no health endpoint of any kind.
## 1. Endpoint wiring
`src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`:
- `:61`: `builder.Services.AddHealthChecks()` is called in the DI registration block. **This call
is dead**: no `.AddCheck<T>()` call follows it, and `MapHealthChecks` is never called. The
framework registers the health-check infrastructure but nothing is wired through it.
- `:139-145`: `MapGatewayEndpoints` maps a raw `endpoints.MapGet("/health/live", ...)` (not
`MapHealthChecks`). The handler is an inline lambda that returns `Results.Ok(new GatewayHealthReply(...))`
with a hardcoded `Status: "Healthy"`:
```csharp
endpoints.MapGet(
        "/health/live",
        () => Results.Ok(new GatewayHealthReply(
            Status: "Healthy",
            DefaultBackend: GatewayContractInfo.DefaultBackendName,
            WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
    .WithName("LiveHealth");
```
This endpoint always returns HTTP 200 `{"Status":"Healthy",...}` as long as the process is alive.
It carries no authentication requirement (no `[Authorize]` or `.RequireAuthorization()`).
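
One way to confirm that a bare `AddHealthChecks()` call registers infrastructure but no checks is to inspect the registration list in isolation. The following is a minimal sketch against the stock `Microsoft.Extensions.Diagnostics.HealthChecks` API, not code from the repo; `DeadRegistrationSketch` and `InspectAsync` are hypothetical names introduced for illustration.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;

// Hypothetical helper: reproduces "AddHealthChecks() with no checks" outside the gateway.
public static class DeadRegistrationSketch
{
    public static async Task<(int RegistrationCount, HealthStatus Status)> InspectAsync()
    {
        var services = new ServiceCollection();
        services.AddLogging();      // the default HealthCheckService resolves an ILogger
        services.AddHealthChecks(); // same call as GatewayApplication.cs:61; no AddCheck follows

        await using var provider = services.BuildServiceProvider();

        var options = provider.GetRequiredService<IOptions<HealthCheckServiceOptions>>().Value;
        var report = await provider.GetRequiredService<HealthCheckService>().CheckHealthAsync();

        // With zero registrations the aggregate report has no entries and is therefore Healthy.
        return (options.Registrations.Count, report.Status);
    }
}
```

Under these assumptions the helper reports zero registrations and an aggregate `Healthy` status, which is why mapping the middleware without first adding probes would still say nothing about the gateway's dependencies.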
## 2. Response shape
`GatewayHealthReply` is a record with three fields:
- `Status` — always `"Healthy"` (hardcoded string, not the ASP.NET Core `HealthStatus` enum)
- `DefaultBackend` — value of `GatewayContractInfo.DefaultBackendName` (the configured backend
name, useful for confirming which gateway instance a probe hit)
- `WorkerProtocolVersion` — value of `GatewayContractInfo.WorkerProtocolVersion` (the gRPC
protocol version the gateway expects from the worker, useful for version-skew detection)
The response is not `HealthChecks.UI.Client` JSON and is not the standard ASP.NET Core health
response shape. It is a bespoke JSON record.
## 3. Probes
None. There is no `IHealthCheck` registered. The `/health/live` response does not reflect:
- Whether the SQLite auth-store is reachable
- Whether any active MXAccess session is functional
- Whether the x86 worker named-pipe IPC is connected or the worker process is alive
- Whether the gRPC service is actually accepting calls
The endpoint is purely a process liveness indicator.
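
As an illustration of what the first missing probe could look like, the sketch below opens the auth-store and runs a trivial query. It assumes the store is reachable through the `Microsoft.Data.Sqlite` provider and a plain connection string, neither of which this document confirms; `SqliteAuthStoreHealthCheck` is a hypothetical name.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical probe: reports Healthy only if the SQLite store opens and answers a query.
public sealed class SqliteAuthStoreHealthCheck : IHealthCheck
{
    private readonly string _connectionString; // assumption: supplied by gateway configuration

    public SqliteAuthStoreHealthCheck(string connectionString)
        => _connectionString = connectionString;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            await using var connection = new SqliteConnection(_connectionString);
            await connection.OpenAsync(cancellationToken);

            await using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1;"; // cheapest round-trip through the engine
            await command.ExecuteScalarAsync(cancellationToken);

            return HealthCheckResult.Healthy("auth-store reachable");
        }
        catch (SqliteException ex)
        {
            return HealthCheckResult.Unhealthy("auth-store unreachable", ex);
        }
    }
}
```

A check like this could be registered with the stock `AddCheck(name, instance, tags: ...)` overload and tagged `ready`, so that it affects readiness rather than liveness.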
## 4. Tier coverage
| Tier | Endpoint | Status |
|---|---|---|
| Process liveness | `/health/live` (raw `MapGet`) | ✅ present (but non-standard) |
| Readiness | `/health/ready` | ⛔ absent |
| Active node | `/health/active` | ⛔ absent (not Akka-based; not applicable as-is) |
| `healthz` convention | `/healthz` | ⛔ absent |
MxAccessGateway is not an Akka.NET application — it has no cluster, no leader election, and no
active-node concept. The "active" tier in the shared spec translates here to "is the worker process
connected and the gRPC service ready to accept calls?" rather than cluster leadership.
## 5. x86 worker
`ZB.MOM.WW.MxGateway.Worker` is a .NET 4.8 console application communicating with the gateway
over Windows named-pipe IPC. It has no HTTP server, no health endpoint, and no exposure to any
probe mechanism. Its liveness must be inferred indirectly: either by the gateway process
monitoring it (not currently implemented) or by a gateway-side probe of the IPC channel, such as
the `GrpcDependencyHealthCheck` proposed in the adoption plan below.
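
Gateway-side monitoring of the worker could be expressed as an ordinary `IHealthCheck` that asks whichever component owns the worker connection for its state. This is a minimal sketch, not the `GrpcDependencyHealthCheck` itself; `WorkerConnectionHealthCheck` and the `isWorkerConnected` delegate are hypothetical, and how the gateway would actually determine connectivity is left open.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical probe: surfaces the worker's IPC state through the health-check pipeline.
public sealed class WorkerConnectionHealthCheck : IHealthCheck
{
    // Assumption: the gateway can answer "is the x86 worker connected?" synchronously,
    // for example from the object that owns the named-pipe session.
    private readonly Func<bool> _isWorkerConnected;

    public WorkerConnectionHealthCheck(Func<bool> isWorkerConnected)
        => _isWorkerConnected = isWorkerConnected ?? throw new ArgumentNullException(nameof(isWorkerConnected));

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var result = _isWorkerConnected()
            ? HealthCheckResult.Healthy("worker IPC connected")
            : HealthCheckResult.Unhealthy("worker IPC not connected");

        return Task.FromResult(result);
    }
}
```

Keeping the connectivity test behind a delegate means the probe does not open its own pipe connection, so it cannot interfere with the session the gateway already holds.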
## 6. Notable gaps
- `AddHealthChecks()` at `:61` is dead code. No `IHealthCheck` is ever registered via this call.
- `/health/live` uses `MapGet` (a raw minimal-API handler) rather than `MapHealthChecks`. It
bypasses the ASP.NET Core health-check middleware entirely, which means it does not participate
in the standard health-check pipeline (no `IHealthCheckPublisher`, no `HealthReport`, no UI
integration).
- The hardcoded `"Healthy"` status means the endpoint cannot reflect real probe results even if
probes were added later — the handler must be replaced, not just supplemented.
- No readiness gating: orchestrators and proxies (Kubernetes, Traefik) that expect `/health/ready`
to return 503 until the process is actually ready get a 404 from that path today, or an
unconditional 200 if they are pointed at `/health/live` instead.
---
## Adoption plan → `ZB.MOM.WW.Health`
**Replace `/health/live` + wire the shared tiers:**
The `AddHealthChecks()` call at `GatewayApplication.cs:61` is already present; it just needs
probes registered against it. The raw `MapGet("/health/live", ...)` handler at `:139-145` must be
removed and replaced with `app.MapZbHealth()` from `ZB.MOM.WW.Health`.
Steps:
1. **Remove** the inline `MapGet("/health/live", ...)` lambda (`:139-145`). The `GatewayHealthReply`
record and `DefaultBackend`/`WorkerProtocolVersion` metadata can be surfaced differently (e.g., a
`/info` endpoint or as custom data on the health response).
2. **Register a `GrpcDependencyHealthCheck`** (from `ZB.MOM.WW.Health`) that probes the
named-pipe IPC channel to the x86 worker. Tag `["ready"]`. This replaces the hardcoded
liveness-only response with a real probe that reflects whether the worker is reachable.
3. **Optionally add a `GrpcDependencyHealthCheck`** for any downstream gRPC dependency (e.g., the
Galaxy Repository connection) if the gateway is expected to be healthy only when its upstreams are
reachable. Tag `["ready"]`.
4. **Call `app.MapZbHealth()`**: this maps `/health/ready` (tag `ready`), `/health/active` (tag
`active`; initially empty, since MxGateway has no active-node concept), and `/healthz` (bare liveness).
The `/healthz` endpoint takes over the semantic role that `/health/live` serves today.
5. **Do not add `ZB.MOM.WW.Health.Akka`** — MxAccessGateway has no Akka dependency. The consumer
matrix in the design specifies MxGateway uses the core package only.
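
For orientation, the tier layout described in step 4 can be approximated with stock ASP.NET Core calls. The sketch below is not the implementation of `MapZbHealth()`, whose internals are not shown in this document; it only illustrates tag-filtered endpoints with the built-in `MapHealthChecks`, and `StockTierMappingSketch`, `MapTiers`, and `ForTag` are hypothetical names.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

// Hypothetical stand-in showing the three tiers with built-in middleware only.
public static class StockTierMappingSketch
{
    public static void MapTiers(IEndpointRouteBuilder endpoints)
    {
        // Readiness: runs only checks tagged "ready" (worker IPC, upstream dependencies).
        endpoints.MapHealthChecks("/health/ready", ForTag("ready"));

        // Active: no check carries this tag in MxGateway, so the report is empty and Healthy.
        endpoints.MapHealthChecks("/health/active", ForTag("active"));

        // Bare liveness: exclude every check so the endpoint answers whenever the process serves HTTP.
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions { Predicate = _ => false });
    }

    // Builds options that select registrations by tag.
    public static HealthCheckOptions ForTag(string tag) => new HealthCheckOptions
    {
        Predicate = registration => registration.Tags.Contains(tag),
    };
}
```

With the default `ResultStatusCodes`, an `Unhealthy` aggregate on `/health/ready` yields HTTP 503, which is the gating behavior the readiness gap in section 6 refers to.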
**Keep bespoke:**
- The `WorkerProtocolVersion` / `DefaultBackend` metadata from `GatewayHealthReply` is
MxAccessGateway-specific; keep it as a separate `/info` endpoint or embed it as `Data` on a
custom probe rather than normalizing it into the shared contract.
- The x86 worker itself (net48 console, named-pipe IPC, no HTTP) remains outside the shared health
scheme. The `GrpcDependencyHealthCheck` observes the worker indirectly from the gateway side.
- Per-gateway auth and TLS concerns on who may call health endpoints remain per-project.
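
The "embed it as `Data` on a custom probe" option can be sketched with the stock `HealthCheckResult.Healthy(description, data)` factory. `GatewayInfoHealthCheck` is a hypothetical name, and passing both values as strings is an assumption; the actual types of `GatewayContractInfo.DefaultBackendName` and `GatewayContractInfo.WorkerProtocolVersion` are not stated here.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical probe: always Healthy, exists only to attach gateway metadata to the report.
public sealed class GatewayInfoHealthCheck : IHealthCheck
{
    private readonly string _defaultBackend;
    private readonly string _workerProtocolVersion;

    // Assumption: callers pass the GatewayContractInfo values converted to strings.
    public GatewayInfoHealthCheck(string defaultBackend, string workerProtocolVersion)
    {
        _defaultBackend = defaultBackend;
        _workerProtocolVersion = workerProtocolVersion;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var data = new Dictionary<string, object>
        {
            ["DefaultBackend"] = _defaultBackend,
            ["WorkerProtocolVersion"] = _workerProtocolVersion,
        };

        return Task.FromResult(HealthCheckResult.Healthy("gateway metadata", data));
    }
}
```

The built-in `MapHealthChecks` response writer emits only the aggregate status as plain text, so these `Data` entries reach callers only if the mapped endpoint uses a response writer that serializes report entries. A separate `/info` endpoint avoids that dependency.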
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Health`
library build. MxGateway is the **highest-priority adopter** (P1 gap — no probes/tiers today)
and should be the first app wired up once the nupkg is available.