# Health — current state: MxAccessGateway

Repo: `~/Desktop/MxAccessGateway`. Stack: .NET 10 gateway (x64) + .NET Framework 4.8 worker (x86), gRPC;
solution `src/MxGateway.sln`.
Health code lives in `src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`. All paths are relative
to the repo root.
Verified 2026-06-01.

**Summary: bare liveness only.** MxAccessGateway has a single `/health/live` endpoint that returns
a hardcoded `GatewayHealthReply` JSON object. `AddHealthChecks()` is called at startup but is
entirely unused: no `IHealthCheck` implementations are registered, `MapHealthChecks` is never
called, and there is no readiness or active-node tier. The net48 x86 worker process has no HTTP
server and therefore no health endpoint of any kind.

## 1. Endpoint wiring

`src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`:

- At `:61`, `builder.Services.AddHealthChecks()` is called in the DI registration block. **This call
  is dead**: no `.AddCheck<T>()` call follows it, and `MapHealthChecks` is never called. The
  framework registers the health-check infrastructure but nothing is wired through it.
- At `:139–145`, `MapGatewayEndpoints` maps a raw `endpoints.MapGet("/health/live", ...)` (not
  `MapHealthChecks`). The handler is an inline lambda that returns `Results.Ok(new GatewayHealthReply(...))`
  with a hardcoded `Status: "Healthy"`:

```csharp
endpoints.MapGet(
        "/health/live",
        () => Results.Ok(new GatewayHealthReply(
            Status: "Healthy",
            DefaultBackend: GatewayContractInfo.DefaultBackendName,
            WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
    .WithName("LiveHealth");
```

This endpoint always returns HTTP 200 `{"Status":"Healthy",...}` as long as the process is alive.
It carries no authentication requirement (no `[Authorize]` or `.RequireAuthorization()`).

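To illustrate why a bare `AddHealthChecks()` call has no observable effect, here is a minimal sketch that builds a container with only that call and runs the framework's `HealthCheckService` directly. The `EmptyRegistrationDemo` class is hypothetical and not part of the repo; the report should come back with zero entries, because nothing was registered to run.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical demo class; not a type from the MxAccessGateway repo.
public static class EmptyRegistrationDemo
{
    // Builds a container with AddHealthChecks() only and runs the health-check service.
    public static async Task<HealthReport> RunAsync()
    {
        var services = new ServiceCollection();
        services.AddLogging();      // the default HealthCheckService requires a logger
        services.AddHealthChecks(); // mirrors the bare call: no AddCheck follows

        await using var provider = services.BuildServiceProvider();
        var healthChecks = provider.GetRequiredService<HealthCheckService>();
        return await healthChecks.CheckHealthAsync();
    }

    public static async Task Main()
    {
        var report = await RunAsync();
        Console.WriteLine($"{report.Status}, entries: {report.Entries.Count}");
    }
}
```

An empty report aggregates to `Healthy`, so even if `MapHealthChecks` were added without any probes, the endpoint would still say nothing about the gateway's dependencies.
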
## 2. Response shape

`GatewayHealthReply` is a record with three fields:

- `Status`: always `"Healthy"` (hardcoded string, not the ASP.NET Core `HealthStatus` enum)
- `DefaultBackend`: value of `GatewayContractInfo.DefaultBackendName` (the configured backend
  name, useful for confirming which gateway instance a probe hit)
- `WorkerProtocolVersion`: value of `GatewayContractInfo.WorkerProtocolVersion` (the gRPC
  protocol version the gateway expects from the worker, useful for version-skew detection)

The response is neither `HealthChecks.UI.Client` JSON nor the default `MapHealthChecks` output
(a plain-text status string). It is a bespoke JSON record.

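As a rough illustration of that shape, the sketch below reconstructs the record and serializes it. The field names come from this document; the field types (`string`, `string`, `int`), the sample values, and the `ReplyShapeDemo` helper are assumptions. Property casing on the wire depends on the JSON options the gateway configures, which are not shown here, so both common variants are included.

```csharp
using System;
using System.Text.Json;

// Hypothetical reconstruction: field names are from the doc, field types are assumed.
public sealed record GatewayHealthReply(
    string Status,
    string DefaultBackend,
    int WorkerProtocolVersion);

public static class ReplyShapeDemo
{
    // General-purpose defaults: property names are written as declared.
    public static string ToDefaultJson(GatewayHealthReply reply) =>
        JsonSerializer.Serialize(reply);

    // Web defaults: property names are camelCased.
    public static string ToWebJson(GatewayHealthReply reply) =>
        JsonSerializer.Serialize(reply, new JsonSerializerOptions(JsonSerializerDefaults.Web));

    public static void Main()
    {
        // Sample values are placeholders, not real gateway configuration.
        var reply = new GatewayHealthReply("Healthy", "example-backend", 1);
        Console.WriteLine(ToDefaultJson(reply));
        Console.WriteLine(ToWebJson(reply));
    }
}
```

Either way, a monitoring client has to parse this record specifically; tooling that expects a standard health report will not understand it.
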
## 3. Probes

None. There is no `IHealthCheck` registered. The `/health/live` response does not reflect:

- Whether the SQLite auth-store is reachable
- Whether any active MXAccess session is functional
- Whether the x86 worker named-pipe IPC is connected or the worker process is alive
- Whether the gRPC service is actually accepting calls

The endpoint is purely a process liveness indicator.

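For contrast, a real probe for one of the items above (worker IPC connectivity) might look roughly like the following. This is a sketch only: `WorkerPipeHealthCheck` and the `isWorkerConnected` delegate are hypothetical names, and how the gateway actually tracks the pipe connection is not described in this document.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical probe: the class name and the delegate are illustrative assumptions,
// not types from the MxAccessGateway repo.
public sealed class WorkerPipeHealthCheck : IHealthCheck
{
    private readonly Func<bool> _isWorkerConnected;

    public WorkerPipeHealthCheck(Func<bool> isWorkerConnected) =>
        _isWorkerConnected = isWorkerConnected
            ?? throw new ArgumentNullException(nameof(isWorkerConnected));

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Report Unhealthy when the gateway has no live pipe to the x86 worker.
        var result = _isWorkerConnected()
            ? HealthCheckResult.Healthy("Worker pipe connected.")
            : HealthCheckResult.Unhealthy("Worker pipe not connected.");
        return Task.FromResult(result);
    }
}
```

A check like this only affects an endpoint once it is registered on the builder returned by `AddHealthChecks()` and served through the health-check middleware, which is exactly the wiring that is missing today.
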
## 4. Tier coverage

| Tier | Endpoint | Status |
|---|---|---|
| Process liveness | `/health/live` (raw `MapGet`) | ✅ present (but non-standard) |
| Readiness | `/health/ready` | ⛔ absent |
| Active node | `/health/active` | ⛔ absent (not Akka-based; not applicable as-is) |
| `healthz` convention | `/healthz` | ⛔ absent |

MxAccessGateway is not an Akka.NET application: it has no cluster, no leader election, and no
active-node concept. The "active" tier in the shared spec translates here to "is the worker process
connected and the gRPC service ready to accept calls?" rather than cluster leadership.

## 5. x86 worker

`ZB.MOM.WW.MxGateway.Worker` is a .NET Framework 4.8 console application communicating with the
gateway over Windows named-pipe IPC. It has no HTTP server, no health endpoint, and no exposure to
any probe mechanism. Its liveness must be inferred indirectly, either by the gateway process
monitoring it (not currently implemented) or by a `GrpcDependencyHealthCheck` (from
`ZB.MOM.WW.Health`; see the adoption plan) that the gateway could register to probe the IPC channel.

## 6. Notable gaps

- `AddHealthChecks()` at `:61` is dead code. No `IHealthCheck` is ever registered via this call.
- `/health/live` uses `MapGet` (a raw minimal-API handler) rather than `MapHealthChecks`. It
  bypasses the ASP.NET Core health-check middleware entirely, which means it does not participate
  in the standard health-check pipeline (no `IHealthCheckPublisher`, no `HealthReport`, no UI
  integration).
- The hardcoded `"Healthy"` status means the endpoint cannot reflect real probe results even if
  probes were added later: the handler must be replaced, not just supplemented.
- No readiness gating: orchestrators and proxies (Kubernetes, Traefik) that rely on `/health/ready`
  returning 503 until the process is actually ready get a 404 from MxAccessGateway today, and any
  probe pointed at `/health/live` instead gets a 200 regardless of readiness.

---

## Adoption plan → `ZB.MOM.WW.Health`

**Replace `/health/live` + wire the shared tiers:**

The `AddHealthChecks()` call at `GatewayApplication.cs:61` is already present; it just needs
probes registered against it. The raw `MapGet("/health/live", ...)` handler at `:139–145` must be
removed and replaced with `app.MapZbHealth()` from `ZB.MOM.WW.Health`.

Steps:

1. **Remove** the inline `MapGet("/health/live", ...)` lambda (`:139–145`). The `GatewayHealthReply`
   record and `DefaultBackend`/`WorkerProtocolVersion` metadata can be surfaced differently (e.g., a
   `/info` endpoint or as custom data on the health response).
2. **Register a `GrpcDependencyHealthCheck`** (from `ZB.MOM.WW.Health`) that probes the
   named-pipe IPC channel to the x86 worker. Tag `["ready"]`. This replaces the hardcoded
   liveness-only response with a real probe that reflects whether the worker is reachable.
3. **Optionally add a `GrpcDependencyHealthCheck`** for any downstream gRPC dependency (e.g., the
   Galaxy Repository connection) if the gateway is expected to be healthy only when its upstreams are
   reachable. Tag `["ready"]`.
4. **Call `app.MapZbHealth()`**, which maps `/health/ready` (tag `ready`), `/health/active` (tag
   `active`; initially empty, since MxGateway has no active-node concept), and `/healthz` (bare
   liveness). The `/healthz` endpoint takes over the semantic role that `/health/live` serves today.
5. **Do not add `ZB.MOM.WW.Health.Akka`**: MxAccessGateway has no Akka dependency. The consumer
   matrix in the design specifies that MxGateway uses the core package only.

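The API surface of `MapZbHealth()` and `GrpcDependencyHealthCheck` is not shown in this document, so the sketch below approximates the three tiers from the steps above using only stock ASP.NET Core health-check APIs. The check name `worker-ipc` and its placeholder result are assumptions for illustration; the shared library may wire these routes differently.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Placeholder readiness probe (hypothetical name); a real one would test the worker IPC channel.
builder.Services.AddHealthChecks()
    .AddCheck(
        "worker-ipc",
        () => HealthCheckResult.Unhealthy("worker not connected (placeholder)"),
        tags: new[] { "ready" });

var app = builder.Build();

// Readiness: runs only checks tagged "ready"; Unhealthy maps to HTTP 503 by default.
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready"),
});

// Active tier: no check carries the "active" tag yet, so the report is empty and Healthy.
app.MapHealthChecks("/health/active", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("active"),
});

// Bare liveness: the predicate excludes every check, so a running process answers 200.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = _ => false,
});

app.Run();
```

With the placeholder probe reporting `Unhealthy`, `/health/ready` would answer 503 while `/healthz` still answers 200, which is the readiness gating that is absent today.
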
**Keep bespoke:**

- The `WorkerProtocolVersion` / `DefaultBackend` metadata from `GatewayHealthReply` is
  MxAccessGateway-specific; keep it as a separate `/info` endpoint or embed it as `Data` on a
  custom probe rather than normalizing it into the shared contract.
- The x86 worker itself (net48 console, named-pipe IPC, no HTTP) remains outside the shared health
  scheme. The `GrpcDependencyHealthCheck` observes the worker indirectly from the gateway side.
- Per-gateway auth and TLS concerns on who may call health endpoints remain per-project.

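One way to read the "embed it as `Data` on a custom probe" option from the first bullet is sketched below. `GatewayInfoHealthCheck`, its constructor parameters, the `int` type for the protocol version, and the data keys are all hypothetical; in the real gateway the values would presumably come from `GatewayContractInfo`.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical probe: class name, constructor parameters, and data keys are assumptions.
public sealed class GatewayInfoHealthCheck : IHealthCheck
{
    private readonly string _defaultBackend;
    private readonly int _workerProtocolVersion;

    public GatewayInfoHealthCheck(string defaultBackend, int workerProtocolVersion)
    {
        _defaultBackend = defaultBackend;
        _workerProtocolVersion = workerProtocolVersion;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Attach gateway metadata to the result instead of returning a bespoke reply record.
        var data = new Dictionary<string, object>
        {
            ["defaultBackend"] = _defaultBackend,
            ["workerProtocolVersion"] = _workerProtocolVersion,
        };
        return Task.FromResult(HealthCheckResult.Healthy("Gateway metadata.", data));
    }
}
```

The `Data` dictionary reaches callers only if the response writer emits it; the default ASP.NET Core writer prints just the aggregate status, so this option depends on the writer the shared library settles on.
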
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Health`
library build. MxGateway is the **highest-priority adopter** (P1 gap: no probes/tiers today)
and should be the first app wired up once the nupkg is available.