Files
lmxopcua/code-reviews/Host/findings.md
T
Joseph Doherty e4abe186a1 review(Host): allow-anonymous /metrics + unconditional LDAP validator
Code review at HEAD 7286d320. Host-001 (High): /metrics was auth-gated on admin
nodes (Prometheus 401) -> AllowAnonymous. Host-002: register LdapOptionsValidator
unconditionally for fail-fast startup validation on admin-only nodes. Host-004: fix
metrics XML doc. Host-003 (docs) left Open.
2026-06-19 10:22:59 -04:00

105 lines
6.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — Host
<!-- Template for a per-module findings file. Copy to code-reviews/<Module>/findings.md.
See ../../REVIEW-PROCESS.md for the full process. The base README.md is generated
from these files by regen-readme.py — do not edit README.md by hand. -->
| Field | Value |
|---|---|
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Host` |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | `7286d320` |
| Status | Reviewed |
| Open findings | 1 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | 1 finding (Host-001) |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | 1 finding (Host-002) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | 1 finding (Host-003) |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | 1 finding (Host-004) |
## Findings
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
never reused. Findings are never deleted — close them by changing Status and
completing Resolution. -->
### Host-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs:47`, `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:245` |
| Status | Resolved |
**Description:** `MapOtOpcUaMetrics()` calls `MapZbMetrics()` which calls `MapPrometheusScrapingEndpoint()` without chaining `.AllowAnonymous()`. On admin-role nodes (any node with `hasAdmin=true`), `AddOtOpcUaAuth` registers a `FallbackPolicy = RequireAuthenticatedUser`, which gates every endpoint that does not explicitly opt out. `MapZbHealth` correctly calls `.AllowAnonymous()` on all three health tiers (verified in `ZbHealthEndpointExtensions.cs`), but the Prometheus `/metrics` endpoint does not. The result is that on any admin or admin+driver node the Prometheus scraper receives a `401 Unauthorized` response, breaking observability. Driver-only nodes are unaffected (no auth middleware is added when `hasAdmin=false`).
**Recommendation:** Chain `.AllowAnonymous()` on the `MapZbMetrics()` call site in `Program.cs`, or add it inside `MapOtOpcUaMetrics()` in `ObservabilityExtensions.cs`. The latter is consistent with how `MapOtOpcUaHealth` / `MapZbHealth` are structured.
**Resolution:** Fixed 2026-06-19. Added `.AllowAnonymous()` to `MapOtOpcUaMetrics()` in `ObservabilityExtensions.cs` (verified by build; no unit-test harness for Host). Aligns the metrics endpoint with the health endpoints, which have always called `AllowAnonymous()` via `ZbHealthEndpointExtensions`.
---
### Host-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:163180`, `src/Server/ZB.MOM.WW.OtOpcUa.Host/Configuration/LdapOptionsValidator.cs` |
| Status | Resolved |
**Description:** `AddValidatedOptions<LdapOptions, LdapOptionsValidator>` is registered only inside the `if (hasDriver)` block. On admin-only nodes (`hasAdmin=true`, `hasDriver=false`) `AddOtOpcUaAuth` registers `IOptions<LdapOptions>` via a plain `.Bind()` call with no validator attached. This means the fail-fast startup guard — including the insecure-transport check (`Transport == None && !AllowInsecure → reject`) — is silently skipped on admin-only deployments. A production admin-only node with a misconfigured or plaintext LDAP section boots without error and the misconfiguration only surfaces as a login failure at runtime.
**Recommendation:** Move `AddValidatedOptions<LdapOptions, LdapOptionsValidator>` above both `if (hasAdmin)` and `if (hasDriver)` so it runs unconditionally. The validator's own guard (`if (!options.Enabled || options.DevStubMode) return;`) already makes it a safe no-op when LDAP is disabled or in dev-stub mode.
**Resolution:** Fixed 2026-06-19. Moved `AddValidatedOptions<LdapOptions, LdapOptionsValidator>` above both role blocks in `Program.cs`, making it unconditional. The validator's own early-return guard keeps it safe on disabled/DevStub configs. Verified by build (no unit-test harness for Host).
---
### Host-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/ServiceHosting.md` (section "Per-role configuration overlays") |
| Status | Open |
**Description:** `docs/ServiceHosting.md` states the configuration loading order as "base `appsettings.json` → role overlay (`appsettings.{role}.json`) → environment overlay (`appsettings.{Environment}.json`) — later layers win." This is incorrect. The actual order established by `Program.cs:5971` is: `appsettings.json``appsettings.{Environment}.json` (WebApplicationBuilder default) → `appsettings.{role}.json` (appended by Program.cs) → environment variables (re-appended) → command-line args (re-appended). The role overlay therefore **wins over** `appsettings.{Environment}.json`, not the other way around. The code behaviour is correct and intentional (explained by the comment at Program.cs:6670); only the doc is wrong.
**Recommendation:** Update `docs/ServiceHosting.md` to reflect the actual precedence: `appsettings.json` < `appsettings.{Environment}.json` < `appsettings.{role}.json` < environment variables < command-line args. Note that the role overlay intentionally outranks the environment-specific JSON so role-level security defaults cannot be overridden by a developer's local `appsettings.Development.json`, while environment variables and command-line args still outrank everything. Docs-only change; no src change needed.
**Resolution:** _(open — docs/ edit outside this pass's src-only scope; no code change required)_
---
### Host-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs:4142` |
| Status | Resolved |
**Description:** The XML doc on `MapOtOpcUaMetrics` said "the default leaves it unauthenticated for local Prometheus scrapes." Before the Host-001 fix this was factually wrong (the endpoint was auth-gated on admin nodes). The phrasing also implied intent to restrict to local use, which is misleading for production deployments where a Prometheus instance scrapes across the network.
**Recommendation:** Update the comment to make the `AllowAnonymous` behaviour explicit and drop the "local" qualifier, since the endpoint is intentionally open to any scraper.
**Resolution:** Fixed 2026-06-19 as part of the Host-001 fix. Comment updated alongside the `AllowAnonymous()` addition. Verified by build (no unit-test harness for Host).