review(Host): allow-anonymous /metrics + unconditional LDAP validator

Code review at HEAD 7286d320. Host-001 (High): /metrics was auth-gated on admin
nodes (Prometheus 401) -> AllowAnonymous. Host-002: register LdapOptionsValidator
unconditionally for fail-fast startup validation on admin-only nodes. Host-004: fix
metrics XML doc. Host-003 (docs) left Open.
This commit is contained in:
Joseph Doherty
2026-06-19 10:22:59 -04:00
parent d23e585cdb
commit e4abe186a1
3 changed files with 119 additions and 5 deletions
+104
View File
@@ -0,0 +1,104 @@
# Code Review — Host
<!-- Template for a per-module findings file. Copy to code-reviews/<Module>/findings.md.
See ../../REVIEW-PROCESS.md for the full process. The base README.md is generated
from these files by regen-readme.py — do not edit README.md by hand. -->
| Field | Value |
|---|---|
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Host` |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | `7286d320` |
| Status | Reviewed |
| Open findings | 1 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | 1 finding (Host-001) |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | 1 finding (Host-002) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | 1 finding (Host-003) |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | 1 finding (Host-004) |
## Findings
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
never reused. Findings are never deleted — close them by changing Status and
completing Resolution. -->
### Host-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs:47`, `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:245` |
| Status | Resolved |
**Description:** `MapOtOpcUaMetrics()` calls `MapZbMetrics()` which calls `MapPrometheusScrapingEndpoint()` without chaining `.AllowAnonymous()`. On admin-role nodes (any node with `hasAdmin=true`), `AddOtOpcUaAuth` registers a `FallbackPolicy = RequireAuthenticatedUser`, which gates every endpoint that does not explicitly opt out. `MapZbHealth` correctly calls `.AllowAnonymous()` on all three health tiers (verified in `ZbHealthEndpointExtensions.cs`), but the Prometheus `/metrics` endpoint does not. The result is that on any admin or admin+driver node the Prometheus scraper receives a `401 Unauthorized` response, breaking observability. Driver-only nodes are unaffected (no auth middleware is added when `hasAdmin=false`).
**Recommendation:** Chain `.AllowAnonymous()` on the `MapZbMetrics()` call site in `Program.cs`, or add it inside `MapOtOpcUaMetrics()` in `ObservabilityExtensions.cs`. The latter is consistent with how `MapOtOpcUaHealth` / `MapZbHealth` are structured.
**Resolution:** Fixed 2026-06-19. Added `.AllowAnonymous()` to `MapOtOpcUaMetrics()` in `ObservabilityExtensions.cs` (verified by build; no unit-test harness for Host). Aligns the metrics endpoint with the health endpoints, which have always called `AllowAnonymous()` via `ZbHealthEndpointExtensions`.
---
### Host-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:163180`, `src/Server/ZB.MOM.WW.OtOpcUa.Host/Configuration/LdapOptionsValidator.cs` |
| Status | Resolved |
**Description:** `AddValidatedOptions<LdapOptions, LdapOptionsValidator>` is registered only inside the `if (hasDriver)` block. On admin-only nodes (`hasAdmin=true`, `hasDriver=false`) `AddOtOpcUaAuth` registers `IOptions<LdapOptions>` via a plain `.Bind()` call with no validator attached. This means the fail-fast startup guard — including the insecure-transport check (`Transport == None && !AllowInsecure → reject`) — is silently skipped on admin-only deployments. A production admin-only node with a misconfigured or plaintext LDAP section boots without error and the misconfiguration only surfaces as a login failure at runtime.
**Recommendation:** Move `AddValidatedOptions<LdapOptions, LdapOptionsValidator>` above both `if (hasAdmin)` and `if (hasDriver)` so it runs unconditionally. The validator's own guard (`if (!options.Enabled || options.DevStubMode) return;`) already makes it a safe no-op when LDAP is disabled or in dev-stub mode.
**Resolution:** Fixed 2026-06-19. Moved `AddValidatedOptions<LdapOptions, LdapOptionsValidator>` above both role blocks in `Program.cs`, making it unconditional. The validator's own early-return guard keeps it safe on disabled/DevStub configs. Verified by build (no unit-test harness for Host).
---
### Host-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/ServiceHosting.md` (section "Per-role configuration overlays") |
| Status | Open |
**Description:** `docs/ServiceHosting.md` states the configuration loading order as "base `appsettings.json` → role overlay (`appsettings.{role}.json`) → environment overlay (`appsettings.{Environment}.json`) — later layers win." This is incorrect. The actual order established by `Program.cs:5971` is: `appsettings.json``appsettings.{Environment}.json` (WebApplicationBuilder default) → `appsettings.{role}.json` (appended by Program.cs) → environment variables (re-appended) → command-line args (re-appended). The role overlay therefore **wins over** `appsettings.{Environment}.json`, not the other way around. The code behaviour is correct and intentional (explained by the comment at Program.cs:6670); only the doc is wrong.
**Recommendation:** Update `docs/ServiceHosting.md` to reflect the actual precedence: `appsettings.json` < `appsettings.{Environment}.json` < `appsettings.{role}.json` < environment variables < command-line args. Note that the role overlay intentionally outranks the environment-specific JSON so role-level security defaults cannot be overridden by a developer's local `appsettings.Development.json`, while environment variables and command-line args still outrank everything. Docs-only change; no src change needed.
**Resolution:** _(open — docs/ edit outside this pass's src-only scope; no code change required)_
---
### Host-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs:4142` |
| Status | Resolved |
**Description:** The XML doc on `MapOtOpcUaMetrics` said "the default leaves it unauthenticated for local Prometheus scrapes." Before the Host-001 fix this was factually wrong (the endpoint was auth-gated on admin nodes). The phrasing also implied intent to restrict to local use, which is misleading for production deployments where a Prometheus instance scrapes across the network.
**Recommendation:** Update the comment to make the `AllowAnonymous` behaviour explicit and drop the "local" qualifier, since the endpoint is intentionally open to any scraper.
**Resolution:** Fixed 2026-06-19 as part of the Host-001 fix. Comment updated alongside the `AllowAnonymous()` addition. Verified by build (no unit-test harness for Host).
@@ -37,14 +37,16 @@ public static class ObservabilityExtensions
}
/// <summary>
/// Mounts the Prometheus scrape endpoint on the existing ASP.NET pipeline. Call after
/// <c>app.UseAuthentication/UseAuthorization</c> if metrics access should require auth;
/// the default leaves it unauthenticated for local Prometheus scrapes.
/// Mounts the Prometheus <c>/metrics</c> scrape endpoint on the existing ASP.NET pipeline.
/// The endpoint is explicitly marked <c>AllowAnonymous</c> so unauthenticated Prometheus
/// scrapers can reach it regardless of the host's auth fallback policy (which on admin-role
/// nodes is <c>RequireAuthenticatedUser</c>). This mirrors the behaviour of
/// <c>MapZbHealth</c>, which also marks its endpoints anonymous.
/// </summary>
/// <param name="app">The endpoint route builder.</param>
public static IEndpointRouteBuilder MapOtOpcUaMetrics(this IEndpointRouteBuilder app)
{
app.MapZbMetrics();
app.MapZbMetrics().AllowAnonymous();
return app;
}
}
+9 -1
View File
@@ -82,6 +82,13 @@ builder.AddZbSerilog(o => o.ServiceName = "otopcua");
builder.Services.AddOtOpcUaConfigDb(builder.Configuration);
builder.Services.AddOtOpcUaCluster(builder.Configuration);
// Validate LdapOptions unconditionally so ANY role node (admin-only, driver-only, or fused)
// fails fast at startup on a misconfigured or insecure-transport LDAP section.
// The validator's own guard (if !Enabled || DevStubMode) keeps it a safe no-op on disabled/dev
// configs. Previously this was inside the hasDriver block, which left admin-only nodes without
// the startup check (AddOtOpcUaAuth binds LdapOptions but does not attach a validator).
builder.Services.AddValidatedOptions<LdapOptions, LdapOptionsValidator>(builder.Configuration, LdapOptions.SectionName);
if (hasDriver)
{
builder.Services.AddOtOpcUaRuntime();
@@ -160,7 +167,6 @@ if (hasDriver)
ScriptRootLoggerFactory.Build(
sp.GetRequiredService<IScriptLogPublisher>(), scriptLogFilePath, scriptLogTopicMinLevel, Serilog.Log.Logger)));
builder.Services.AddValidatedOptions<LdapOptions, LdapOptionsValidator>(builder.Configuration, LdapOptions.SectionName);
// TryAdd so a fused admin+driver node (where AddOtOpcUaAuth also registers these) ends up
// with exactly one descriptor; on a driver-only node these are the sole registrations.
// OtOpcUaLdapAuthService is the app ILdapAuthService (Enabled switch + DevStubMode over the
@@ -168,6 +174,8 @@ if (hasDriver)
// <string> per call to turn the directory's groups into roles, so register it here for driver-
// only nodes (AddOtOpcUaAuth registers it on admin nodes); ILdapGroupRoleMappingService it
// depends on is already registered unconditionally by AddOtOpcUaConfigDb above.
// Note: AddValidatedOptions<LdapOptions, LdapOptionsValidator> is now registered unconditionally
// above both role blocks so admin-only nodes also get fail-fast LDAP startup validation.
builder.Services.TryAddSingleton<ILdapAuthService, OtOpcUaLdapAuthService>();
builder.Services.TryAddScoped<IGroupRoleMapper<string>, OtOpcUaGroupRoleMapper>();
builder.Services.AddSingleton<IOpcUaUserAuthenticator, LdapOpcUaUserAuthenticator>();