code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97

Re-applies the full 10-category checklist to every src/ project, including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport), so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.

regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.Host` |
 | Design doc | `docs/requirements/Component-Host.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-17 |
+| Last reviewed | 2026-05-28 |
 | Reviewer | claude-agent |
-| Commit reviewed | `39d737e` |
-| Open findings | 0 |
+| Commit reviewed | `1eb6e97` |
+| Open findings | 7 |

 ## Summary

@@ -48,6 +48,38 @@ Serilog sink setup is hard-coded in `Program.cs` rather than configuration-drive
REQ-HOST-8 requires (Host-014), and `StartupRetry` retries indiscriminately on every
exception type including permanent schema-validation failures (Host-015).

#### Re-review 2026-05-28 (commit `1eb6e97`)

All fifteen prior findings (Host-001..015) remain `Resolved` in the current tree
and the fixes introduced for them (Host-001's predicate, the externalised
secrets, the Site GrpcPort/RemotingPort/seed-port validation rules, the escaped
HOCON builder with `DownIfAlone` and millisecond-precision durations, the
configuration-driven Serilog sinks, the transient-only `StartupRetry`
classifier) are all still in place. This re-review walked the ten checklist
categories over the full module again and recorded seven new findings, none of
them crash/data-loss class. Host-016 (Medium) mirrors the resolved Host-004
shipped-config bug on the **Communication** side: `appsettings.Site.json`'s
second `CentralContactPoints` entry points at the site's own remoting port
(`localhost:8082`) instead of central, an incorrect dev example that is easily
copied into multi-central deployments. Host-017 (Medium) flags a partial REQ-HOST-7
implementation. The documented site-shutdown ordering (stop accepting streams
first, cancel active streams via `IHostApplicationLifetime.ApplicationStopping`,
then tear down actors) is not wired: the site path registers no
`ApplicationStopping` handler that signals `SiteStreamGrpcServer`, and the gRPC
server exposes no cancel-all-streams entry point. The remaining five are Low:
`NodeOptions.NodeName` (the operator-configured value stamped on
`AuditLog.SourceNode`) is absent from both shipped per-role configs even though
the docker per-node configs set it (Host-018); the migration `StartupRetry`
call passes `default` for `CancellationToken`, so a SIGTERM during the
bounded-retry window is ignored for up to ~2 minutes (Host-019);
`LoggerConfigurationFactory` layers `MinimumLevel.Is` over
`ReadFrom.Configuration`, so any `Serilog:MinimumLevel` an operator sets is
silently overridden by `ScadaLink:Logging:MinimumLevel` (Host-020); the
shipped `appsettings.json` carries a Microsoft `Logging:LogLevel` block, but
Serilog is the only logger provider, so the section is dead config (Host-021);
and `ParseLevel` silently swallows an unrecognised `MinimumLevel` value (e.g.
a typo) and falls back to `Information` with no warning (Host-022).

## Checklist coverage

| # | Category | Examined | Notes |
@@ -63,6 +95,21 @@ exception type including permanent schema-validation failures (Host-015).
| 9 | Testing coverage | ☑ | Strong suite; regression tests added for Host-001/004/006/007/010/011. No coverage for the new `down-if-alone`, sub-second-duration, or non-transient-retry paths (Host-012/013/015). |
| 10 | Documentation & comments | ☑ | REQ-HOST-6 stale-doc resolved. Re-review: REQ-HOST-8 says sinks are "configuration-driven" but they are code-defined (Host-014). |

_Re-review (2026-05-28, `1eb6e97`):_

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Re-review: `appsettings.Site.json` second `CentralContactPoints` entry targets the site's own remoting port instead of central (Host-016), the same defect class as the resolved Host-004 seed-list bug. |
| 2 | Akka.NET conventions | ☑ | CoordinatedShutdown, receptionist registration, singleton scoping, role-scoped site singletons, ClusterClient initial-contact wiring all reviewed; no new issues. |
| 3 | Concurrency & thread safety | ☑ | `_trackedDisposables` is locked on both sides of the lifecycle; `_actorSystem` publication is safe via the IHost startup `await` boundary. New Low: `StartupRetry` migration call passes `default` `CancellationToken`, so SIGTERM during the retry window is ignored (Host-019). |
| 4 | Error handling & resilience | ☑ | `IsTransientDatabaseFault` correctly classifies socket / timeout / SqlException; the retry helper itself remains sound. Host-019 is the resilience gap. |
| 5 | Security | ☑ | Secrets stay externalised; the `_secrets` placeholder comment is intact. No new issues. |
| 6 | Performance & resource management | ☑ | No new undisposed resources; gRPC stream lifetime cap remains correct. No new issues. |
| 7 | Design-document adherence | ☑ | Re-review: REQ-HOST-7 site-shutdown ordering (stop accepting new streams, cancel active streams via `ApplicationStopping`, then tear down actors) is not wired in `Program.cs` (Host-017). |
| 8 | Code organization & conventions | ☑ | Re-review: `NodeOptions.NodeName` is absent from the shipped per-role configs even though it stamps `AuditLog.SourceNode` (Host-018); the appsettings `Logging:LogLevel` Microsoft section is dead config under Serilog (Host-021). |
| 9 | Testing coverage | ☑ | Strong existing suite. No coverage for the Site `CentralContactPoints` second-entry rule (Host-016), the site-shutdown ordering (Host-017), the `NodeName`-absent shipped config (Host-018), the unused `CancellationToken` parameter (Host-019), the `MinimumLevel.Is` override semantics (Host-020) or the `ParseLevel` silent fallback (Host-022). |
| 10 | Documentation & comments | ☑ | Re-review: layered `MinimumLevel.Is` / `ReadFrom.Configuration` semantics are not surfaced: an operator-set `Serilog:MinimumLevel` is silently overridden by `ScadaLink:Logging:MinimumLevel` (Host-020); `ParseLevel` silently coerces a misspelled level to `Information` with no warning (Host-022). |

## Findings

### Host-001 — `/health/ready` includes the leader-only `active-node` check
@@ -777,3 +824,278 @@ site now passes it. Regression tests in `StartupRetryTests`:
when `isTransient` returns false) and `ExecuteWithRetry_TransientThenPermanent_StopsAtPermanent`
(retries a `TimeoutException` then stops at a permanent `InvalidOperationException`).
Full Host suite green (182 passed).

### Host-016 — Site `CentralContactPoints` second entry targets the site's own remoting port

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.Host/appsettings.Site.json:33-37` |

**Description**

The shipped site config sets `Node:RemotingPort = 8082` and lists
`Communication:CentralContactPoints` as
`["akka.tcp://scadalink@localhost:8081", "akka.tcp://scadalink@localhost:8082"]`.
The second contact point (port `8082`) is the **site's own** remoting endpoint,
not a central node. `SiteCommunicationActor` / `ClusterClient` uses these
addresses as initial contacts when discovering the central
`ClusterClientReceptionist`; a contact pointing at the site itself can never
reach the central receptionist and will be a permanent failure in the
initial-contact rotation. For the single-node dev loopback layout the first
contact (`8081`, central) succeeds and the bug is masked, but this is exactly
the kind of dev-config "example" that gets duplicated into multi-central
deployments, the same failure mode the resolved Host-004 finding called out
for the seed-node list. `StartupValidator` validates seed nodes against the
gRPC port (Host-004) but does not cross-check `CentralContactPoints` against
the site's own `RemotingPort`, so the misconfiguration passes silently.

**Recommendation**

Correct the shipped site example to list two central remoting endpoints (e.g.
`localhost:8081` for `central-a` and a distinct port for `central-b` in a
multi-node layout). Consider extending `StartupValidator` to reject any
`Communication:CentralContactPoints` entry whose host+port matches this site
node's `NodeHostname`+`RemotingPort`. Add a regression test in
`StartupValidatorTests` mirroring `Site_SeedNodeOnGrpcPort_FailsValidation`.
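
The cross-check recommended for `StartupValidator` comes down to comparing each contact point's host and port with the node's own remoting endpoint. The following is a minimal, language-neutral sketch in Python, not the project's C# code; the function name `find_self_contact_points` and its parameters are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit


def find_self_contact_points(contact_points, node_hostname, remoting_port):
    """Return the contact points that address this node's own remoting endpoint.

    Hypothetical helper: a real validator would report these entries as
    configuration errors instead of returning them.
    """
    offenders = []
    for address in contact_points:
        # urlsplit handles the akka.tcp://system@host:port shape:
        # .hostname drops the "system@" part and lower-cases the host.
        parts = urlsplit(address)
        same_host = (parts.hostname or "") == node_hostname.lower()
        if same_host and parts.port == remoting_port:
            offenders.append(address)
    return offenders


# Values taken from the shipped site config described in this finding.
shipped = [
    "akka.tcp://scadalink@localhost:8081",
    "akka.tcp://scadalink@localhost:8082",
]
print(find_self_contact_points(shipped, "localhost", 8082))
```

With the shipped values only the second entry is flagged. The sketch compares hostnames literally, so it would not treat `127.0.0.1` and `localhost` as the same host; whether the real rule should do so is a design decision left open here.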

**Resolution**

_Open._

### Host-017 — Site-shutdown ordering from REQ-HOST-7 is not wired

| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.Host/Program.cs:229-265`, `src/ScadaLink.Communication/Grpc/SiteStreamGrpcServer.cs` |

**Description**

REQ-HOST-7 documents an explicit four-step shutdown sequence for site nodes:
"(1) On `CoordinatedShutdown`, stop accepting new gRPC streams first.
(2) Cancel all active gRPC streams (triggering client-side reconnect).
(3) Tear down actors.
(4) Use `IHostApplicationLifetime.ApplicationStopping` to signal the gRPC
server." The site path in `Program.cs` (the `role == "Site"` branch) registers
no `IHostApplicationLifetime.ApplicationStopping` callback, and
`SiteStreamGrpcServer` exposes no "stop accepting" / "cancel all streams"
entry point: it has `SetReady` but no corresponding `SetUnavailable` or
`CancelAllStreams`. In practice, on `SIGTERM` Kestrel closes its listener
naturally and `AkkaHostedService.StopAsync` runs Akka `CoordinatedShutdown`,
but there is no explicit, ordered handoff that meets the documented contract:
in-flight streams are not actively cancelled before actors begin tearing down,
so clients see a stream that goes silent (and only times out via gRPC
keepalive) rather than a clean `Cancelled` they can reconnect on. This is
contract-vs-code drift: either the design doc overstates what is
implemented, or the implementation is incomplete.

**Recommendation**

Add a `SiteStreamGrpcServer.CancelAllStreams()` method that flips a "shutting
down" flag (so `SubscribeSite` immediately fails new streams with
`StatusCode.Unavailable`) and cancels every entry's `Cts` in the `_streams`
map. In the `Program.cs` site branch, resolve `IHostApplicationLifetime` and
register a callback on `ApplicationStopping` that calls `CancelAllStreams()`
before the Akka hosted service runs `CoordinatedShutdown` (or order via
`AkkaHostedService.StopAsync` itself; `IHostedService.StopAsync` runs in
reverse-registration order, so the gRPC server's lifetime can be sequenced
before Akka shutdown). Alternatively, reconcile REQ-HOST-7 with the actual
implementation if the explicit ordering is no longer intended. Add an
integration test under `tests/ScadaLink.Host.Tests` that starts a site host,
opens a stream, triggers shutdown, and asserts the stream completes with
`Cancelled` before the actor system tears down.
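
The ordering this recommendation asks for (refuse new streams, cancel active ones, only then stop actors) can be sketched independently of gRPC and Akka. The Python below is an illustrative model under stated assumptions, not the `SiteStreamGrpcServer` implementation: `StreamRegistry`, `try_register`, `cancel_all_streams` and `shut_down_site` are hypothetical names, and a `threading.Event` stands in for each stream's cancellation source.

```python
import threading


class StreamRegistry:
    """Hypothetical stand-in for the server's stream map plus a shutdown flag."""

    def __init__(self):
        self._lock = threading.Lock()
        self._streams = {}  # stream id -> Event acting as the cancel signal
        self._shutting_down = False

    def try_register(self, stream_id):
        """Return a cancel Event for a new stream, or None once shutdown began.

        A None result models failing the call with an "unavailable" status.
        """
        with self._lock:
            if self._shutting_down:
                return None
            cancel = threading.Event()
            self._streams[stream_id] = cancel
            return cancel

    def unregister(self, stream_id):
        """Forget a stream that completed on its own."""
        with self._lock:
            self._streams.pop(stream_id, None)

    def cancel_all_streams(self):
        """Stop accepting streams, then signal every active one. Returns the count."""
        with self._lock:
            self._shutting_down = True  # flag and snapshot change atomically
            active = list(self._streams.values())
            self._streams.clear()
        for cancel in active:  # signal outside the lock
            cancel.set()
        return len(active)


def shut_down_site(registry, stop_actors):
    """Run the documented order: streams first, actors afterwards."""
    cancelled = registry.cancel_all_streams()
    stop_actors()
    return cancelled
```

Setting the flag and taking the snapshot under one lock closes the window in which a stream could register after the snapshot but before the flag is visible. Signalling outside the lock keeps a slow stream handler from blocking new registration attempts, which by then are being refused anyway.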

**Resolution**

_Open._

### Host-018 — Shipped per-role configs omit `NodeOptions.NodeName`, leaving `SourceNode` null

| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.Host/appsettings.Central.json`, `src/ScadaLink.Host/appsettings.Site.json`, `src/ScadaLink.Host/NodeOptions.cs:10-16` |

**Description**

`NodeOptions.NodeName` is documented as "the operator-configured semantic node
name used to stamp the SourceNode column on audit rows", with conventional
values `node-a`/`node-b` for site nodes and `central-a`/`central-b` for
central nodes. The CLAUDE.md "Centralized Audit Log" key-decision section
calls this out: `SourceNode` is meant to be carried verbatim through audit
telemetry and reconciliation, and is indexed via
`IX_AuditLog_Node_Occurred (SourceNode, OccurredAtUtc)`. The docker per-node
configs (`docker/central-node-a/appsettings.Central.json`,
`docker/site-a-node-a/appsettings.Site.json`, etc.) all set
`ScadaLink:Node:NodeName`. The **shipped, default** per-role files in
`src/ScadaLink.Host/` (the templates a developer running the binary
directly will use) do not. `NodeIdentityProvider` normalises an empty
`NodeName` to `null`, so dev audit rows carry a null `SourceNode` and the
indexed lookup never narrows. The dev examples should match the docker
examples; at minimum the field should appear in the shipped templates with a
placeholder explaining the convention.

**Recommendation**

Add `"NodeName": "central-a"` (or a placeholder like `"${NODE_NAME}"`) to
`appsettings.Central.json` and `"NodeName": "node-a"` to
`appsettings.Site.json`, with a short comment that the value must be set
per-node in multi-node deployments. Consider validating in `StartupValidator`
that `NodeName` is non-empty, or accept the null and document explicitly that
single-node dev deployments leave `SourceNode` null.

**Resolution**

_Open._

### Host-019 — Migration `StartupRetry` call drops the host `CancellationToken`

| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.Host/Program.cs:154-165` |

**Description**

`StartupRetry.ExecuteWithRetryAsync` accepts an optional
`CancellationToken cancellationToken = default` and observes it both at the
top of each attempt and inside the `Task.Delay` between retries. The migration
call site in `Program.cs` passes no token, so the helper runs with
`CancellationToken.None`. With `maxAttempts: 8`, `initialDelay: 2s`, and the
30s cap, a database that stays unreachable can keep the retry loop alive for
~2 minutes before the host process responds to `SIGTERM` / `Ctrl+C` /
Windows-Service stop. The `Program.cs` startup pipeline forwards no
host-lifetime token at this point (the `app` is built but not
yet running), even though `app.Lifetime.ApplicationStopping` is available the moment
`builder.Build()` returns. Threading it into the retry call honours the host
lifecycle and matches the helper's documented contract.

**Recommendation**

Pass `app.Lifetime.ApplicationStopping` (or `CancellationToken.None`
explicitly with a comment if intentional) into
`StartupRetry.ExecuteWithRetryAsync`. Add a `StartupRetryTests` case
exercising token-cancellation mid-backoff.
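
The "~2 minutes" figure can be checked with a short worked calculation, and the same sketch shows what observing a cancellation signal during backoff looks like. This is a Python model, not the `StartupRetry` helper: it assumes the delay doubles after each failed attempt (the finding states only the initial delay, the cap and the attempt count), and `backoff_delays`, `execute_with_retry` and `RetryCancelled` are hypothetical names. A `threading.Event` plays the role of the host's stopping token.

```python
import threading


class RetryCancelled(Exception):
    """Raised when the stop signal fires before or between attempts."""


def backoff_delays(max_attempts, initial_delay, max_delay, factor=2):
    """Waits between attempts; doubling is an assumption, not a documented fact."""
    delays = []
    delay = initial_delay
    for _ in range(max_attempts - 1):  # no wait after the final attempt
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


def execute_with_retry(operation, delays, stop):
    """Retry on TimeoutError, checking the stop signal before and between attempts."""
    for wait in list(delays) + [None]:  # None marks the final attempt
        if stop.is_set():
            raise RetryCancelled("cancelled before attempt")
        try:
            return operation()
        except TimeoutError:
            if wait is None:
                raise  # attempts exhausted
        # Event.wait returns True as soon as the signal is set, so a stop
        # request interrupts the backoff instead of waiting it out.
        if stop.wait(timeout=wait):
            raise RetryCancelled("cancelled during backoff")


delays = backoff_delays(max_attempts=8, initial_delay=2, max_delay=30)
print(delays, sum(delays))  # [2, 4, 8, 16, 30, 30, 30] 120
```

Under the doubling assumption the seven waits add up to 120 seconds, which matches the figure in the description. Passing a signal that is never set reproduces the current behaviour, where the loop runs to exhaustion regardless of a stop request.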

**Resolution**

_Open._

### Host-020 — `MinimumLevel.Is` silently overrides any operator-set `Serilog:MinimumLevel`

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.Host/LoggerConfigurationFactory.cs:36-43` |

**Description**

`LoggerConfigurationFactory.Build` reads the `Serilog` configuration section
via `ReadFrom.Configuration(configuration)` (which can include a
`MinimumLevel` block, the standard Serilog way to set the floor) and **then**
calls `.MinimumLevel.Is(minimumLevel)` derived from
`ScadaLink:Logging:MinimumLevel`. Serilog's fluent builder applies the later
call, so any `Serilog:MinimumLevel:Default` an operator sets is silently
overridden by `ScadaLink:Logging:MinimumLevel` (or by its
`Information` fallback when the ScadaLink key is absent). There are now two
documented configuration paths for the same setting with non-obvious
precedence, and the override direction is the opposite of what most Serilog
users would expect (the more-specific `Serilog` section being the authority).
The XML doc on `Build` says "the explicit `MinimumLevel.Is` pins the floor"
but does not warn that the floor *overrides* the Serilog section's own
`MinimumLevel`.

**Recommendation**

Pick one mechanism: either (a) drop the `MinimumLevel.Is` call and let
`ReadFrom.Configuration` consume `Serilog:MinimumLevel`, migrating any docs or
deployments that reference `ScadaLink:Logging:MinimumLevel`; or (b) keep the
current "ScadaLink:Logging" path and reject `Serilog:MinimumLevel` if present
(throw at startup so the operator sees the conflict). At minimum, expand the
XML doc and REQ-HOST-8 to spell out the precedence explicitly.
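
Option (b) is a small startup check. The Python sketch below illustrates it under simplifying assumptions: the configuration is modelled as a nested dictionary, and the function name `resolve_minimum_level` is hypothetical. It is not Serilog or ScadaLink code.

```python
def resolve_minimum_level(config):
    """Return the effective floor, refusing to start when both paths are set.

    Hypothetical model of option (b): ScadaLink:Logging:MinimumLevel is the only
    accepted path, and any Serilog:MinimumLevel entry is treated as a conflict.
    """
    serilog_section = config.get("Serilog", {})
    if "MinimumLevel" in serilog_section:
        raise ValueError(
            "Serilog:MinimumLevel is not supported; "
            "set ScadaLink:Logging:MinimumLevel instead"
        )
    logging_section = config.get("ScadaLink", {}).get("Logging", {})
    # Same fallback the finding describes when the ScadaLink key is absent.
    return logging_section.get("MinimumLevel", "Information")


print(resolve_minimum_level({"ScadaLink": {"Logging": {"MinimumLevel": "Debug"}}}))
```

Failing at startup trades convenience for visibility: an operator who follows the usual Serilog convention gets an explicit error naming the supported key, instead of a setting that silently has no effect.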

**Resolution**

_Open._

### Host-021 — Microsoft `Logging:LogLevel` section in `appsettings.json` is dead config under Serilog

| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.Host/appsettings.json:2-6` |

**Description**

`appsettings.json` carries a Microsoft `Logging:LogLevel:Default = Information`
block. The `Logging:LogLevel` map is consumed by
`Microsoft.Extensions.Logging.ConfigurationConsoleLoggerOptions` and similar
provider configurations bound from the standard `Logging` section. The Host
calls `builder.Host.UseSerilog()`, which replaces the default
`ILoggerFactory` setup with Serilog as the **only** logger provider; Serilog
reads from `configuration.ReadFrom.Configuration(...)` which consumes the
`Serilog` section, **not** `Logging:LogLevel`. The result is that an operator
editing `Logging:LogLevel:Default` (a very natural thing to try, since it is
the .NET convention) sees no behaviour change: the section is dead config.

**Recommendation**

Either remove the `Logging:LogLevel` block from `appsettings.json` (Serilog
owns logging configuration in this Host), or replace it with a brief comment
explaining it is intentionally retained for non-Serilog tooling. Document the
authoritative location (`Serilog` + `ScadaLink:Logging`) in
`Component-Host.md` REQ-HOST-8 if not already explicit.

**Resolution**

_Open._

### Host-022 — `ParseLevel` silently coerces unrecognised `MinimumLevel` to `Information`

| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.Host/LoggerConfigurationFactory.cs:50-55` |

**Description**

`LoggerConfigurationFactory.ParseLevel` uses
`Enum.TryParse<LogEventLevel>(level, ignoreCase: true, out var parsed)` and
returns `LogEventLevel.Information` when parsing fails, without logging the
fallback. An operator who sets
`ScadaLink:Logging:MinimumLevel = "Informaiton"` (a common typo) or any
other unrecognised value gets the default level silently;
there is no warning, no log line, no startup error. Combined with Host-020
(this is the only mechanism that pins the floor), a misspelt value is
invisible until someone wonders why the level change "didn't take". The
helper is small and could either fail fast in `StartupValidator` or emit a
console warning before the logger is configured.

**Recommendation**

In `LoggerConfigurationFactory.Build`, when `loggingOptions.MinimumLevel` is
non-null/non-blank but does not parse to a valid `LogEventLevel`, write a
`Console.Error.WriteLine` warning (the logger is not yet built) and proceed
with `Information`. Alternatively, validate the value in `StartupValidator`
and fail fast, which matches the pattern used for other ScadaLink
configuration keys. Add a `LoggerConfigurationTests` case asserting the
behaviour you choose.
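
Both behaviours named in this recommendation (warn and fall back, or fail fast) fit in one small function. The sketch below is Python rather than the project's C#, and the name `parse_level` and its `strict` and `warn` parameters are hypothetical. It matches level names exactly and case-insensitively, which is an assumption of the sketch and may be stricter than the parsing the real helper performs.

```python
import sys

# The six Serilog LogEventLevel names.
LEVELS = ("Verbose", "Debug", "Information", "Warning", "Error", "Fatal")


def _warn_to_stderr(message):
    # The logger does not exist yet at this point, so stderr is the only channel.
    print(message, file=sys.stderr)


def parse_level(raw, strict=False, warn=_warn_to_stderr):
    """Map a configured level name to its canonical spelling.

    A missing or blank value falls back silently, since nothing was misconfigured.
    An unrecognised value either raises (strict) or warns and falls back.
    """
    if raw is None or not raw.strip():
        return "Information"
    wanted = raw.strip().lower()
    for name in LEVELS:
        if name.lower() == wanted:
            return name
    message = f"Unrecognised MinimumLevel {raw!r}; expected one of {', '.join(LEVELS)}"
    if strict:
        raise ValueError(message)
    warn(message + "; falling back to Information")
    return "Information"
```

Calling `parse_level("Informaiton")` writes one warning to standard error and returns `"Information"`, while `parse_level("Informaiton", strict=True)` raises instead. Injecting `warn` keeps the fallback path testable without capturing the process's error stream.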

**Resolution**

_Open._