Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
397 lines
17 KiB
Markdown
397 lines
17 KiB
Markdown
# Code Review — Host
|
||
|
||
| Field | Value |
|
||
|-------|-------|
|
||
| Module | `src/ScadaLink.Host` |
|
||
| Design doc | `docs/requirements/Component-Host.md` |
|
||
| Status | Reviewed |
|
||
| Last reviewed | 2026-05-16 |
|
||
| Reviewer | claude-agent |
|
||
| Commit reviewed | `9c60592` |
|
||
| Open findings | 11 |
|
||
|
||
## Summary
|
||
|
||
The Host module is the composition root for the entire ScadaLink system: a single
|
||
binary whose behaviour (`Central` vs `Site`) is driven entirely by configuration. The
|
||
implementation is generally faithful to `Component-Host.md` — startup validation,
|
||
role-based registration, Serilog enrichment, Windows Service support, dead-letter
|
||
monitoring, CoordinatedShutdown, and gRPC hosting on site nodes are all present and
|
||
backed by a solid test suite (`tests/ScadaLink.Host.Tests`).
|
||
|
||
The most significant problem is the readiness endpoint: `/health/ready` runs **all**
|
||
registered health checks, including the leader-only `active-node` check, so a fully
|
||
operational *standby* central node permanently reports `503` on `/health/ready` —
|
||
directly contradicting REQ-HOST-4a, which defines readiness as cluster membership +
|
||
DB connectivity (not leadership). Several other findings concern configuration that
|
||
is validated-but-never-consumed (`MachineDataDb`), design-doc drift (Akka.Persistence
|
||
is required by REQ-HOST-6 but the system uses no persistent actors), an incorrect
|
||
seed-node entry in the shipped site config, blocking sync-over-async during startup,
|
||
and unguarded string interpolation when building HOCON. None are crash/data-loss
|
||
class, but the readiness bug is High because it breaks load-balancer behaviour with
|
||
no safe workaround.
|
||
|
||
## Checklist coverage
|
||
|
||
| # | Category | Examined | Notes |
|
||
|---|----------|----------|-------|
|
||
| 1 | Correctness & logic bugs | ☑ | `/health/ready` includes the leader-only check (Host-001); site seed-node config points at the gRPC port (Host-004). |
|
||
| 2 | Akka.NET conventions | ☑ | CoordinatedShutdown, receptionist registration, singleton scoping all correct. HOCON built by raw string interpolation (Host-006); `StartAsync` returns before actors are confirmed running (Host-009). |
|
||
| 3 | Concurrency & thread safety | ☑ | Blocking `GetAwaiter().GetResult()` on a hosted-service startup thread (Host-005). `DeadLetterMonitorActor` state is actor-confined — no issues. |
|
||
| 4 | Error handling & resilience | ☑ | Top-level try/catch logs fatal and rethrows. No retry around DB migration / readiness preconditions (Host-010). |
|
||
| 5 | Security | ☑ | Plaintext DB password, LDAP service-account password and dev JWT key checked into `appsettings.Central.json` (Host-003). |
|
||
| 6 | Performance & resource management | ☑ | No undisposed resources. Inbound API script compilation is a synchronous startup loop — acceptable. |
|
||
| 7 | Design-document adherence | ☑ | REQ-HOST-6 mandates Akka.Persistence config but none exists and no persistent actors exist — doc is stale (Host-002). REQ-HOST-4 GrpcPort-≠-RemotingPort rule not enforced (Host-007). |
|
||
| 8 | Code organization & conventions | ☑ | `MachineDataDb` validated/declared but never consumed (Host-008). `LoggingOptions.MinimumLevel` is dead (Host-011). |
|
||
| 9 | Testing coverage | ☑ | Strong suite; no test asserts `/health/ready` excludes `active-node`, which is why Host-001 slipped through (noted in Host-001). |
|
||
| 10 | Documentation & comments | ☑ | Comments are accurate. REQ-HOST-6 in the design doc is the main stale-doc item (Host-002). |
|
||
|
||
## Findings
|
||
|
||
### Host-001 — `/health/ready` includes the leader-only `active-node` check
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/Program.cs:135-145` |
|
||
|
||
**Description**
|
||
|
||
`/health/ready` is mapped with `MapHealthChecks("/health/ready", ...)` and **no
|
||
`Predicate`**, so it executes every registered check: `database`, `akka-cluster`
|
||
*and* `active-node`. `ActiveNodeHealthCheck` (`Health/ActiveNodeHealthCheck.cs:38`)
|
||
returns `Unhealthy` on any node that is not the cluster leader. As a result a
|
||
standby central node that is fully operational (cluster member `Up`, database
|
||
reachable) still returns `503` on `/health/ready`. This contradicts REQ-HOST-4a,
|
||
which defines readiness as cluster membership + DB connectivity + singletons —
|
||
explicitly *not* leadership. `/health/active` is the endpoint intended to report
|
||
leadership. A load balancer using `/health/ready` to decide whether a node may
|
||
serve traffic will permanently treat the standby as unready, defeating failover
|
||
readiness. No test covers this: `HealthCheckTests.HealthReady_Endpoint_ReturnsResponse`
|
||
only asserts a response is returned, not the standby semantics.
|
||
|
||
**Recommendation**
|
||
|
||
Add a `Predicate` to the `/health/ready` mapping that excludes the `active-node`
|
||
check, e.g. `Predicate = check => check.Name != "active-node"` (or tag the readiness
|
||
checks and filter by tag). Add a regression test asserting a non-leader node returns
|
||
`200` on `/health/ready`.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-002 — Akka.Persistence required by REQ-HOST-6 is not configured and not used
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Design-document adherence |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:70-108` |
|
||
|
||
**Description**
|
||
|
||
REQ-HOST-6 states the Host "must configure the Akka.NET actor system using
|
||
Akka.Hosting with ... **Persistence**: Configured with the appropriate journal and
|
||
snapshot store (SQL for central, SQLite for site)." The HOCON built in
|
||
`AkkaHostedService.StartAsync` contains no `akka.persistence` section, no journal and
|
||
no snapshot-store plugin, and `ScadaLink.Host.csproj` references neither
|
||
`Akka.Persistence.Hosting` nor any persistence plugin (the design doc Dependencies
|
||
list `Akka.Persistence.Hosting`). A repo-wide search finds **no** `PersistentActor` /
|
||
`ReceivePersistentActor` subclasses — the system deliberately uses custom SQLite
|
||
storage services instead. The code is internally consistent, but the design document
|
||
is stale: it mandates a subsystem that does not exist. This is a documented-vs-actual
|
||
drift that will mislead future maintainers and any audit against REQ-HOST-6.
|
||
|
||
**Recommendation**
|
||
|
||
Update `Component-Host.md` REQ-HOST-6 and the Dependencies list to remove the
|
||
Akka.Persistence requirement (or explicitly state persistence is provided by
|
||
component-owned SQLite storage, not Akka.Persistence). If persistence *is* intended,
|
||
add the plugin packages and HOCON. Either way, code and doc must agree.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-003 — Secrets committed in plaintext in `appsettings.Central.json`
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/appsettings.Central.json:20-31` |
|
||
|
||
**Description**
|
||
|
||
`appsettings.Central.json` contains real-looking secrets in plaintext, checked into
|
||
source control: SQL Server passwords in the `ConfigurationDb` / `MachineDataDb`
|
||
connection strings (`Password=ScadaLink_Dev1#`), an LDAP service-account password
|
||
(`LdapServiceAccountPassword: "password"`), and a JWT signing key
|
||
(`JwtSigningKey: "scadalink-dev-jwt-signing-key-..."`). Even though these are
|
||
intended as development defaults, shipping them in the default config invites them
|
||
being reused verbatim in production, and a committed JWT signing key allows anyone
|
||
with repo access to forge session tokens. `TrustServerCertificate=true` additionally
|
||
disables TLS validation for the SQL connection.
|
||
|
||
**Recommendation**
|
||
|
||
Move all secrets out of committed `appsettings*.json` into environment variables,
|
||
user-secrets, or a secret store. Keep only non-sensitive structural defaults in the
|
||
file and document the required environment variables. At minimum add a clear comment
|
||
that these values are dev-only and must be overridden, and rotate the JWT key per
|
||
environment.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-004 — Site seed-node list points at the gRPC port, not a remoting port
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/appsettings.Site.json:10-19` |
|
||
|
||
**Description**
|
||
|
||
The shipped site config sets `Node:RemotingPort = 8082` and `Node:GrpcPort = 8083`,
|
||
but `Cluster:SeedNodes` is `["akka.tcp://scadalink@localhost:8082",
|
||
"akka.tcp://scadalink@localhost:8083"]`. The second seed node targets `8083`, which
|
||
is the Kestrel HTTP/2 gRPC port — not an Akka remoting endpoint. A node attempting to
|
||
join via that seed will try to establish an Akka.Remote TCP association against the
|
||
gRPC listener and fail. `StartupValidator` only checks that ≥2 seed nodes exist
|
||
(`StartupValidator.cs:54-56`), so this misconfiguration passes validation silently.
|
||
For the single-node dev site it is harmless (the first seed succeeds), but it is an
|
||
incorrect example that will be copied into multi-node site configs.
|
||
|
||
**Recommendation**
|
||
|
||
Correct the site seed-node list to reference the two site nodes' *remoting* ports
|
||
(e.g. `8082` and `8084`), never the gRPC port. Consider extending `StartupValidator`
|
||
to reject a seed node whose port equals this node's `GrpcPort`.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-005 — Blocking sync-over-async (`GetAwaiter().GetResult()`) inside `StartAsync`
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Concurrency & thread safety |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:345` |
|
||
|
||
**Description**
|
||
|
||
`RegisterSiteActors` calls `storeAndForwardService.StartAsync().GetAwaiter().GetResult()`
|
||
synchronously, blocking inside the `IHostedService.StartAsync` path. `StartAsync` is
|
||
itself declared synchronous (returns `Task.CompletedTask`), so the work cannot be
|
||
awaited cleanly. Blocking on async work risks thread-pool starvation during startup
|
||
and, if the awaited operation captures a synchronization context, deadlock. It also
|
||
hides exceptions behind an `AggregateException` wrapper.
|
||
|
||
**Recommendation**
|
||
|
||
Make `AkkaHostedService.StartAsync` genuinely `async` and `await
|
||
storeAndForwardService.StartAsync(cancellationToken)`. Propagate the
|
||
`CancellationToken` and let exceptions surface as the original type.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-006 — HOCON assembled by unescaped string interpolation
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Akka.NET conventions |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:70-108` |
|
||
|
||
**Description**
|
||
|
||
The Akka HOCON is built with an interpolated string that injects
|
||
`_nodeOptions.NodeHostname`, `_clusterOptions.SeedNodes`, the computed roles, and
|
||
`SplitBrainResolverStrategy` directly into the configuration text. Values are not
|
||
escaped. A hostname or seed-node string containing a quote, backslash, brace, or
|
||
comment sequence would corrupt the HOCON and produce a confusing parse error far from
|
||
the real cause; `SplitBrainResolverStrategy` is interpolated without quoting, so a
|
||
value with whitespace breaks the document. Building cluster configuration from raw
|
||
string concatenation is also harder to maintain than the typed Akka.Hosting builder
|
||
the design doc (REQ-HOST-6) actually calls for ("via Akka.Hosting").
|
||
|
||
**Recommendation**
|
||
|
||
Prefer the `Akka.Hosting` `AddAkka(...)` builder with strongly-typed `WithRemoting`,
|
||
`WithClustering`, and split-brain-resolver configuration instead of hand-built HOCON.
|
||
If HOCON must be retained, validate/escape interpolated values (especially hostname
|
||
and seed nodes) before substitution.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-007 — REQ-HOST-4 rule "GrpcPort ≠ RemotingPort" is not enforced
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Design-document adherence |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/StartupValidator.cs:43-47` |
|
||
|
||
**Description**
|
||
|
||
REQ-HOST-4 requires: "Site nodes must have `GrpcPort` in valid port range (1–65535)
|
||
**and different from `RemotingPort`**." `StartupValidator` validates the GrpcPort
|
||
range but never compares it to `RemotingPort`. A site config that sets both ports to
|
||
the same value passes validation and then fails opaquely at runtime when Kestrel and
|
||
Akka.Remote both try to bind the port. The GrpcPort range check is also skipped
|
||
entirely when the key is absent (`grpcPortStr != null`), relying on the
|
||
`NodeOptions` default of 8083 — acceptable, but the equality rule is the missing
|
||
piece.
|
||
|
||
**Recommendation**
|
||
|
||
Add a check in the `role == "Site"` block: if `GrpcPort` (resolved, including the
|
||
8083 default) equals `RemotingPort`, add an error
|
||
`"ScadaLink:Node:GrpcPort must differ from RemotingPort"`.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-008 — `MachineDataDb` is validated and declared but never consumed
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/StartupValidator.cs:33-34`, `src/ScadaLink.Host/DatabaseOptions.cs:6` |
|
||
|
||
**Description**
|
||
|
||
`StartupValidator` requires a non-empty `ScadaLink:Database:MachineDataDb` connection
|
||
string for Central nodes, and `DatabaseOptions` exposes a `MachineDataDb` property,
|
||
but a repo-wide search shows the value is never read anywhere outside the Host module
|
||
— only `ConfigurationDb` is passed to `AddConfigurationDatabase`
|
||
(`Program.cs:83-85`). The Host therefore fails startup if `MachineDataDb` is missing
|
||
even though nothing uses it. This is either dead configuration that should be removed
|
||
or a missing wiring (a machine-data DbContext that was never registered).
|
||
|
||
**Recommendation**
|
||
|
||
Determine whether a machine-data store is actually required. If yes, wire it into the
|
||
relevant component's DI registration. If no, remove the `MachineDataDb` validation
|
||
rule, the `DatabaseOptions` property, and the key from `appsettings.Central.json`.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-009 — `StartAsync` reports success before role actors are confirmed running
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Akka.NET conventions |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:127-141` |
|
||
|
||
**Description**
|
||
|
||
`StartAsync` creates actors with `ActorOf` (a fire-and-forget operation — the actor's
|
||
`PreStart` runs asynchronously on its own thread) and then returns
|
||
`Task.CompletedTask`. For site nodes, `grpcServer.SetReady(_actorSystem)` is called
|
||
synchronously at the end of `RegisterSiteActors`, marking the gRPC server ready even
|
||
though `SiteCommunicationActor`, the deployment-manager singleton, and the
|
||
`ClusterClient` may not yet have completed their `PreStart`/initial-contact handshake.
|
||
REQ-HOST-7 requires "Actor system and SiteStreamManager ... initialized before gRPC
|
||
begins accepting connections" — `SiteStreamManager.Initialize` is awaited-equivalent,
|
||
but the broader actor graph is not. The window is small and the gRPC server still
|
||
rejects streams until `SetReady`, so impact is limited, but readiness is being
|
||
asserted optimistically.
|
||
|
||
**Recommendation**
|
||
|
||
If strict ordering matters, gate `SetReady` on confirmation that
|
||
`SiteCommunicationActor` is fully initialized (e.g. an `Ask` round-trip or a
|
||
readiness message), or document explicitly that gRPC readiness only guarantees the
|
||
actor system exists, not that the cluster handshake has completed.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-010 — No retry/backoff around startup preconditions (DB migration, readiness)
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Error handling & resilience |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/Program.cs:112-125` |
|
||
|
||
**Description**
|
||
|
||
On Central startup the Host opens a DI scope and calls
|
||
`MigrationHelper.ApplyOrValidateMigrationsAsync` directly. If the SQL Server is not
|
||
yet reachable (common in container orchestration where the DB and app start
|
||
together), the call throws, the top-level `catch` logs `Fatal`, and the process
|
||
exits. There is no bounded retry/backoff to tolerate a database that is briefly
|
||
unavailable at boot. The design intent (REQ-HOST-4a, readiness gating, `503` until
|
||
ready) is about *serving traffic*, but the migration step happens before the host
|
||
even runs and has no such tolerance.
|
||
|
||
**Recommendation**
|
||
|
||
Wrap the migration/validation step in a bounded retry with exponential backoff (e.g.
|
||
Polly), or move schema apply behind the readiness gate so the process stays up and
|
||
reports `503` until the database becomes reachable.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### Host-011 — `LoggingOptions.MinimumLevel` is dead configuration
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.Host/LoggingOptions.cs:5`, `src/ScadaLink.Host/Program.cs:42-50` |
|
||
|
||
**Description**
|
||
|
||
`LoggingOptions` exposes a `MinimumLevel` property bound from `ScadaLink:Logging`
|
||
(`SiteServiceRegistration.BindSharedOptions`), and both `appsettings.Central.json`
|
||
and `appsettings.Site.json` set `"Logging": { "MinimumLevel": "Information" }`.
|
||
However Serilog is configured purely via `ReadFrom.Configuration(configuration)`,
|
||
which reads the standard `Serilog` section — not `ScadaLink:Logging`. The
|
||
`LoggingOptions.MinimumLevel` value is never read by any code, so changing it has no
|
||
effect. This is misleading: an operator editing `ScadaLink:Logging:MinimumLevel`
|
||
expecting a log-level change will see nothing happen.
|
||
|
||
**Recommendation**
|
||
|
||
Either consume `LoggingOptions.MinimumLevel` when configuring the Serilog
|
||
`LoggerConfiguration` (e.g. set `MinimumLevel.Is(...)` from it), or remove the option
|
||
class and the `ScadaLink:Logging` sections and rely solely on the `Serilog`
|
||
configuration section. Keep one mechanism, not two.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|