Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
17 KiB
Code Review — Host
| Field | Value |
|---|---|
| Module | src/ScadaLink.Host |
| Design doc | docs/requirements/Component-Host.md |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | 9c60592 |
| Open findings | 11 |
Summary
The Host module is the composition root for the entire ScadaLink system: a single
binary whose behaviour (Central vs Site) is driven entirely by configuration. The
implementation is generally faithful to Component-Host.md — startup validation,
role-based registration, Serilog enrichment, Windows Service support, dead-letter
monitoring, CoordinatedShutdown, and gRPC hosting on site nodes are all present and
backed by a solid test suite (tests/ScadaLink.Host.Tests).
The most significant problem is the readiness endpoint: /health/ready runs all
registered health checks, including the leader-only active-node check, so a fully
operational standby central node permanently reports 503 on /health/ready —
directly contradicting REQ-HOST-4a, which defines readiness as cluster membership +
DB connectivity (not leadership). Several other findings concern configuration that
is validated-but-never-consumed (MachineDataDb), design-doc drift (Akka.Persistence
is required by REQ-HOST-6 but the system uses no persistent actors), an incorrect
seed-node entry in the shipped site config, blocking sync-over-async during startup,
and unguarded string interpolation when building HOCON. None are crash/data-loss
class, but the readiness bug is High because it breaks load-balancer behaviour with
no safe workaround.
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ☑ | /health/ready includes the leader-only check (Host-001); site seed-node config points at the gRPC port (Host-004). |
| 2 | Akka.NET conventions | ☑ | CoordinatedShutdown, receptionist registration, singleton scoping all correct. HOCON built by raw string interpolation (Host-006); StartAsync returns before actors are confirmed running (Host-009). |
| 3 | Concurrency & thread safety | ☑ | Blocking GetAwaiter().GetResult() on a hosted-service startup thread (Host-005). DeadLetterMonitorActor state is actor-confined — no issues. |
| 4 | Error handling & resilience | ☑ | Top-level try/catch logs fatal and rethrows. No retry around DB migration / readiness preconditions (Host-010). |
| 5 | Security | ☑ | Plaintext DB password, LDAP service-account password and dev JWT key checked into appsettings.Central.json (Host-003). |
| 6 | Performance & resource management | ☑ | No undisposed resources. Inbound API script compilation is a synchronous startup loop — acceptable. |
| 7 | Design-document adherence | ☑ | REQ-HOST-6 mandates Akka.Persistence config but none exists and no persistent actors exist — doc is stale (Host-002). REQ-HOST-4 GrpcPort-≠-RemotingPort rule not enforced (Host-007). |
| 8 | Code organization & conventions | ☑ | MachineDataDb validated/declared but never consumed (Host-008). LoggingOptions.MinimumLevel is dead (Host-011). |
| 9 | Testing coverage | ☑ | Strong suite; no test asserts /health/ready excludes active-node, which is why Host-001 slipped through (noted in Host-001). |
| 10 | Documentation & comments | ☑ | Comments are accurate. REQ-HOST-6 in the design doc is the main stale-doc item (Host-002). |
Findings
Host-001 — /health/ready includes the leader-only active-node check
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | src/ScadaLink.Host/Program.cs:135-145 |
Description
/health/ready is mapped with MapHealthChecks("/health/ready", ...) and no
Predicate, so it executes every registered check: database, akka-cluster
and active-node. ActiveNodeHealthCheck (Health/ActiveNodeHealthCheck.cs:38)
returns Unhealthy on any node that is not the cluster leader. As a result a
standby central node that is fully operational (cluster member Up, database
reachable) still returns 503 on /health/ready. This contradicts REQ-HOST-4a,
which defines readiness as cluster membership + DB connectivity + singletons —
explicitly not leadership. /health/active is the endpoint intended to report
leadership. A load balancer using /health/ready to decide whether a node may
serve traffic will permanently treat the standby as unready, defeating failover
readiness. No test covers this: HealthCheckTests.HealthReady_Endpoint_ReturnsResponse
only asserts a response is returned, not the standby semantics.
Recommendation
Add a Predicate to the /health/ready mapping that excludes the active-node
check, e.g. Predicate = check => check.Name != "active-node" (or tag the readiness
checks and filter by tag). Add a regression test asserting a non-leader node returns
200 on /health/ready.
Resolution
Unresolved.
Host-002 — Akka.Persistence required by REQ-HOST-6 is not configured and not used
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | src/ScadaLink.Host/Actors/AkkaHostedService.cs:70-108 |
Description
REQ-HOST-6 states the Host "must configure the Akka.NET actor system using
Akka.Hosting with ... Persistence: Configured with the appropriate journal and
snapshot store (SQL for central, SQLite for site)." The HOCON built in
AkkaHostedService.StartAsync contains no akka.persistence section, no journal and
no snapshot-store plugin, and ScadaLink.Host.csproj references neither
Akka.Persistence.Hosting nor any persistence plugin (the design doc Dependencies
list Akka.Persistence.Hosting). A repo-wide search finds no PersistentActor /
ReceivePersistentActor subclasses — the system deliberately uses custom SQLite
storage services instead. The code is internally consistent, but the design document
is stale: it mandates a subsystem that does not exist. This is a documented-vs-actual
drift that will mislead future maintainers and any audit against REQ-HOST-6.
Recommendation
Update Component-Host.md REQ-HOST-6 and the Dependencies list to remove the
Akka.Persistence requirement (or explicitly state persistence is provided by
component-owned SQLite storage, not Akka.Persistence). If persistence is intended,
add the plugin packages and HOCON. Either way, code and doc must agree.
Resolution
Unresolved.
Host-003 — Secrets committed in plaintext in appsettings.Central.json
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | src/ScadaLink.Host/appsettings.Central.json:20-31 |
Description
appsettings.Central.json contains real-looking secrets in plaintext, checked into
source control: SQL Server passwords in the ConfigurationDb / MachineDataDb
connection strings (Password=ScadaLink_Dev1#), an LDAP service-account password
(LdapServiceAccountPassword: "password"), and a JWT signing key
(JwtSigningKey: "scadalink-dev-jwt-signing-key-..."). Even though these are
intended as development defaults, shipping them in the default config invites them
being reused verbatim in production, and a committed JWT signing key allows anyone
with repo access to forge session tokens. TrustServerCertificate=true additionally
disables TLS validation for the SQL connection.
Recommendation
Move all secrets out of committed appsettings*.json into environment variables,
user-secrets, or a secret store. Keep only non-sensitive structural defaults in the
file and document the required environment variables. At minimum add a clear comment
that these values are dev-only and must be overridden, and rotate the JWT key per
environment.
Resolution
Unresolved.
Host-004 — Site seed-node list points at the gRPC port, not a remoting port
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | src/ScadaLink.Host/appsettings.Site.json:10-19 |
Description
The shipped site config sets Node:RemotingPort = 8082 and Node:GrpcPort = 8083,
but Cluster:SeedNodes is ["akka.tcp://scadalink@localhost:8082", "akka.tcp://scadalink@localhost:8083"]. The second seed node targets 8083, which
is the Kestrel HTTP/2 gRPC port — not an Akka remoting endpoint. A node attempting to
join via that seed will try to establish an Akka.Remote TCP association against the
gRPC listener and fail. StartupValidator only checks that ≥2 seed nodes exist
(StartupValidator.cs:54-56), so this misconfiguration passes validation silently.
For the single-node dev site it is harmless (the first seed succeeds), but it is an
incorrect example that will be copied into multi-node site configs.
Recommendation
Correct the site seed-node list to reference the two site nodes' remoting ports
(e.g. 8082 and 8084), never the gRPC port. Consider extending StartupValidator
to reject a seed node whose port equals this node's GrpcPort.
Resolution
Unresolved.
Host-005 — Blocking sync-over-async (GetAwaiter().GetResult()) inside StartAsync
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | src/ScadaLink.Host/Actors/AkkaHostedService.cs:345 |
Description
RegisterSiteActors calls storeAndForwardService.StartAsync().GetAwaiter().GetResult()
synchronously, blocking inside the IHostedService.StartAsync path. StartAsync is
itself declared synchronous (returns Task.CompletedTask), so the work cannot be
awaited cleanly. Blocking on async work risks thread-pool starvation during startup
and, if the awaited operation captures a synchronization context, deadlock. It also
hides exceptions behind an AggregateException wrapper.
Recommendation
Make AkkaHostedService.StartAsync genuinely async and await storeAndForwardService.StartAsync(cancellationToken). Propagate the
CancellationToken and let exceptions surface as the original type.
Resolution
Unresolved.
Host-006 — HOCON assembled by unescaped string interpolation
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | src/ScadaLink.Host/Actors/AkkaHostedService.cs:70-108 |
Description
The Akka HOCON is built with an interpolated string that injects
_nodeOptions.NodeHostname, _clusterOptions.SeedNodes, the computed roles, and
SplitBrainResolverStrategy directly into the configuration text. Values are not
escaped. A hostname or seed-node string containing a quote, backslash, brace, or
comment sequence would corrupt the HOCON and produce a confusing parse error far from
the real cause; SplitBrainResolverStrategy is interpolated without quoting, so a
value with whitespace breaks the document. Building cluster configuration from raw
string concatenation is also harder to maintain than the typed Akka.Hosting builder
the design doc (REQ-HOST-6) actually calls for ("via Akka.Hosting").
Recommendation
Prefer the Akka.Hosting AddAkka(...) builder with strongly-typed WithRemoting,
WithClustering, and split-brain-resolver configuration instead of hand-built HOCON.
If HOCON must be retained, validate/escape interpolated values (especially hostname
and seed nodes) before substitution.
Resolution
Unresolved.
Host-007 — REQ-HOST-4 rule "GrpcPort ≠ RemotingPort" is not enforced
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | src/ScadaLink.Host/StartupValidator.cs:43-47 |
Description
REQ-HOST-4 requires: "Site nodes must have GrpcPort in valid port range (1–65535)
and different from RemotingPort." StartupValidator validates the GrpcPort
range but never compares it to RemotingPort. A site config that sets both ports to
the same value passes validation and then fails opaquely at runtime when Kestrel and
Akka.Remote both try to bind the port. The GrpcPort range check is also skipped
entirely when the key is absent (grpcPortStr != null), relying on the
NodeOptions default of 8083 — acceptable, but the equality rule is the missing
piece.
Recommendation
Add a check in the role == "Site" block: if GrpcPort (resolved, including the
8083 default) equals RemotingPort, add an error
"ScadaLink:Node:GrpcPort must differ from RemotingPort".
Resolution
Unresolved.
Host-008 — MachineDataDb is validated and declared but never consumed
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | src/ScadaLink.Host/StartupValidator.cs:33-34, src/ScadaLink.Host/DatabaseOptions.cs:6 |
Description
StartupValidator requires a non-empty ScadaLink:Database:MachineDataDb connection
string for Central nodes, and DatabaseOptions exposes a MachineDataDb property,
but a repo-wide search shows the value is never read anywhere outside the Host module
— only ConfigurationDb is passed to AddConfigurationDatabase
(Program.cs:83-85). The Host therefore fails startup if MachineDataDb is missing
even though nothing uses it. This is either dead configuration that should be removed
or a missing wiring (a machine-data DbContext that was never registered).
Recommendation
Determine whether a machine-data store is actually required. If yes, wire it into the
relevant component's DI registration. If no, remove the MachineDataDb validation
rule, the DatabaseOptions property, and the key from appsettings.Central.json.
Resolution
Unresolved.
Host-009 — StartAsync reports success before role actors are confirmed running
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | src/ScadaLink.Host/Actors/AkkaHostedService.cs:127-141 |
Description
StartAsync creates actors with ActorOf (a fire-and-forget operation — the actor's
PreStart runs asynchronously on its own thread) and then returns
Task.CompletedTask. For site nodes, grpcServer.SetReady(_actorSystem) is called
synchronously at the end of RegisterSiteActors, marking the gRPC server ready even
though SiteCommunicationActor, the deployment-manager singleton, and the
ClusterClient may not yet have completed their PreStart/initial-contact handshake.
REQ-HOST-7 requires "Actor system and SiteStreamManager ... initialized before gRPC
begins accepting connections" — SiteStreamManager.Initialize is awaited-equivalent,
but the broader actor graph is not. The window is small and the gRPC server still
rejects streams until SetReady, so impact is limited, but readiness is being
asserted optimistically.
Recommendation
If strict ordering matters, gate SetReady on confirmation that
SiteCommunicationActor is fully initialized (e.g. an Ask round-trip or a
readiness message), or document explicitly that gRPC readiness only guarantees the
actor system exists, not that the cluster handshake has completed.
Resolution
Unresolved.
Host-010 — No retry/backoff around startup preconditions (DB migration, readiness)
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Location | src/ScadaLink.Host/Program.cs:112-125 |
Description
On Central startup the Host opens a DI scope and calls
MigrationHelper.ApplyOrValidateMigrationsAsync directly. If the SQL Server is not
yet reachable (common in container orchestration where the DB and app start
together), the call throws, the top-level catch logs Fatal, and the process
exits. There is no bounded retry/backoff to tolerate a database that is briefly
unavailable at boot. The design intent (REQ-HOST-4a, readiness gating, 503 until
ready) is about serving traffic, but the migration step happens before the host
even runs and has no such tolerance.
Recommendation
Wrap the migration/validation step in a bounded retry with exponential backoff (e.g.
Polly), or move schema apply behind the readiness gate so the process stays up and
reports 503 until the database becomes reachable.
Resolution
Unresolved.
Host-011 — LoggingOptions.MinimumLevel is dead configuration
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | src/ScadaLink.Host/LoggingOptions.cs:5, src/ScadaLink.Host/Program.cs:42-50 |
Description
LoggingOptions exposes a MinimumLevel property bound from ScadaLink:Logging
(SiteServiceRegistration.BindSharedOptions), and both appsettings.Central.json
and appsettings.Site.json set "Logging": { "MinimumLevel": "Information" }.
However Serilog is configured purely via ReadFrom.Configuration(configuration),
which reads the standard Serilog section — not ScadaLink:Logging. The
LoggingOptions.MinimumLevel value is never read by any code, so changing it has no
effect. This is misleading: an operator editing ScadaLink:Logging:MinimumLevel
expecting a log-level change will see nothing happen.
Recommendation
Either consume LoggingOptions.MinimumLevel when configuring the Serilog
LoggerConfiguration (e.g. set MinimumLevel.Is(...) from it), or remove the option
class and the ScadaLink:Logging sections and rely solely on the Serilog
configuration section. Keep one mechanism, not two.
Resolution
Unresolved.