Joseph Doherty 6ae0fea558 fix(error-handling): close Theme 4 — 18 cancellation / fire-and-forget findings
Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:

Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
  captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
  threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
  during the bounded-retry window aborts cleanly.

Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
  when any row's idempotent insert is still being retried (per-EventId
  retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
  abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
  SPLIT loop — by class-doc construction the catch could only mask real
  failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
  per-interval counters before sending, restore via new
  ISiteHealthCollector.AddIntervalCounters on transport failure so counts
  aren't silently lost.

Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
  via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
  with a bounded SweepShutdownWaitTimeout (10s).

Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
  own try/catch so a throw doesn't leak the relay actor or _activeStreams
  entry.
- Comm-022: VERIFIED already-closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
  (auth-failure exit-code contract unified).

Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
  prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
  MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the
  per-call CTS is the sole timeout source (was clipped to 100s by .NET).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
  empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
  DeleteTimedOut audit entries (mirrors DeployFailed pattern).

Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.

20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
2026-05-28 07:13:28 -04:00

Code Reviews

Comprehensive, per-module code reviews of the ScadaLink codebase. Each module (one buildable project under src/) has its own folder containing a findings.md. This README is the aggregated index — the single place to see all outstanding work.

Generated by regen-readme.py from the per-module findings.md files. Do not edit by hand — edit the findings files and re-run the script.

How it works

  • Reviews are performed one module at a time against a fixed checklist.
  • Every finding is recorded in the module's findings.md with a severity and status.
  • Findings are never deleted — they are closed by changing their status, keeping a full audit trail.
  • This README aggregates every pending finding (Open / In Progress) across all modules.

See REVIEW-PROCESS.md for the full procedure: the review checklist, severity definitions, finding format, and how to mark items resolved.
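The "closed by changing status, never deleted" rule can be sketched as follows. This is a minimal illustration, not the project's tooling: the `Finding` record, the `close_finding` and `pending` helpers, the `Demo-…` IDs, and the `"Resolved"` status name are all assumptions made for the example. The real finding format is defined in REVIEW-PROCESS.md.

```python
from dataclasses import dataclass, replace

# The two pending statuses named in this README.
PENDING_STATUSES = {"Open", "In Progress"}


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory shape of one finding."""
    finding_id: str
    severity: str
    status: str
    title: str


def close_finding(findings, finding_id, new_status="Resolved"):
    """Return a new list with one finding's status changed; no record is removed."""
    return [
        replace(f, status=new_status) if f.finding_id == finding_id else f
        for f in findings
    ]


def pending(findings):
    """Keep only the findings the aggregated index would list."""
    return [f for f in findings if f.status in PENDING_STATUSES]


# Hypothetical sample data, not taken from any findings.md.
findings = [
    Finding("Demo-001", "Medium", "Open", "hypothetical finding A"),
    Finding("Demo-002", "Low", "In Progress", "hypothetical finding B"),
]
after = close_finding(findings, "Demo-001")
pending_ids = [f.finding_id for f in pending(after)]
print(len(after), pending_ids)
```

The closed finding stays in `after`, which is the audit trail; only the pending view shrinks.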

Layout

code-reviews/
├── README.md            # this file — process overview + pending findings
├── REVIEW-PROCESS.md     # how to perform a review and track findings
├── regen-readme.py       # regenerates this README from the findings files
├── _template/findings.md # copy-this template for a module review
└── <Module>/findings.md  # one folder per src/ project
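Given this layout, a script could locate the per-module findings files with a single glob. The sketch below is an assumption about how such discovery might work, not the contents of regen-readme.py; `discover_findings` is a hypothetical helper, and skipping folders that start with `_` is an assumed way to exclude `_template`.

```python
import tempfile
from pathlib import Path


def discover_findings(root: Path) -> list[Path]:
    """Return <Module>/findings.md paths under root, skipping underscore folders."""
    return sorted(
        path
        for path in root.glob("*/findings.md")
        if not path.parent.name.startswith("_")
    )


# Demo on a throwaway tree that mirrors the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("AuditLog", "CLI", "_template"):
        (root / folder).mkdir()
        (root / folder / "findings.md").write_text("", encoding="utf-8")
    (root / "README.md").write_text("", encoding="utf-8")
    modules = [path.parent.name for path in discover_findings(root)]

print(modules)  # ['AuditLog', 'CLI']
```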

Baseline review — 2026-05-16

All 19 modules were reviewed at commit 9c60592 (241 findings: 6 Critical, 46 High, 100 Medium, 89 Low). The tables below track what remains open as findings are resolved and re-triaged; findings discovered after the baseline are appended to their module file and counted in Total.

| Severity | Open findings |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 25 |
| Low | 50 |
| Total | 75 |

Module Status

| Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
|---|---|---|---|---|---|
| AuditLog | 2026-05-28 | 1eb6e97 | 0/0/2/4 | 6 | 11 |
| CLI | 2026-05-28 | 1eb6e97 | 0/0/1/2 | 3 | 23 |
| CentralUI | 2026-05-28 | 1eb6e97 | 0/0/0/5 | 5 | 33 |
| ClusterInfrastructure | 2026-05-28 | 1eb6e97 | 0/0/0/3 | 3 | 14 |
| Commons | 2026-05-28 | 1eb6e97 | 0/0/0/5 | 5 | 23 |
| Communication | 2026-05-28 | 1eb6e97 | 0/0/1/1 | 2 | 22 |
| ConfigurationDatabase | 2026-05-28 | 1eb6e97 | 0/0/2/2 | 4 | 24 |
| DataConnectionLayer | 2026-05-28 | 1eb6e97 | 0/0/0/0 | 0 | 22 |
| DeploymentManager | 2026-05-28 | 1eb6e97 | 0/0/0/4 | 4 | 24 |
| ExternalSystemGateway | 2026-05-28 | 1eb6e97 | 0/0/1/1 | 2 | 23 |
| HealthMonitoring | 2026-05-28 | 1eb6e97 | 0/0/0/2 | 2 | 23 |
| Host | 2026-05-28 | 1eb6e97 | 0/0/1/3 | 4 | 22 |
| InboundAPI | 2026-05-28 | 1eb6e97 | 0/0/1/2 | 3 | 25 |
| ManagementService | 2026-05-28 | 1eb6e97 | 0/0/2/1 | 3 | 23 |
| NotificationOutbox | 2026-05-28 | 1eb6e97 | 0/0/1/2 | 3 | 10 |
| NotificationService | 2026-05-28 | 1eb6e97 | 0/0/2/2 | 4 | 25 |
| Security | 2026-05-28 | 1eb6e97 | 0/0/0/1 | 1 | 21 |
| SiteCallAudit | 2026-05-28 | 1eb6e97 | 0/0/2/2 | 4 | 6 |
| SiteEventLogging | 2026-05-28 | 1eb6e97 | 0/0/0/3 | 3 | 23 |
| SiteRuntime | 2026-05-28 | 1eb6e97 | 0/0/2/0 | 2 | 26 |
| StoreAndForward | 2026-05-28 | 1eb6e97 | 0/0/3/2 | 5 | 24 |
| TemplateEngine | 2026-05-28 | 1eb6e97 | 0/0/3/0 | 3 | 22 |
| Transport | 2026-05-28 | 1eb6e97 | 0/0/1/3 | 4 | 12 |
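The per-module Open (C/H/M/L) column and the severity summary should agree. The sketch below re-derives the summary from the column values copied from the table above; the helper names `parse_open` and `severity_totals` are hypothetical and are not taken from regen-readme.py.

```python
# Open (C/H/M/L) cells copied from the Module Status table.
OPEN_BY_MODULE = {
    "AuditLog": "0/0/2/4", "CLI": "0/0/1/2", "CentralUI": "0/0/0/5",
    "ClusterInfrastructure": "0/0/0/3", "Commons": "0/0/0/5",
    "Communication": "0/0/1/1", "ConfigurationDatabase": "0/0/2/2",
    "DataConnectionLayer": "0/0/0/0", "DeploymentManager": "0/0/0/4",
    "ExternalSystemGateway": "0/0/1/1", "HealthMonitoring": "0/0/0/2",
    "Host": "0/0/1/3", "InboundAPI": "0/0/1/2", "ManagementService": "0/0/2/1",
    "NotificationOutbox": "0/0/1/2", "NotificationService": "0/0/2/2",
    "Security": "0/0/0/1", "SiteCallAudit": "0/0/2/2",
    "SiteEventLogging": "0/0/0/3", "SiteRuntime": "0/0/2/0",
    "StoreAndForward": "0/0/3/2", "TemplateEngine": "0/0/3/0",
    "Transport": "0/0/1/3",
}
SEVERITIES = ("Critical", "High", "Medium", "Low")


def parse_open(cell: str) -> dict[str, int]:
    """Split a 'C/H/M/L' cell into a severity -> count mapping."""
    counts = [int(part) for part in cell.split("/")]
    if len(counts) != len(SEVERITIES):
        raise ValueError(f"expected 4 counts, got {cell!r}")
    return dict(zip(SEVERITIES, counts))


def severity_totals(open_by_module: dict[str, str]) -> dict[str, int]:
    """Sum the per-module counts into one total per severity."""
    totals = dict.fromkeys(SEVERITIES, 0)
    for cell in open_by_module.values():
        for severity, count in parse_open(cell).items():
            totals[severity] += count
    return totals


totals = severity_totals(OPEN_BY_MODULE)
print(totals, sum(totals.values()))
# {'Critical': 0, 'High': 0, 'Medium': 25, 'Low': 50} 75
```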

Pending Findings

Every Open / In Progress finding across all modules, highest severity first. Resolved findings drop off this list but remain recorded in their module's findings.md (see REVIEW-PROCESS.md §4–§5). Full detail — description, location, recommendation — lives in the module's findings.md.
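Within each severity, the rows below appear to be ordered by module name (case-sensitive, so CLI sorts before CentralUI) and then by the numeric suffix of the ID. That rule is inferred from the listing, not confirmed by regen-readme.py. A minimal sketch of a sort key that reproduces it, using IDs from the tables below and a hypothetical `sort_key` helper:

```python
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def sort_key(row):
    """Order by severity rank, then module name, then numeric ID suffix."""
    severity, finding_id = row
    module, _, number = finding_id.rpartition("-")
    return (SEVERITY_RANK[severity], module, int(number))


# (severity, ID) pairs taken from the pending-findings tables, shuffled.
rows = [
    ("Low", "CentralUI-029"),
    ("Medium", "Transport-010"),
    ("Low", "AuditLog-011"),
    ("Low", "CLI-022"),
    ("Medium", "AuditLog-001"),
    ("Low", "AuditLog-003"),
    ("Low", "CLI-020"),
]
ordered = [finding_id for _, finding_id in sorted(rows, key=sort_key)]
print(ordered)
```

Converting the suffix with `int` keeps the order numeric, so a three-digit ID would still sort after a two-digit one.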

Critical (0)

None open.

High (0)

None open.

Medium (25)

| ID | Module | Title |
|---|---|---|
| AuditLog-001 | AuditLog | Combined-telemetry transport is plumbed end-to-end but never invoked in production |
| AuditLog-005 | AuditLog | GetBacklogStatsAsync holds the SQLite hot-path write lock for the full COUNT+MIN scan |
| CLI-019 | CLI | bundle export decodes the entire base64 bundle into memory before writing |
| Communication-017 | Communication | _inProgressDeployments grows unboundedly — successful deployments are never cleaned up |
| ConfigurationDatabase-016 | ConfigurationDatabase | InboundApiRepository.GetApiKeyByValueAsync hashes the candidate with the unpeppered ApiKeyHasher.Default |
| ConfigurationDatabase-017 | ConfigurationDatabase | Stub-attach delete on DeploymentRecord bypasses optimistic concurrency |
| ExternalSystemGateway-020 | ExternalSystemGateway | JsonElementToParameterValue silently downcasts non-Int64 JSON numbers to double, losing precision for decimal SQL parameters on retry |
| Host-016 | Host | Site CentralContactPoints second entry targets the site's own remoting port |
| InboundAPI-025 | InboundAPI | AuditWriteMiddleware runs against the entire /api/* branch — emits spurious ApiInbound audit rows for /api/audit/query and /api/audit/export |
| ManagementService-020 | ManagementService | UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim |
| ManagementService-021 | ManagementService | Transport bundle handlers have zero test coverage |
| NotificationOutbox-005 | NotificationOutbox | Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries |
| NotificationService-020 | NotificationService | NS-001 fix superseded; AkkaHostedService would register two competing Notification S&F handlers if both code paths ran |
| NotificationService-024 | NotificationService | No test affirms the central-only invariant; the orphaned-path tests give a false coverage signal |
| SiteCallAudit-001 | SiteCallAudit | SupervisorStrategy override is dead code; XML claims Resume that is not enforced |
| SiteCallAudit-003 | SiteCallAudit | OnUpsertAsync does not refresh IngestedAtUtc; direct-write callers must remember to stamp it |
| SiteRuntime-021 | SiteRuntime | HandleDeployArtifacts updates DataConnections in SQLite but never sends CreateConnectionCommand to the DCL |
| SiteRuntime-022 | SiteRuntime | AuditingDbCommand.DbConnection.set uses reflection to read AuditingDbConnection._inner |
| StoreAndForward-019 | StoreAndForward | Notifications park after DefaultMaxRetries exhaustion, contradicting "retried until central acks" |
| StoreAndForward-020 | StoreAndForward | RetryParkedMessageAsync skips standby replication when the message is deleted between local update and re-load |
| StoreAndForward-021 | StoreAndForward | Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime |
| TemplateEngine-018 | TemplateEngine | DiffService reports no entries for added/removed/changed connections |
| TemplateEngine-019 | TemplateEngine | TemplateResolver.BuildInheritanceChain still uses the 0-as-no-parent sentinel that was removed from CycleDetector |
| TemplateEngine-020 | TemplateEngine | Create* audit entries are written with EntityId = "0" before SaveChangesAsync populates the real key |
| Transport-010 | Transport | Critical Overwrite + cross-cutting paths uncovered by tests |

Low (50)

| ID | Module | Title |
|---|---|---|
| AuditLog-003 | AuditLog | AuditLogIngestActor.OnIngestAsync uses CreateScope, but OnCachedTelemetryAsync uses CreateAsyncScope — and only one disposes asynchronously |
| AuditLog-007 | AuditLog | INodeIdentityProvider resolution mixes GetService and GetRequiredService inconsistently across AddAuditLog registrations |
| AuditLog-008 | AuditLog | Test composition roots that omit IAuditPayloadFilter silently pass UNREDACTED payloads through the writer chain |
| AuditLog-011 | AuditLog | AddAuditLogHealthMetricsBridge and AddAuditLogCentralMaintenance are non-idempotent and register hosted services on every call |
| CLI-020 | CLI | bundle export success-envelope parse is unguarded |
| CLI-022 | CLI | CommandTreeTests excludes the two new command groups |
| CentralUI-029 | CentralUI | ConfigurationAuditLog uses JS.InvokeAsync<int>("eval", ...) instead of a dedicated JS module |
| CentralUI-030 | CentralUI | SandboxConsoleCapture's per-call StringWriter is not thread-safe under intra-script concurrency |
| CentralUI-031 | CentralUI | TransportImport buffers the full bundle bytes in component state |
| CentralUI-032 | CentralUI | AuditResultsGrid paging is forward-only, no Previous button |
| CentralUI-033 | CentralUI | Drill-in / query-string code paths for the new Transport + SiteCalls pages are untested |
| ClusterInfrastructure-011 | ClusterInfrastructure | SectionName constant is decorative — no binding site references it |
| ClusterInfrastructure-013 | ClusterInfrastructure | Test uses catastrophic config values without an inline-intent comment |
| ClusterInfrastructure-014 | ClusterInfrastructure | AddClusterInfrastructureActors is dead surface — no caller, no behaviour |
| Commons-016 | Commons | BundleSession.Locked uses a magic 3 rather than a named constant |
| Commons-018 | Commons | IOperationTrackingStore and IPartitionMaintenance are at the root of Interfaces/ instead of Interfaces/Services/ |
| Commons-020 | Commons | Transport types and new Audit-message types have no unit tests in ScadaLink.Commons.Tests |
| Commons-021 | Commons | ExternalCallResult.Response has a benign lazy-parse race |
| Commons-023 | Commons | Trailing-optional SourceNode on positional records mixes additive evolution patterns |
| Communication-020 | Communication | SiteAddressCacheLoaded carries mutable Dictionary/List types |
| ConfigurationDatabase-021 | ConfigurationDatabase | SwitchOutPartitionAsync interpolates monthBoundary / staging table name into raw SQL |
| ConfigurationDatabase-024 | ConfigurationDatabase | Missing test coverage for SPLIT-RANGE failure-continuation and production-shape rowversion delete |
| DeploymentManager-021 | DeploymentManager | ResolveSiteIdentifierAsync silently substitutes the DB id when the site row is missing |
| DeploymentManager-022 | DeploymentManager | Pending and InProgress are written back-to-back with no intervening work |
| DeploymentManager-023 | DeploymentManager | BuildDeployArtifactsCommandAsync re-queries system-wide artifacts once per site |
| DeploymentManager-024 | DeploymentManager | Test probe actors hold mutable static state across tests |
| ExternalSystemGateway-021 | ExternalSystemGateway | ApplyAuth silently sends an unauthenticated request on unknown AuthType, empty AuthConfiguration, or malformed Basic config |
| HealthMonitoring-021 | HealthMonitoring | CentralSiteId = "central" reserved constant silently collides with a real site named "central" |
| HealthMonitoring-022 | HealthMonitoring | CentralHealthReportLoopTests uses real-time PeriodicTimer + Task.Delay; flake-prone on slow CI |
| Host-018 | Host | Shipped per-role configs omit NodeOptions.NodeName, leaving SourceNode null |
| Host-020 | Host | MinimumLevel.Is silently overrides any operator-set Serilog:MinimumLevel |
| Host-021 | Host | Microsoft Logging:LogLevel section in appsettings.json is dead config under Serilog |
| InboundAPI-019 | InboundAPI | EnableBuffering() called unconditionally on every request, including bodyless requests |
| InboundAPI-023 | InboundAPI | EndpointExtensions.HandleInboundApiRequest composition wiring has no test coverage |
| ManagementService-023 | ManagementService | HandleQueryDeployments unfiltered branch is N+1 on instance lookup |
| NotificationOutbox-006 | NotificationOutbox | ResolveAdapters rebuilds the NotificationType → adapter dictionary on every dispatch sweep |
| NotificationOutbox-008 | NotificationOutbox | FallbackMaxRetries / FallbackRetryDelay path is unreachable in production AND untested |
| NotificationService-022 | NotificationService | MailKitSmtpClientWrapper holds a long-lived SmtpClient; combined with per-send factory, the design comment about pooling is contradicted |
| NotificationService-025 | NotificationService | CredentialRedactor over-masks: any 4-character credential component is masked anywhere it appears, including unrelated log text |
| Security-021 | Security | RequireHttpsCookie=false dev opt-out has no warning path — an HTTP production deployment silently transmits the JWT bearer credential in cleartext |
| SiteCallAudit-002 | SiteCallAudit | Singleton failover does not wait for in-flight async upserts |
| SiteCallAudit-006 | SiteCallAudit | Stuck-only paging test does not exercise the multi-page boundary with an interleaved non-stuck row at the cursor |
| SiteEventLogging-018 | SiteEventLogging | FailedWriteCount is exposed but never consumed by Health Monitoring |
| SiteEventLogging-022 | SiteEventLogging | Cache=Shared is redundant for a single-connection logger |
| SiteEventLogging-023 | SiteEventLogging | Concurrent-stress test uses a non-volatile stop flag |
| StoreAndForward-022 | StoreAndForward | NotifyCachedCallObserverAsync silently drops the entire audit lifecycle when the message id is not a parseable TrackedOperationId |
| StoreAndForward-023 | StoreAndForward | siteId silently defaults to empty when no IStoreAndForwardSiteContext is registered, degrading audit telemetry correlation |
| Transport-008 | Transport | PreviewAsync issues an N+1 GetTemplateWithChildrenAsync per matching template name |
| Transport-009 | Transport | IAuditCorrelationContext.BundleImportId is mutated on the same scoped instance the AuditService reads |
| Transport-012 | Transport | "Bundle Import" filter promised in design doc not surfaced in Configuration Audit Log Viewer UI |