ScadaBridge/code-reviews/README.md
Joseph Doherty 291274ae76 fix(notifications): close OAuth2 SMTP + dispatcher resilience gaps (5 findings)
NS-021/NO-001: thread FromAddress into XOAUTH2 so M365 stops rejecting
sends with 535 5.7.3. Added an additive oauth2UserName parameter on
ISmtpClientWrapper.AuthenticateAsync; both NotificationService and
NotificationOutbox now pass config.FromAddress.

NO-002: clamp non-positive SmtpConfiguration.MaxRetries/RetryDelay to the
10-attempt / 1-min fallback with a Warning, so a misconfigured row no
longer parks transient failures on the first attempt or burn-loops on a
zero retry delay.

NO-003: route a lifecycle-scoped CancellationToken from the
NotificationOutboxActor through the dispatch sweep into the adapter so
in-flight SMTP sends abort on PostStop instead of blocking
CoordinatedShutdown for the full SMTP timeout per row.

NO-004: await the central audit writer inside the existing try/catch
instead of fire-and-forget so the audit task can't outlive the per-sweep
DI scope and writer faults reach the operator log instead of being
silently dropped.

Two AuditLog integration tests seeded RetryDelay = TimeSpan.Zero to force
immediate re-claim on the second tick; updated them to 1 ms so they keep
the same intent without tripping the NO-002 clamp.
2026-05-28 03:54:43 -04:00


Code Reviews

Comprehensive, per-module code reviews of the ScadaLink codebase. Each module (one buildable project under src/) has its own folder containing a findings.md. This README is the aggregated index — the single place to see all outstanding work.

Generated by regen-readme.py from the per-module findings.md files. Do not edit by hand — edit the findings files and re-run the script.

How it works

  • Reviews are performed one module at a time against a fixed checklist.
  • Every finding is recorded in the module's findings.md with a severity and status.
  • Findings are never deleted — they are closed by changing their status, keeping a full audit trail.
  • This README aggregates every pending finding (Open / In Progress) across all modules.
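The aggregation step can be sketched in a few lines of Python, the language of regen-readme.py. This is a minimal sketch, assuming each finding has already been parsed out of its findings.md into a record; the `Finding` type, its fields, and `pending_summary` are hypothetical names, not the script's actual API.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITIES = ("Critical", "High", "Medium", "Low")
PENDING_STATUSES = frozenset({"Open", "In Progress"})


@dataclass(frozen=True)
class Finding:
    """Hypothetical parsed finding; the real findings.md fields may differ."""

    id: str
    module: str
    severity: str
    status: str
    title: str


def pending_summary(findings):
    """Count pending (Open / In Progress) findings per severity.

    Closed findings stay in the input, since findings are never deleted,
    but they are left out of the counts.
    """
    counts = Counter(f.severity for f in findings if f.status in PENDING_STATUSES)
    summary = {sev: counts[sev] for sev in SEVERITIES}
    summary["Total"] = sum(counts[sev] for sev in SEVERITIES)
    return summary


if __name__ == "__main__":
    demo = [
        Finding("Demo-001", "Demo", "High", "Open", "example"),
        Finding("Demo-002", "Demo", "Low", "In Progress", "example"),
        Finding("Demo-003", "Demo", "Medium", "Resolved", "example"),
    ]
    print(pending_summary(demo))
    # {'Critical': 0, 'High': 1, 'Medium': 0, 'Low': 1, 'Total': 2}
```

Filtering on status rather than deleting rows is what keeps the audit trail intact: a resolved finding simply stops contributing to the counts.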

See REVIEW-PROCESS.md for the full procedure: the review checklist, severity definitions, finding format, and how to mark items resolved.

Layout

code-reviews/
├── README.md             # this file: process overview + pending findings
├── REVIEW-PROCESS.md     # how to perform a review and track findings
├── regen-readme.py       # regenerates this README from the findings files
├── _template/findings.md # copy-this template for a module review
└── <Module>/findings.md  # one folder per src/ project
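Under this layout, locating the per-module review files amounts to scanning the immediate subfolders of code-reviews/ and skipping _template/. A minimal sketch, assuming a module counts as reviewed only when its folder contains a findings.md; `find_module_reviews` is a hypothetical helper, not necessarily how regen-readme.py does it.

```python
from pathlib import Path


def find_module_reviews(root):
    """Map module name -> findings.md path for each <Module>/ folder under root.

    _template/ holds the copy-this template rather than a real review, so it
    is skipped. Loose files such as README.md are ignored because only
    directories are considered.
    """
    reviews = {}
    for child in sorted(Path(root).iterdir()):
        findings = child / "findings.md"
        if child.is_dir() and child.name != "_template" and findings.is_file():
            reviews[child.name] = findings
    return reviews


if __name__ == "__main__":
    root = Path("code-reviews")
    if root.is_dir():  # only scan when run from the repository root
        for module, path in find_module_reviews(root).items():
            print(f"{module}: {path}")
```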

Baseline review — 2026-05-16

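All 19 modules were reviewed at commit 9c60592 (241 findings: 6 Critical, 46 High, 100 Medium, 89 Low). The tables below track what remains open as findings are resolved and re-triaged. Findings discovered after the baseline are appended to their module's findings.md; the module table's Total column counts every finding recorded for a module, open or closed, and the table also lists modules first reviewed after the baseline.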

Severity  Open findings
Critical  0
High      13
Medium    56
Low       90
Total     159

Module Status

Module                 Last reviewed  Commit   Open (C/H/M/L)  Open  Total
AuditLog               2026-05-28     1eb6e97  0/0/3/8         11    11
CLI                    2026-05-28     1eb6e97  0/0/2/4         6     23
CentralUI              2026-05-28     1eb6e97  0/0/2/5         7     33
ClusterInfrastructure  2026-05-28     1eb6e97  0/0/0/4         4     14
Commons                2026-05-28     1eb6e97  0/0/3/6         9     23
Communication          2026-05-28     1eb6e97  0/1/1/5         7     22
ConfigurationDatabase  2026-05-28     1eb6e97  0/1/4/5         10    24
DataConnectionLayer    2026-05-28     1eb6e97  0/1/4/0         5     22
DeploymentManager      2026-05-28     1eb6e97  0/1/1/5         7     24
ExternalSystemGateway  2026-05-28     1eb6e97  0/1/2/3         6     23
HealthMonitoring       2026-05-28     1eb6e97  0/0/2/5         7     23
Host                   2026-05-28     1eb6e97  0/0/2/5         7     22
InboundAPI             2026-05-28     1eb6e97  0/1/3/4         8     25
ManagementService      2026-05-28     1eb6e97  0/0/2/2         4     23
NotificationOutbox     2026-05-28     1eb6e97  0/0/3/3         6     10
NotificationService    2026-05-28     1eb6e97  0/1/2/3         6     25
Security               2026-05-28     1eb6e97  0/0/0/2         2     21
SiteCallAudit          2026-05-28     1eb6e97  0/0/2/4         6     6
SiteEventLogging       2026-05-28     1eb6e97  0/1/2/6         9     23
SiteRuntime            2026-05-28     1eb6e97  0/0/4/3         7     26
StoreAndForward        2026-05-28     1eb6e97  0/1/3/3         7     24
TemplateEngine         2026-05-28     1eb6e97  0/1/4/1         6     22
Transport              2026-05-28     1eb6e97  0/3/5/4         12    12
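The severity table and the module table are two views of the same open findings, so they can be cross-checked mechanically. A minimal sketch, with the per-module counts copied from the module table; `format_open` and `severity_totals` are hypothetical helpers for illustration, not functions this README attributes to regen-readme.py. Summing each column reproduces the severity table.

```python
# Open findings per module as (Critical, High, Medium, Low), copied from the module table.
MODULE_OPEN = {
    "AuditLog": (0, 0, 3, 8),
    "CLI": (0, 0, 2, 4),
    "CentralUI": (0, 0, 2, 5),
    "ClusterInfrastructure": (0, 0, 0, 4),
    "Commons": (0, 0, 3, 6),
    "Communication": (0, 1, 1, 5),
    "ConfigurationDatabase": (0, 1, 4, 5),
    "DataConnectionLayer": (0, 1, 4, 0),
    "DeploymentManager": (0, 1, 1, 5),
    "ExternalSystemGateway": (0, 1, 2, 3),
    "HealthMonitoring": (0, 0, 2, 5),
    "Host": (0, 0, 2, 5),
    "InboundAPI": (0, 1, 3, 4),
    "ManagementService": (0, 0, 2, 2),
    "NotificationOutbox": (0, 0, 3, 3),
    "NotificationService": (0, 1, 2, 3),
    "Security": (0, 0, 0, 2),
    "SiteCallAudit": (0, 0, 2, 4),
    "SiteEventLogging": (0, 1, 2, 6),
    "SiteRuntime": (0, 0, 4, 3),
    "StoreAndForward": (0, 1, 3, 3),
    "TemplateEngine": (0, 1, 4, 1),
    "Transport": (0, 3, 5, 4),
}


def format_open(counts):
    """Return the 'Open (C/H/M/L)' cell and the 'Open' cell for one module."""
    return "/".join(str(n) for n in counts), sum(counts)


def severity_totals(module_open):
    """Sum each severity column across all modules: (Critical, High, Medium, Low)."""
    return tuple(sum(column) for column in zip(*module_open.values()))


if __name__ == "__main__":
    totals = severity_totals(MODULE_OPEN)
    print(totals)       # (0, 13, 56, 90)
    print(sum(totals))  # 159
    print(format_open(MODULE_OPEN["AuditLog"]))  # ('0/0/3/8', 11)
```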

Pending Findings

Every Open / In Progress finding across all modules, highest severity first. Resolved findings drop off this list but remain recorded in their module's findings.md (see REVIEW-PROCESS.md §4–§5). Full detail — description, location, recommendation — lives in the module's findings.md.
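The section layout that follows (one heading per severity with its count, highest first, and "None open." for an empty severity) can be sketched as below. This is a minimal sketch under stated assumptions: `render_pending` is a hypothetical helper, its input is assumed to be already filtered to Open / In Progress findings, and column alignment is left out. Sorting by ID as a plain string is enough here because every ID ends in a zero-padded three-digit number.

```python
SEVERITY_ORDER = ("Critical", "High", "Medium", "Low")


def render_pending(pending):
    """Render pending findings as one section per severity, highest first.

    `pending` holds (finding_id, module, severity, title) tuples. Rows in a
    section are sorted by finding ID; column alignment is not attempted.
    """
    lines = []
    for severity in SEVERITY_ORDER:
        rows = sorted(p for p in pending if p[2] == severity)
        lines.append(f"{severity} ({len(rows)})")
        if not rows:
            lines.append("None open.")
        else:
            lines.extend(f"{fid} {module} {title}" for fid, module, _, title in rows)
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, not real findings.
    sample = [
        ("Beta-002", "Beta", "High", "second example"),
        ("Alpha-001", "Alpha", "High", "first example"),
        ("Gamma-003", "Gamma", "Low", "third example"),
    ]
    print("\n".join(render_pending(sample)))
```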

Critical (0)

None open.

High (13)

ID                         Module                 Title
Communication-016          Communication          HandleConnectionStateChanged is dead code — the documented disconnect-cleanup workflow never fires
ConfigurationDatabase-015  ConfigurationDatabase  NotificationOutboxRepository.InsertIfNotExistsAsync is a check-then-act race with no duplicate-key catch
DataConnectionLayer-018    DataConnectionLayer    Concurrent subscribes for the same tag from different instances orphan an adapter subscription handle
DeploymentManager-018      DeploymentManager      Reconciliation force-sets Enabled, overwriting an intentional Disabled after central failover
ExternalSystemGateway-018  ExternalSystemGateway  DeliverBufferedAsync lets JsonException propagate, turning a corrupt buffered row into a permanent retry-forever poison message
InboundAPI-022             InboundAPI             IActiveNodeGate has no production registration in Host — standby-node gating is silently disabled in production
NotificationService-019    NotificationService    NotificationDeliveryService and INotificationDeliveryService are orphaned by the central-only redesign
SiteEventLogging-016       SiteEventLogging       From/To filters compare non-normalised ISO 8601 strings against UTC-stored timestamps
StoreAndForward-018        StoreAndForward        Notification corrupt-payload parks the buffered message, contradicting the "notifications do not park" design invariant
TemplateEngine-017         TemplateEngine         Revision hash and diff both ignore Description and Connections, defeating staleness detection for real deployment changes
Transport-001              Transport              Template Overwrite never syncs attributes / alarms / scripts
Transport-002              Transport              ExternalSystem Overwrite never syncs methods
Transport-003              Transport              Unlock lockout is enforced only client-side; server session is never marked Locked

Medium (56)

ID                         Module                 Title
AuditLog-001               AuditLog               Combined-telemetry transport is plumbed end-to-end but never invoked in production
AuditLog-004               AuditLog               SiteAuditReconciliationActor advances cursor even on per-row insert failure, silently abandoning permanently-failing rows
AuditLog-005               AuditLog               GetBacklogStatsAsync holds the SQLite hot-path write lock for the full COUNT+MIN scan
CLI-017                    CLI                    BundleCommands.RunBundleCommandAsync duplicates ExecuteCommandAsync and breaks the auth exit-code contract
CLI-019                    CLI                    bundle export decodes the entire base64 bundle into memory before writing
CentralUI-026              CentralUI              AuditFilterBar From/To filters treat browser-local datetimes as UTC
CentralUI-027              CentralUI              Same UTC misinterpretation in SiteCallsReport, NotificationReport, and EventLogs
Commons-015                Commons                EncryptionMetadata accepts any algorithm string and any iteration count
Commons-017                Commons                Component-Commons.md is significantly stale (audit enums, new entities, new repositories, new service interfaces, new folders)
Commons-019                Commons                New *Utc-suffixed DateTime columns on AuditEvent / SiteCall are not enforced as UTC; inconsistent with Notification's DateTimeOffset
Communication-017          Communication          _inProgressDeployments grows unboundedly — successful deployments are never cleaned up
ConfigurationDatabase-016  ConfigurationDatabase  InboundApiRepository.GetApiKeyByValueAsync hashes the candidate with the unpeppered ApiKeyHasher.Default
ConfigurationDatabase-017  ConfigurationDatabase  Stub-attach delete on DeploymentRecord bypasses optimistic concurrency
ConfigurationDatabase-018  ConfigurationDatabase  DateTime-typed *Utc columns on AuditEvent / SiteCall carry no DateTimeKind enforcement
ConfigurationDatabase-019  ConfigurationDatabase  EnsureLookaheadAsync swallows non-idempotent SPLIT failures and continues, creating partition holes
DataConnectionLayer-019    DataConnectionLayer    OpcUaDataConnection._subscriptionHandles is a plain Dictionary<,> mutated from concurrent thread-pool continuations
DataConnectionLayer-020    DataConnectionLayer    HandleSubscribeCompleted double-counts _totalSubscribed when a previously-unresolved tag is resolved by a different instance's subscribe
DataConnectionLayer-021    DataConnectionLayer    HandleSubscribeCompleted re-creates and leaks _subscriptionsByInstance entry when the instance unsubscribed mid-flight
DataConnectionLayer-022    DataConnectionLayer    HandleSubscribeCompleted and HandleTagResolutionFailed reset the tag-resolution retry timer on every call via StartPeriodicTimer, starving the retry under subscribe bursts
DeploymentManager-019      DeploymentManager      Lifecycle command timeout writes no audit entry
ExternalSystemGateway-019  ExternalSystemGateway  HttpClient.Timeout is not set; DefaultHttpTimeout > 100s is silently clipped by the framework default
ExternalSystemGateway-020  ExternalSystemGateway  JsonElementToParameterValue silently downcasts non-Int64 JSON numbers to double, losing precision for decimal SQL parameters on retry
HealthMonitoring-017       HealthMonitoring       HealthReportSender resets interval counters before Send; transport failures silently drop the interval's error counts
HealthMonitoring-019       HealthMonitoring       SiteAuditTelemetryStalled and CentralAuditWriteFailures design-doc metrics have no HealthMonitoring-side surface
Host-016                   Host                   Site CentralContactPoints second entry targets the site's own remoting port
Host-017                   Host                   Site-shutdown ordering from REQ-HOST-7 is not wired
InboundAPI-018             InboundAPI             AuditWriteMiddleware fires WriteAsync as _ = task — faulted async writes are unobserved
InboundAPI-021             InboundAPI             ParentExecutionId correlation flows only through Call; attribute reads/writes lose the inbound→site execution-tree link
InboundAPI-025             InboundAPI             AuditWriteMiddleware runs against the entire /api/* branch — emits spurious ApiInbound audit rows for /api/audit/query and /api/audit/export
ManagementService-020      ManagementService      UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim
ManagementService-021      ManagementService      Transport bundle handlers have zero test coverage
NotificationOutbox-005     NotificationOutbox     Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries
NotificationOutbox-007     NotificationOutbox     NotificationOutboxOptions.DispatchBatchSize, DeliveredKpiWindow, and PurgeInterval are not in the design document
NotificationOutbox-010     NotificationOutbox     Comment claims PipeTo is not used "because the writer never throws"; the surrounding try/catch is dead-letter for the documented failure mode
NotificationService-020    NotificationService    NS-001 fix superseded; AkkaHostedService would register two competing Notification S&F handlers if both code paths ran
NotificationService-024    NotificationService    No test affirms the central-only invariant; the orphaned-path tests give a false coverage signal
SiteCallAudit-001          SiteCallAudit          SupervisorStrategy override is dead code; XML claims Resume that is not enforced
SiteCallAudit-003          SiteCallAudit          OnUpsertAsync does not refresh IngestedAtUtc; direct-write callers must remember to stamp it
SiteEventLogging-015       SiteEventLogging       Background write queue is unbounded; can grow without limit under sustained writer slowness
SiteEventLogging-017       SiteEventLogging       Central client's PageSize is unbounded; defeats the "configurable page size" design rationale
SiteRuntime-020            SiteRuntime            Second DeployInstanceCommand arriving during a pending redeploy races the still-terminating actor on its name
SiteRuntime-021            SiteRuntime            HandleDeployArtifacts updates DataConnections in SQLite but never sends CreateConnectionCommand to the DCL
SiteRuntime-022            SiteRuntime            AuditingDbCommand.DbConnection.set uses reflection to read AuditingDbConnection._inner
SiteRuntime-024            SiteRuntime            OperationTrackingStore serialises all writes through one connection + SemaphoreSlim, and Dispose() does sync-over-async
StoreAndForward-019        StoreAndForward        Notifications park after DefaultMaxRetries exhaustion, contradicting "retried until central acks"
StoreAndForward-020        StoreAndForward        RetryParkedMessageAsync skips standby replication when the message is deleted between local update and re-load
StoreAndForward-021        StoreAndForward        Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime
TemplateEngine-018         TemplateEngine         DiffService reports no entries for added/removed/changed connections
TemplateEngine-019         TemplateEngine         TemplateResolver.BuildInheritanceChain still uses the 0-as-no-parent sentinel that was removed from CycleDetector
TemplateEngine-020         TemplateEngine         Create* audit entries are written with EntityId = "0" before SaveChangesAsync populates the real key
TemplateEngine-021         TemplateEngine         MoveTemplateAsync skips folder cycle and sibling-name-collision validation
Transport-004              Transport              MaxUnlockAttemptsPerIpPerHour option is declared but never enforced
Transport-005              Transport              Manifest fields outside ContentHash are not bound to the encrypted payload
Transport-006              Transport              Bundle ZIP read has no per-entry size cap or entry-count cap (zip-bomb / decompression-bomb)
Transport-007              Transport              Failed import sessions retain decrypted plaintext for the full 30-minute TTL
Transport-010              Transport              Critical Overwrite + cross-cutting paths uncovered by tests

Low (90)

ID                         Module                 Title
AuditLog-002               AuditLog               SupervisorStrategy comments claim Resume semantics but code returns the default Restart decider
AuditLog-003               AuditLog               AuditLogIngestActor.OnIngestAsync uses CreateScope, but OnCachedTelemetryAsync uses CreateAsyncScope — and only one disposes asynchronously
AuditLog-006               AuditLog               SqliteAuditWriter.Dispose() does sync-over-async and may deadlock
AuditLog-007               AuditLog               INodeIdentityProvider resolution mixes GetService and GetRequiredService inconsistently across AddAuditLog registrations
AuditLog-008               AuditLog               Test composition roots that omit IAuditPayloadFilter silently pass UNREDACTED payloads through the writer chain
AuditLog-009               AuditLog               SqliteAuditWriter.DisposeAsync comment claims _disposed is set early, but it isn't
AuditLog-010               AuditLog               Actor drain paths accept a CancellationToken parameter but always pass CancellationToken.None downstream
AuditLog-011               AuditLog               AddAuditLogHealthMetricsBridge and AddAuditLogCentralMaintenance are non-idempotent and register hosted services on every call
CLI-020                    CLI                    bundle export success-envelope parse is unguarded
CLI-021                    CLI                    CliConfig.Load crashes the CLI on a malformed config file
CLI-022                    CLI                    CommandTreeTests excludes the two new command groups
CLI-023                    CLI                    Component-CLI.md claims audit commands ride POST /management; implementation uses REST endpoints
CentralUI-029              CentralUI              ConfigurationAuditLog uses JS.InvokeAsync<int>("eval", ...) instead of a dedicated JS module
CentralUI-030              CentralUI              SandboxConsoleCapture's per-call StringWriter is not thread-safe under intra-script concurrency
CentralUI-031              CentralUI              TransportImport buffers the full bundle bytes in component state
CentralUI-032              CentralUI              AuditResultsGrid paging is forward-only, no Previous button
CentralUI-033              CentralUI              Drill-in / query-string code paths for the new Transport + SiteCalls pages are untested
ClusterInfrastructure-011  ClusterInfrastructure  SectionName constant is decorative — no binding site references it
ClusterInfrastructure-012  ClusterInfrastructure  Validator accepts SeedNodes.Count == 1 despite design requiring both nodes as seeds
ClusterInfrastructure-013  ClusterInfrastructure  Test uses catastrophic config values without an inline-intent comment
ClusterInfrastructure-014  ClusterInfrastructure  AddClusterInfrastructureActors is dead surface — no caller, no behaviour
Commons-016                Commons                BundleSession.Locked uses a magic 3 rather than a named constant
Commons-018                Commons                IOperationTrackingStore and IPartitionMaintenance are at the root of Interfaces/ instead of Interfaces/Services/
Commons-020                Commons                Transport types and new Audit-message types have no unit tests in ScadaLink.Commons.Tests
Commons-021                Commons                ExternalCallResult.Response has a benign lazy-parse race
Commons-022                Commons                IAuditCorrelationContext references an unresolvable BundleImporter.ApplyAsync cref; JSON-blob columns have no documented shape
Commons-023                Commons                Trailing-optional SourceNode on positional records mixes additive evolution patterns
Communication-018          Communication          Site heartbeats hard-code IsActive: true regardless of node role
Communication-019          Communication          LoadSiteAddressesFromDb does not pass a CancellationToken to the repository
Communication-020          Communication          SiteAddressCacheLoaded carries mutable Dictionary/List types
Communication-021          Communication          SiteStreamGrpcServer.SubscribeInstance leaks the StreamRelayActor if Subscribe throws pre-try
Communication-022          Communication          _debugSubscriptions keyed by caller-supplied correlation ID; reuse silently orphans the prior subscriber
ConfigurationDatabase-020  ConfigurationDatabase  GetPartitionBoundariesOlderThanAsync returns DateTime with Kind=Unspecified
ConfigurationDatabase-021  ConfigurationDatabase  SwitchOutPartitionAsync interpolates monthBoundary / staging table name into raw SQL
ConfigurationDatabase-022  ConfigurationDatabase  Stale "WP-24 Stub level sufficient for diff/staleness support" XML comment on DeploymentManagerRepository
ConfigurationDatabase-023  ConfigurationDatabase  AuditLog correlation-index name drifts from design doc (IX_AuditLog_CorrelationId vs IX_AuditLog_Correlation)
ConfigurationDatabase-024  ConfigurationDatabase  Missing test coverage for SPLIT-RANGE failure-continuation and production-shape rowversion delete
DeploymentManager-020      DeploymentManager      DeployReconciled audit attributes the action to the prior deployer, not the current user
DeploymentManager-021      DeploymentManager      ResolveSiteIdentifierAsync silently substitutes the DB id when the site row is missing
DeploymentManager-022      DeploymentManager      Pending and InProgress are written back-to-back with no intervening work
DeploymentManager-023      DeploymentManager      BuildDeployArtifactsCommandAsync re-queries system-wide artifacts once per site
DeploymentManager-024      DeploymentManager      Test probe actors hold mutable static state across tests
ExternalSystemGateway-021  ExternalSystemGateway  ApplyAuth silently sends an unauthenticated request on unknown AuthType, empty AuthConfiguration, or malformed Basic config
ExternalSystemGateway-022  ExternalSystemGateway  new HttpMethod(method.HttpMethod) accepts any string at runtime; an invalid HTTP verb fails only at call time
ExternalSystemGateway-023  ExternalSystemGateway  PATCH HTTP method is supported by code but absent from the design doc; body-vs-query decision drifts from the documented set
HealthMonitoring-018       HealthMonitoring       Same counter-reset-before-publish hazard in CentralHealthReportLoop
HealthMonitoring-020       HealthMonitoring       MarkHeartbeat brings offline site back online with a stale LastHeartbeatAt when receivedAt <= existing.LastHeartbeatAt
HealthMonitoring-021       HealthMonitoring       CentralSiteId = "central" reserved constant silently collides with a real site named "central"
HealthMonitoring-022       HealthMonitoring       CentralHealthReportLoopTests uses real-time PeriodicTimer + Task.Delay; flake-prone on slow CI
HealthMonitoring-023       HealthMonitoring       StoreAndForwardBufferDepths_IsEmptyPlaceholder test name is stale; it now covers the default-state contract, not a placeholder
Host-018                   Host                   Shipped per-role configs omit NodeOptions.NodeName, leaving SourceNode null
Host-019                   Host                   Migration StartupRetry call drops the host CancellationToken
Host-020                   Host                   MinimumLevel.Is silently overrides any operator-set Serilog:MinimumLevel
Host-021                   Host                   Microsoft Logging:LogLevel section in appsettings.json is dead config under Serilog
Host-022                   Host                   ParseLevel silently coerces unrecognised MinimumLevel to Information
InboundAPI-019             InboundAPI             EnableBuffering() called unconditionally on every request, including bodyless requests
InboundAPI-020             InboundAPI             ContentType.Contains("json") is case-sensitive; application/JSON with no Content-Length skips body parsing
InboundAPI-023             InboundAPI             EndpointExtensions.HandleInboundApiRequest composition wiring has no test coverage
InboundAPI-024             InboundAPI             _knownBadMethods is unbounded — an attacker can grow the cache by spamming distinct method names against the audit middleware path
ManagementService-022      ManagementService      Design doc is stale on Transport bundle commands, /api/audit/* endpoints, and CommandTimeout
ManagementService-023      ManagementService      HandleQueryDeployments unfiltered branch is N+1 on instance lookup
NotificationOutbox-006     NotificationOutbox     ResolveAdapters rebuilds the NotificationType → adapter dictionary on every dispatch sweep
NotificationOutbox-008     NotificationOutbox     FallbackMaxRetries / FallbackRetryDelay path is unreachable in production AND untested
NotificationOutbox-009     NotificationOutbox     StuckAgeThreshold XML-doc says "in-progress notification is re-claimed" — contradicts the design's display-only stuck detection
NotificationService-022    NotificationService    MailKitSmtpClientWrapper holds a long-lived SmtpClient; combined with per-send factory, the design comment about pooling is contradicted
NotificationService-023    NotificationService    XML docs on the orphaned classes still describe the removed site-delivery flow; misleading to maintainers
NotificationService-025    NotificationService    CredentialRedactor over-masks: any 4-character credential component is masked anywhere it appears, including unrelated log text
Security-020               Security               SecurityOptions has no startup validation for required fields (LdapServer, LdapSearchBase)
Security-021               Security               RequireHttpsCookie=false dev opt-out has no warning path — an HTTP production deployment silently transmits the JWT bearer credential in cleartext
SiteCallAudit-002          SiteCallAudit          Singleton failover does not wait for in-flight async upserts
SiteCallAudit-004          SiteCallAudit          Reconciliation puller and daily terminal-purge scheduler still deferred; design-doc drift
SiteCallAudit-005          SiteCallAudit          AckErrorMessage switch arm for SiteUnreachable returns ack message instead of throwing
SiteCallAudit-006          SiteCallAudit          Stuck-only paging test does not exercise the multi-page boundary with an interleaved non-stuck row at the cursor
SiteEventLogging-018       SiteEventLogging       FailedWriteCount is exposed but never consumed by Health Monitoring
SiteEventLogging-019       SiteEventLogging       EventLogPurgeService runs on every host node; design says "active node"
SiteEventLogging-020       SiteEventLogging       severity and eventType are unvalidated free-form strings; doc enumerates a set that is not enforced
SiteEventLogging-021       SiteEventLogging       DateTimeOffset.Parse uses the current culture; can throw on non-default locales
SiteEventLogging-022       SiteEventLogging       Cache=Shared is redundant for a single-connection logger
SiteEventLogging-023       SiteEventLogging       Concurrent-stress test uses a non-volatile stop flag
SiteRuntime-023            SiteRuntime            Convert.ToDouble(value) in trigger and alarm evaluation is locale-sensitive
SiteRuntime-025            SiteRuntime            HandleSetStaticAttribute persists unknown attribute names as static overrides
SiteRuntime-026            SiteRuntime            ReplicationMessages.cs public record types have no XML documentation
StoreAndForward-022        StoreAndForward        NotifyCachedCallObserverAsync silently drops the entire audit lifecycle when the message id is not a parseable TrackedOperationId
StoreAndForward-023        StoreAndForward        siteId silently defaults to empty when no IStoreAndForwardSiteContext is registered, degrading audit telemetry correlation
StoreAndForward-024        StoreAndForward        StopAsync does not wait for an in-flight retry sweep, so disposed dependencies can be touched after shutdown
TemplateEngine-022         TemplateEngine         LockEnforcer.ValidateLockChange enforces "once-locked-stays-locked" for IsLocked but not for LockedInDerived
Transport-008              Transport              PreviewAsync issues an N+1 GetTemplateWithChildrenAsync per matching template name
Transport-009              Transport              IAuditCorrelationContext.BundleImportId is mutated on the same scoped instance the AuditService reads
Transport-011              Transport              Design doc's Step-1 manifest preview promises decryption-free preview, but LoadAsync reads and validates content before passphrase
Transport-012              Transport              "Bundle Import" filter promised in design doc not surfaced in Configuration Audit Log Viewer UI