Files
scadalink-design/code-reviews/ConfigurationDatabase/findings.md

486 lines
25 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — ConfigurationDatabase
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.ConfigurationDatabase` |
| Design doc | `docs/requirements/Component-ConfigurationDatabase.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 6 |
## Summary
The ConfigurationDatabase module is a focused, conventional EF Core data-access layer:
a single `ScadaLinkDbContext`, Fluent API entity configurations, eight repository
implementations of Commons-defined interfaces, an `IAuditService` implementation, an
`IInstanceLocator`, environment-aware migration handling, and design-time tooling
support. Overall structure adheres well to the design doc and the CLAUDE.md "Code
Organization" decisions — POCO entities and interfaces live in Commons, EF mappings and
implementations live here, Fluent API only, and optimistic concurrency is correctly
applied to `DeploymentRecord` via `rowversion`. The module is generally healthy.
The main themes across findings are: (1) a genuine logic bug in
`GetTemplateWithChildrenAsync`, which loads child templates and then discards them, so
the method does not deliver what its name implies; (2) secret-bearing columns (SMTP
credentials, external-system auth config, database connection strings) persisted in
plaintext with no encryption-at-rest; (3) a hardcoded SQL `sa` connection string with a
password literal embedded in `DesignTimeDbContextFactory`; (4) the no-arg
`AddConfigurationDatabase()` overload, which silently registers nothing, making a
misconfigured central node fail late and opaquely; and (5) audit-trail robustness gaps —
`AuditService` can throw on serializing entities with navigation cycles, rolling back
the whole business operation, and the design doc's claim that audit `Id` is `Long/GUID`
disagrees with the `int` entity. Test coverage is good for the repositories that have
tests (Security, CentralUI, audit, concurrency, seed data, data protection) but several
repositories (`TemplateEngineRepository`, `DeploymentManagerRepository`,
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`,
`SiteRepository`, `InstanceLocator`) have little or no direct coverage.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | `GetTemplateWithChildrenAsync` discards loaded children (CD-001); `GetApprovedKeysForMethodAsync` CSV parsing is brittle (CD-008). |
| 2 | Akka.NET conventions | ✓ | No actors in this module; data-access layer only. No issues found. |
| 3 | Concurrency & thread safety | ✓ | DbContext correctly scoped; optimistic concurrency on `DeploymentRecord` correct. Repositories hold no shared mutable state. No issues found. |
| 4 | Error handling & resilience | ✓ | `WaitForDatabaseReadyAsync` is sound. No-arg DI overload fails late and silently (CD-003); audit JSON serialization failure handling (CD-007). |
| 5 | Security | ✓ | Hardcoded `sa` credential literal (CD-002); SMTP/DB-connection/auth secrets stored unencrypted (CD-004). |
| 6 | Performance & resource management | ✓ | `GetAllTemplatesAsync` / `GetTemplateTreeAsync` eager-load multiple collections without `AsSplitQuery` (CD-009). No N+1 in audited paths. |
| 7 | Design-document adherence | ✓ | Audit `Id` type mismatch vs design doc (CD-005); seed data uses `HasData` consistent with design. |
| 8 | Code organization & conventions | ✓ | Mostly clean. `Grpc*` address columns unbounded (CD-006); inconsistent null-guard on injected context (CD-011). |
| 9 | Testing coverage | ✓ | Several repositories and `InstanceLocator` lack direct tests (CD-010). |
| 10 | Documentation & comments | ✓ | `DeploymentManagerRepository` "WP-24 stub" XML comment is stale; noted in module context but not raised as a standalone finding. No issues found beyond items above. |
## Findings
### ConfigurationDatabase-001 — `GetTemplateWithChildrenAsync` loads child templates then discards them
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs:30-41` |
**Description**
`GetTemplateWithChildrenAsync` queries for all templates whose `ParentTemplateId`
equals the requested id, assigns the result to the local variable `children`, and
then returns `template` — the `children` list is never used, attached to the returned
object, or otherwise exposed. The method is therefore behaviourally identical to
`GetTemplateByIdAsync` but issues an extra database round-trip. Any caller relying on
the method name to obtain a template with its derived/child templates populated will
silently receive a template with no children, leading to incorrect template-resolution
or UI behaviour with no error.
**Recommendation**
Either populate the children onto the returned aggregate (e.g. project into a result
type that carries the children, or load them into a navigation collection that is
actually returned), or remove the dead query and the misleading method if children are
not in fact needed. If the navigation does not exist on the `Template` entity, add an
explicit result tuple/DTO so the loaded data reaches the caller.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). Root cause confirmed against source: the
method ran a `Where(t => t.ParentTemplateId == id)` query, assigned the result to a
local `children` variable, and never used it — a misleading no-op that also issued an
extra database round-trip per call.
Triage of the three callers (`FlatteningPipeline.BuildTemplateChainAsync`,
`ManagementActor.HandleGetTemplate`, `ManagementActor.HandleValidateTemplate`) showed
none consume derived/sub-templates; they all need the template's *member* collections
(Attributes/Alarms/Scripts/Compositions), which `GetTemplateByIdAsync` already
eager-loads. The `Template` entity has no child-templates navigation collection, and
adding one (plus changing the interface signature) would require editing
`ScadaLink.Commons`, which is outside this module's scope.
Fix applied the recommendation's secondary option: removed the dead query so the
method no longer misleads or wastes a round-trip, and added an XML doc comment
clarifying that "children" means the template's member collections. The method now
honestly delegates to `GetTemplateByIdAsync`. Regression tests added in
`TemplateEngineRepositoryTests.cs`:
`GetTemplateWithChildrenAsync_ReturnsTemplateWithAllMemberCollectionsPopulated`,
`GetTemplateWithChildrenAsync_PreservesParentTemplateId_ForInheritanceChainWalk`, and
`GetTemplateWithChildrenAsync_ReturnsNull_WhenTemplateDoesNotExist` — pinning the
template-aggregate contract the callers depend on.
### ConfigurationDatabase-002 — Hardcoded `sa` connection string with embedded password literal
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ConfigurationDatabase/DesignTimeDbContextFactory.cs:21-22` |
**Description**
`DesignTimeDbContextFactory` falls back to a literal connection string
`"Server=localhost,1433;Database=ScadaLink_Config;User Id=sa;Password=YourPassword;TrustServerCertificate=True"`
when no configured connection string is found. Embedding a credential literal (even a
placeholder) in source code is a poor pattern: it is committed to version control,
encourages copy-paste of `sa`/`TrustServerCertificate=True` into real environments, and
the fallback can mask a genuine misconfiguration during `dotnet ef` operations by
silently pointing tooling at an unintended database.
**Recommendation**
Remove the hardcoded fallback. If no connection string is resolved from configuration
or environment, throw a clear `InvalidOperationException` instructing the developer to
set `ScadaLink:Database:ConfigurationDb` (or an environment variable). At minimum, read
the design-time connection string from an environment variable rather than a literal,
and never use `sa`.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the factory
fell back to a literal `User Id=sa;Password=YourPassword;...` connection string when no
configured value was found. Removed the hardcoded fallback entirely. The factory now
resolves the connection string from the Host's appsettings files or, when those are not
present, from the `SCADALINK_DESIGNTIME_CONNECTIONSTRING` environment variable, and
throws a clear `InvalidOperationException` (naming both the config key and the env var)
when neither yields a value. Also hardened `SetBasePath` to be applied only when the
`ScadaLink.Host` directory exists, so the factory degrades cleanly instead of throwing
`DirectoryNotFoundException` when run from a context without a sibling Host folder.
Regression tests added in `DesignTimeDbContextFactoryTests.cs`:
`CreateDbContext_NoConnectionStringConfigured_ThrowsClearException`,
`CreateDbContext_ConnectionStringFromEnvironmentVariable_IsUsed`, and
`DesignTimeDbContextFactory_SourceContainsNoHardcodedSaCredential`.
### ConfigurationDatabase-003 — No-arg `AddConfigurationDatabase()` silently registers nothing
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.ConfigurationDatabase/ServiceCollectionExtensions.cs:44-49` |
**Description**
The parameterless `AddConfigurationDatabase()` overload is a deliberate no-op "retained
for backward compatibility during migration." If a central node is wired up with this
overload by mistake, no `ScadaLinkDbContext`, repositories, `IAuditService`, or
`IInstanceLocator` are registered. The failure does not surface at startup; it surfaces
much later as opaque DI resolution exceptions the first time any consumer requests a
repository — far from the actual misconfiguration. The XML comment also refers to
"Phase 0 stubs," which is stale relative to the current state of the module.
**Recommendation**
Either delete the no-op overload now that the connection-string overload exists, or
mark it `[Obsolete]` with an error-level message so misuse is a compile-time failure.
If a true "site node" no-op is genuinely required, give it an explicit, self-documenting
name (e.g. `AddConfigurationDatabaseNoOp()`), and remove the stale "Phase 0" wording.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
parameterless `AddConfigurationDatabase()` overload returned `services` unchanged,
registering no `DbContext`, repositories, `IAuditService`, or `IInstanceLocator`.
Applied the recommendation's first option: the overload is now marked
`[Obsolete(..., error: true)]` so any source reference is a compile-time failure, and
its body throws `InvalidOperationException` with an actionable message as
defence-in-depth (covering reflection-based invocation or suppressed warnings). The
stale "Phase 0 stubs / backward compatibility" XML comment was replaced with one
explaining the obsoletion. The pre-existing
`ServiceRegistrationTests.AddConfigurationDatabase_NoArgs_DoesNotThrow` test in
`UnitTest1.cs`, which encoded the old buggy no-op contract, was updated to
`AddConfigurationDatabase_NoArgs_FailsFast` to assert the corrected behaviour.
New regression tests added in `ServiceCollectionExtensionsTests.cs`:
`AddConfigurationDatabase_NoArgOverload_FailsFastWithClearMessage` and
`AddConfigurationDatabase_NoArgOverload_IsMarkedObsoleteAsError`.
### ConfigurationDatabase-004 — Secret-bearing columns stored in plaintext with no protection
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/NotificationConfiguration.cs:56-57`, `src/ScadaLink.ConfigurationDatabase/Configurations/ExternalSystemConfiguration.cs:25-26,75-77` |
**Description**
`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`, and
`DatabaseConnectionDefinition.ConnectionString` all hold authentication secrets (SMTP
OAuth2 client secrets / passwords, external-system API keys or Basic Auth credentials,
and database passwords respectively). They are mapped as ordinary string columns and
persisted verbatim. Anyone with read access to the configuration database — including
audit-log JSON if these entities are serialized into `AfterStateJson` — obtains the
plaintext secrets. The design doc does not call out encryption-at-rest for these
fields, so the design is also silent on a real risk.
**Recommendation**
Apply encryption to these fields, e.g. an EF Core value converter backed by ASP.NET
Data Protection (the module already configures `IDataProtectionKeyContext`), or rely on
SQL Server Always Encrypted / column encryption. Separately, ensure `IAuditService`
callers never pass these secret-bearing entities (or that the serializer redacts the
fields) so secrets do not leak into `AuditLogEntry.AfterStateJson`. Update the design
doc to state the chosen at-rest protection.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`, and
`DatabaseConnectionDefinition.ConnectionString` were mapped as ordinary `nvarchar(4000)`
columns and persisted verbatim.
Implemented the recommendation's first option — an in-module EF Core value converter
backed by ASP.NET Data Protection, which the module already uses
(`IDataProtectionKeyContext`, `AddDataProtection().PersistKeysToDbContext`). Added
`EncryptedStringConverter` (purpose-scoped `IDataProtector`; `Protect` on write,
`Unprotect` on read; null-safe; surfaces a clear message on a `CryptographicException`).
`ScadaLinkDbContext` gained an `(options, IDataProtectionProvider)` constructor and
applies the converter to the three secret columns in `OnModelCreating`; the DI
registration in `ServiceCollectionExtensions` now constructs the context with the
registered provider. The secret columns were widened to `HasMaxLength(8000)` (EF maps
this to `nvarchar(max)` on SQL Server) so ciphertext expansion cannot truncate the
value; migration `20260517010521_EncryptSecretColumns` carries the column-type change.
Regression tests added in `SecretEncryptionTests.cs` verify the raw column value is
never the plaintext secret and that EF transparently decrypts on read, for all three
columns plus a null round-trip.
The encryption scheme itself is fully in-module; the only remaining cross-cutting item
is a documentation gap — the design doc does not yet state encryption-at-rest for these
fields. That doc update is outside this module's editable scope (constraint: edit only
`src/ScadaLink.ConfigurationDatabase`, the tests, and this file) and is surfaced here
for a follow-up to `docs/requirements/Component-ConfigurationDatabase.md`. The audit
secret-leak concern is mitigated separately by CD-007's serializer hardening; whether
callers should additionally redact secret-bearing entities before passing them to
`IAuditService` is a caller-side concern in other modules and is also surfaced for
follow-up. The code fix in this module is complete.
### ConfigurationDatabase-005 — Audit `Id` type disagrees with the design doc
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/AuditConfiguration.cs:11` (entity `src/ScadaLink.Commons/Entities/Audit/AuditLogEntry.cs`) |
**Description**
The design doc's Audit Entry Schema table specifies `Id` as `Long / GUID`, and notes
the audit table is append-only and retained indefinitely. The actual `AuditLogEntry`
entity uses an `int` identity key. For a never-purged, append-only table that
accumulates one row per save operation across the system lifetime, a 32-bit identity
risks overflow over a long deployment horizon, and the code drifts from the documented
schema.
**Recommendation**
Change `AuditLogEntry.Id` to `long` (and the corresponding migration column to
`bigint`) to match the design doc and remove the overflow risk, or — if `int` is
intentional — update the design doc's schema table to say `int` and justify it.
Resolve the discrepancy in one direction.
**Resolution**
_Unresolved._
### ConfigurationDatabase-006 — `Site.GrpcNodeAAddress` / `GrpcNodeBAddress` columns are unbounded
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs:24-25` |
**Description**
`SiteConfiguration` explicitly sets `HasMaxLength(500)` for `NodeAAddress` and
`NodeBAddress`, but the entity also has `GrpcNodeAAddress` and `GrpcNodeBAddress`
(added per the gRPC streaming design decision) which are not configured at all. With no
length set, EF Core maps them to `nvarchar(max)`. This is inconsistent with the sibling
address columns, wastes the opportunity to constrain input, and `nvarchar(max)` columns
cannot be indexed and have different storage/performance characteristics.
**Recommendation**
Add `builder.Property(s => s.GrpcNodeAAddress).HasMaxLength(500);` and the same for
`GrpcNodeBAddress`, matching the existing `NodeAAddress`/`NodeBAddress` mapping, and
generate a migration to alter the column types.
**Resolution**
_Unresolved._
### ConfigurationDatabase-007 — `AuditService` does not handle JSON-serialization failure of arbitrary `afterState`
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.ConfigurationDatabase/Services/AuditService.cs:28-30` |
**Description**
`LogAsync` serializes the caller-supplied `afterState` object with
`JsonSerializer.Serialize(afterState)` using default options. EF entity POCOs commonly
have navigation properties; serializing an entity that has loaded navigations (e.g. a
`Template` with `Attributes`/`Scripts`, or any entity with a cycle) will throw
`JsonException` for a reference cycle or produce a very large payload. Because audit
writes are designed to commit in the same transaction as the change, a serialization
exception thrown here will roll back the *entire* business operation — a template
update fails because its audit entry could not be serialized. This couples audit
robustness to the shape of every entity passed in.
**Recommendation**
Configure `JsonSerializerOptions` with `ReferenceHandler.IgnoreCycles` (or
`Preserve`) and a sensible `MaxDepth`, and consider serializing a projected
DTO/snapshot rather than the live tracked entity. Decide explicitly whether an audit
serialization failure should fail the operation or be logged and degraded gracefully,
and document that decision against the design doc's transactional-guarantee section.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: `LogAsync`
called `JsonSerializer.Serialize(afterState)` with default options, so any `afterState`
graph containing a reference cycle threw `JsonException` — and because the audit entry
commits in the same transaction as the change it records, that exception rolled back
the entire business operation.
Fix applied per the recommendation: `AuditService` now serializes via a static
`JsonSerializerOptions` configured with `ReferenceHandler.IgnoreCycles` and
`MaxDepth = 32`. The serialization is additionally wrapped in a `SerializeAfterState`
helper that catches a residual `JsonException`/`NotSupportedException` and substitutes a
small diagnostic placeholder JSON (`AuditSerializationError` + `StateType`) — an explicit
decision that an audit-serialization failure must **degrade gracefully** and never roll
back the audited operation. The audit entry is always recorded; the design doc's
transactional-guarantee section ("if the change succeeds, the audit entry is always
recorded") is thereby honoured even for pathological state objects. Regression test
added in `AuditServiceTests.cs`:
`LogAsync_AfterStateWithReferenceCycle_DoesNotThrow_AndDoesNotRollBackOperation`.
### ConfigurationDatabase-008 — `GetApprovedKeysForMethodAsync` CSV parsing silently drops malformed ids
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/InboundApiRepository.cs:46-58` |
**Description**
`ApiMethod.ApprovedApiKeyIds` is stored as a comma-separated string of integer ids.
`GetApprovedKeysForMethodAsync` splits it, maps each token with
`int.TryParse(...) ? id : -1`, then filters with `id > 0`. Any token that fails to
parse, or a legitimately negative/zero id, is silently discarded. If `ApprovedApiKeyIds`
becomes corrupt (e.g. a stray name instead of an id), the method quietly returns fewer
approved keys than expected, which for an API-key authorization path means a method may
unexpectedly reject a key that should be approved. Storing a relational many-to-many as
a CSV string in a column is itself fragile (no FK integrity, no cascade on key delete).
**Recommendation**
Short term: log a warning when a token fails to parse instead of silently dropping it,
so corruption is observable. Longer term: replace the CSV column with a proper join
table (`ApiMethodApprovedKey`) with foreign keys to `ApiMethod` and `ApiKey`, which
gives referential integrity and correct cascade behaviour when an API key is deleted.
**Resolution**
_Unresolved._
### ConfigurationDatabase-009 — Multi-collection eager loads issue cartesian-product queries
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs:43-51,53-61`, `src/ScadaLink.ConfigurationDatabase/Repositories/CentralUiRepository.cs:45-55` |
**Description**
`GetAllTemplatesAsync`, `GetTemplatesComposingAsync`, and `GetTemplateTreeAsync` each
`Include` three-to-four sibling collections (`Attributes`, `Alarms`, `Scripts`,
`Compositions`) in a single query. EF Core's default single-query strategy produces a
cartesian-product join across those collections, so a template with N attributes, M
alarms, and K scripts yields N×M×K rows that EF must then de-duplicate. For templates
with many members this materially inflates the result set and query time.
`GetInstanceByIdAsync`/`GetAllInstancesAsync` have the same shape with three
collections.
**Recommendation**
Add `.AsSplitQuery()` to these multi-collection-include queries (or set
`UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)` globally in
`AddConfigurationDatabase`) so each collection is loaded with a separate query and the
cartesian explosion is avoided.
**Resolution**
_Unresolved._
### ConfigurationDatabase-010 — Several repositories and `InstanceLocator` lack direct test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs`, `Repositories/DeploymentManagerRepository.cs`, `Repositories/ExternalSystemRepository.cs`, `Repositories/InboundApiRepository.cs`, `Repositories/NotificationRepository.cs`, `Repositories/SiteRepository.cs`, `Services/InstanceLocator.cs` |
**Description**
The test project covers `SecurityRepository`, `CentralUiRepository`, `AuditService`,
optimistic concurrency, seed data, and Data Protection persistence. There are no direct
tests for `TemplateEngineRepository` (the largest repository, and the one with the
CD-001 bug, which a test would have caught), `DeploymentManagerRepository` (including
its `Local`-then-stub delete fallback and the `DeleteInstanceAsync`
restrict-FK-cleanup logic), `ExternalSystemRepository`, `InboundApiRepository` (notably
`GetApprovedKeysForMethodAsync` CSV parsing — CD-008), `NotificationRepository`,
`SiteRepository` (including its stub-attach delete path), or `InstanceLocator`.
**Recommendation**
Add repository-level tests using the existing `SqliteTestHelper` pattern, covering at
minimum: CRUD round-trips, the stub-attach delete fallbacks in
`DeploymentManagerRepository`/`SiteRepository`, `DeleteInstanceAsync`'s explicit
deployment-record cleanup, `GetApprovedKeysForMethodAsync` with valid/malformed CSV,
and `InstanceLocator.GetSiteIdForInstanceAsync` for found/not-found cases.
**Resolution**
_Unresolved._
### ConfigurationDatabase-011 — Inconsistent constructor null-guarding across repositories/services
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/ExternalSystemRepository.cs:11-14`, `Repositories/InboundApiRepository.cs:11-14`, `Repositories/NotificationRepository.cs:11-14`, `Services/InstanceLocator.cs:13-16` |
**Description**
`SecurityRepository`, `CentralUiRepository`, `TemplateEngineRepository`,
`DeploymentManagerRepository`, `SiteRepository`, and `AuditService` all guard their
injected `ScadaLinkDbContext` with `?? throw new ArgumentNullException(...)`.
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`, and
`InstanceLocator` assign the constructor argument directly with no guard. This is a
minor consistency/maintainability issue: although the DI container will not normally
supply null, the divergence makes the codebase look unfinished and means a future
hand-constructed instance fails with a less informative `NullReferenceException` later.
**Recommendation**
Apply the same `?? throw new ArgumentNullException(nameof(context))` guard in the four
inconsistent constructors so all data-access types behave uniformly.
**Resolution**
_Unresolved._