46cb6965ac
Security-sensitive batch, handled main-thread for careful judgment on secret-leak and pepper-bypass paths. Secret leak / pepper bypass: - CD-016 (pepper bypass): InboundApiRepository's GetApiKeyByValueAsync no longer hashes the candidate with the unpeppered ApiKeyHasher.Default — ctor takes a lazy Func<IApiKeyHasher> accessor (lazy so test composition roots without a pepper still bring up the repository), and the DI registration wires sp.GetService<IApiKeyHasher>() so the production peppered hasher matches the stored KeyHash. Regression test asserts positive (peppered roundtrip) AND negative (Default hasher misses the same key — proving the lookup uses the injected hasher). - MgmtSvc-020 (SMTP credential leak): UpdateSmtpConfig/ListSmtpConfigs now project through SmtpConfigPublicShape so the response payload and audit-row afterState never carry the Credentials field — only a HasCredentials bool. The SMTP password / OAuth2 client secret no longer leaves the Admin-only UpdateSmtpConfig boundary the caller already supplied it to. Redaction: - AuditLog-008 (test-fixture under-redact): new SafeDefaultAuditPayloadFilter (stateless singleton) does HTTP header redaction for the always-sensitive defaults (Authorization, X-Api-Key, Cookie, Set-Cookie). FallbackAuditWriter, CentralAuditWriter, and AuditLogIngestActor (both ingest paths) default to it instead of null — composition roots that bypass AddAuditLog can no longer write unredacted auth headers to the audit store. - NotifService-025 (over-mask): CredentialRedactor.Scrub now only masks the last colon-separated component (password / clientSecret) AND only if it's >= 12 chars (typical password heuristic). Short user names like "root" no longer become global redaction tokens that eat unrelated diagnostic text. The full packed string is always masked regardless of length. 3 new negative tests pin the no-over-mask contract. Audit-row correctness / fail-loud: - InboundAPI-025: Program.cs UseWhen predicate now excludes /api/audit, /api/management, /api/centralui, /api/script-analysis AND requires POST — the AuditWriteMiddleware no longer emits spurious ApiInbound rows for audit-log query/export endpoints (write-on-read recursion broken). - ESG-021: ApplyAuth now logs Warning (not silent) on empty AuthConfiguration for apikey/basic, unknown AuthType, and malformed Basic config. AuthConfiguration value NEVER logged. AuthType=none remains silent (documented unauthenticated sentinel). - Security-021: AddSecurity now logs a startup Warning when RequireHttpsCookie=false — an HTTP-only deployment that previously transmitted the cookie-embedded JWT silently in cleartext is now audible in the log. Defensive: - CD-021: SwitchOutPartitionAsync's monthBoundary format string now yyyy-MM-dd HH:mm:ss.fffffff (datetime2(7) precision) so a future sub-second / non-midnight boundary doesn't silently round to the wrong partition. Plus reconciled stale per-module Open-findings counters that had drifted from earlier sessions (AuditLog, CD, ESG, IAPI, MgmtSvc, NotifService, Security). Build clean; all affected test projects green (Host 208, ConfigDB 242, ESG 69, IAPI 151, MgmtSvc 100, NotifService 55, Security 85, AuditLog 247/248 — 1 pre-existing date-sensitive integration test flake on PartitionPurgeTests, unrelated). README regenerated: 46 open (was 54).
1374 lines
80 KiB
Markdown
1374 lines
80 KiB
Markdown
# Code Review — ConfigurationDatabase
|
||
|
||
| Field | Value |
|
||
|-------|-------|
|
||
| Module | `src/ScadaLink.ConfigurationDatabase` |
|
||
| Design doc | `docs/requirements/Component-ConfigurationDatabase.md` |
|
||
| Status | Reviewed |
|
||
| Last reviewed | 2026-05-28 |
|
||
| Reviewer | claude-agent |
|
||
| Commit reviewed | `1eb6e97` |
|
||
| Open findings | 1 |
|
||
|
||
## Summary
|
||
|
||
The ConfigurationDatabase module is a focused, conventional EF Core data-access layer:
|
||
a single `ScadaLinkDbContext`, Fluent API entity configurations, eight repository
|
||
implementations of Commons-defined interfaces, an `IAuditService` implementation, an
|
||
`IInstanceLocator`, environment-aware migration handling, and design-time tooling
|
||
support. Overall structure adheres well to the design doc and the CLAUDE.md "Code
|
||
Organization" decisions — POCO entities and interfaces live in Commons, EF mappings and
|
||
implementations live here, Fluent API only, and optimistic concurrency is correctly
|
||
applied to `DeploymentRecord` via `rowversion`. The module is generally healthy.
|
||
|
||
The main themes across findings are: (1) a genuine logic bug in
|
||
`GetTemplateWithChildrenAsync`, which loads child templates and then discards them, so
|
||
the method does not deliver what its name implies; (2) secret-bearing columns (SMTP
|
||
credentials, external-system auth config, database connection strings) persisted in
|
||
plaintext with no encryption-at-rest; (3) a hardcoded SQL `sa` connection string with a
|
||
password literal embedded in `DesignTimeDbContextFactory`; (4) the no-arg
|
||
`AddConfigurationDatabase()` overload, which silently registers nothing, making a
|
||
misconfigured central node fail late and opaquely; and (5) audit-trail robustness gaps —
|
||
`AuditService` can throw on serializing entities with navigation cycles, rolling back
|
||
the whole business operation, and the design doc's claim that audit `Id` is `Long/GUID`
|
||
disagrees with the `int` entity. Test coverage is good for the repositories that have
|
||
tests (Security, CentralUI, audit, concurrency, seed data, data protection) but several
|
||
repositories (`TemplateEngineRepository`, `DeploymentManagerRepository`,
|
||
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`,
|
||
`SiteRepository`, `InstanceLocator`) have little or no direct coverage.
|
||
|
||
#### Re-review 2026-05-17 (commit `39d737e`)
|
||
|
||
Re-reviewed the module at commit `39d737e`. All eleven findings from the initial
|
||
review (`9c60592`) remain `Resolved` — the secret-column encryption
|
||
(`EncryptedStringConverter` + `ApplySecretColumnEncryption`), the fail-fast no-arg
|
||
DI overload, the `AsSplitQuery` conversions, the audit cycle-safe serializer, and the
|
||
added repository test coverage all verified present and consistent with their
|
||
resolutions. Three new findings were recorded. The most material is that the
|
||
encryption work done for CD-004 left one bearer credential out of scope:
|
||
`ApiKey.KeyValue` — the inbound-API authentication secret — is still persisted in
|
||
plaintext (`ConfigurationDatabase-012`); it cannot use the same non-deterministic
|
||
Data Protection converter because authentication looks the key up *by value*, so it
|
||
needs a hash-based scheme instead. The second is a resilience gap in the encryption
|
||
plumbing itself: `ApplySecretColumnEncryption` silently substitutes a throwaway
|
||
`EphemeralDataProtectionProvider` whenever no provider is supplied, so any context
|
||
constructed via the single-argument constructor on a write path would encrypt
|
||
secrets with a key discarded at process exit, yielding permanently undecryptable
|
||
ciphertext with no error (`ConfigurationDatabase-013`). The third is a minor
|
||
inconsistency — a redundant cast on one of the three `HasConversion` calls
|
||
(`ConfigurationDatabase-014`). The module is otherwise healthy and the prior fixes
|
||
hold up well.
|
||
|
||
#### Re-review 2026-05-28 (commit `1eb6e97`)
|
||
|
||
Re-reviewed the module at commit `1eb6e97`. All fourteen prior findings remain
|
||
`Resolved`; their fixes still hold (encryption converter, fail-fast guard,
|
||
peppered API-key hash, ephemeral-fallback hardening, etc.). The module has
|
||
grown since the last review — new code includes Audit Log (#23) raw-SQL
|
||
paths in `AuditLogRepository` (partition-switch purge, recursive
|
||
execution-tree CTE, KPI snapshot, partition-boundary discovery), the
|
||
`AuditLogPartitionMaintenance` SPLIT-RANGE roll-forward implementation, the
|
||
`AuditCorrelationContext` scoped service that stamps `BundleImportId`, the
|
||
`SiteCallAuditRepository` monotonic-rank upsert, and the
|
||
`NotificationOutboxRepository` per-site KPI surface — and most of the new
|
||
findings are concentrated in those raw-SQL paths and in latent gaps left
|
||
behind by the CD-012 hash migration.
|
||
|
||
Ten new findings were recorded. The most material is
|
||
`ConfigurationDatabase-015`: a check-then-act race in
|
||
`NotificationOutboxRepository.InsertIfNotExistsAsync` with no duplicate-key
|
||
catch — unlike the sibling Audit Log / Site Call ingest paths, a concurrent
|
||
ack-after-persist on the same `NotificationId` will surface as an
|
||
unhandled `DbUpdateException` and break the at-least-once site→central
|
||
handoff. `ConfigurationDatabase-016` flags that
|
||
`InboundApiRepository.GetApiKeyByValueAsync` hashes the candidate with
|
||
`ApiKeyHasher.Default` (unpeppered) while the production create-path uses
|
||
the configured peppered hasher — any future caller (or test that exercises
|
||
the method) will silently fail to find a real key; the production
|
||
`ApiKeyValidator` happens not to call it, but the method is a publicly
|
||
exposed `IInboundApiRepository` member and a latent bug.
|
||
`ConfigurationDatabase-017` records that the `DeleteDeploymentRecordAsync`
|
||
stub-attach delete bypasses the documented optimistic-concurrency rule on
|
||
`DeploymentRecord.RowVersion` — the SQLite tests pass because the test
|
||
fixture re-maps `RowVersion` as a nullable concurrency token, but in
|
||
production this is likely to throw `DbUpdateConcurrencyException`.
|
||
`ConfigurationDatabase-018` records the `DateTime`-typed `*Utc` columns on
|
||
`AuditEvent` and `SiteCall` re-emerge as `Kind=Unspecified` on read; the
|
||
sibling Commons module flagged the same pattern as Commons-019, and
|
||
`AuditLogPartitionMaintenance.GetMaxBoundaryAsync` already defends against
|
||
it with an explicit `SpecifyKind(Utc)` — but `GetPartitionBoundariesOlderThanAsync`
|
||
does not (`ConfigurationDatabase-020`). `ConfigurationDatabase-019` is the
|
||
SPLIT-RANGE loop in `AuditLogPartitionMaintenance.EnsureLookaheadAsync`
|
||
swallowing every `SqlException` as a Warning and continuing — a genuine
|
||
failure (permissions, deadlock, transient) leaves a missing boundary and
|
||
the next iteration cheerfully splits the following month, creating a hole.
|
||
`ConfigurationDatabase-021` is a low-severity hardening concern around
|
||
`SwitchOutPartitionAsync`'s raw-SQL interpolation of `monthBoundaryStr` /
|
||
`stagingTableName` (currently safe by construction, but truncates fractional
|
||
seconds). `ConfigurationDatabase-022` is the stale "WP-24 stub" XML comment
|
||
on `DeploymentManagerRepository`. `ConfigurationDatabase-023` is a
|
||
design-doc-adherence drift on `IX_AuditLog_CorrelationId` (design says
|
||
`IX_AuditLog_Correlation`). `ConfigurationDatabase-024` is missing test
|
||
coverage for the SPLIT-RANGE failure-continuation behaviour and for the
|
||
production-shape stub-attach delete with a real rowversion.
|
||
|
||
## Checklist coverage
|
||
|
||
| # | Category | Examined | Notes |
|
||
|---|----------|----------|-------|
|
||
| 1 | Correctness & logic bugs | ✓ | `GetTemplateWithChildrenAsync` discards loaded children (CD-001); `GetApprovedKeysForMethodAsync` CSV parsing is brittle (CD-008). |
|
||
| 2 | Akka.NET conventions | ✓ | No actors in this module; data-access layer only. No issues found. |
|
||
| 3 | Concurrency & thread safety | ✓ | DbContext correctly scoped; optimistic concurrency on `DeploymentRecord` correct. Repositories hold no shared mutable state. No issues found. |
|
||
| 4 | Error handling & resilience | ✓ | `WaitForDatabaseReadyAsync` is sound. No-arg DI overload fails late and silently (CD-003, resolved); audit JSON serialization failure handling (CD-007, resolved). Re-review: ephemeral Data Protection fallback can silently produce undecryptable ciphertext (CD-013). |
|
||
| 5 | Security | ✓ | Hardcoded `sa` credential literal (CD-002, resolved); SMTP/DB-connection/auth secrets unencrypted (CD-004, resolved). Re-review: `ApiKey.KeyValue` bearer credential still stored in plaintext (CD-012). |
|
||
| 6 | Performance & resource management | ✓ | `GetAllTemplatesAsync` / `GetTemplateTreeAsync` eager-load multiple collections without `AsSplitQuery` (CD-009, resolved). No N+1 in audited paths. Re-review: no new issues. |
|
||
| 7 | Design-document adherence | ✓ | Audit `Id` type mismatch vs design doc (CD-005, resolved); seed data uses `HasData` consistent with design. Re-review: no new issues. |
|
||
| 8 | Code organization & conventions | ✓ | Mostly clean. `Grpc*` address columns unbounded (CD-006, resolved); inconsistent null-guard on injected context (CD-011, resolved). Re-review: redundant/inconsistent cast on one `HasConversion` call (CD-014). |
|
||
| 9 | Testing coverage | ✓ | Several repositories and `InstanceLocator` lack direct tests (CD-010). |
|
||
| 10 | Documentation & comments | ✓ | `DeploymentManagerRepository` "WP-24 stub" XML comment is stale; noted in module context but not raised as a standalone finding. No issues found beyond items above. |
|
||
|
||
_Re-review (2026-05-28, `1eb6e97`):_
|
||
|
||
| # | Category | Examined | Notes |
|
||
|---|----------|----------|-------|
|
||
| 1 | Correctness & logic bugs | ✓ | `GetPartitionBoundariesOlderThanAsync` returns `DateTimeKind.Unspecified` (CD-020). `GetApiKeyByValueAsync` hashes with the unpeppered default (CD-016). |
|
||
| 2 | Akka.NET conventions | ✓ | No actors in this module. No issues found. |
|
||
| 3 | Concurrency & thread safety | ✓ | `NotificationOutboxRepository.InsertIfNotExistsAsync` check-then-act has no duplicate-key catch (CD-015). Stub-attach delete bypasses documented optimistic concurrency on `DeploymentRecord.RowVersion` (CD-017). |
|
||
| 4 | Error handling & resilience | ✓ | `AuditLogPartitionMaintenance.EnsureLookaheadAsync` swallows non-idempotent SPLIT failures and continues (CD-019). |
|
||
| 5 | Security | ✓ | `SwitchOutPartitionAsync` interpolates a `DateTime` string and a GUID-suffixed identifier into raw SQL — safe by construction but pattern is risky (CD-021). |
|
||
| 6 | Performance & resource management | ✓ | No new issues found. |
|
||
| 7 | Design-document adherence | ✓ | Index name drift: design says `IX_AuditLog_Correlation`, code uses `IX_AuditLog_CorrelationId` (CD-023). |
|
||
| 8 | Code organization & conventions | ✓ | `DateTime *Utc` columns on `AuditEvent` / `SiteCall` carry no `DateTimeKind` enforcement (CD-018). |
|
||
| 9 | Testing coverage | ✓ | No tests for SPLIT failure continuation and no production-shape rowversion stub-attach test (CD-024). |
|
||
| 10 | Documentation & comments | ✓ | Stale "WP-24 stub" XML comment on `DeploymentManagerRepository` (CD-022). |
|
||
|
||
## Findings
|
||
|
||
### ConfigurationDatabase-001 — `GetTemplateWithChildrenAsync` loads child templates then discards them
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs:30-41` |
|
||
|
||
**Description**
|
||
|
||
`GetTemplateWithChildrenAsync` queries for all templates whose `ParentTemplateId`
|
||
equals the requested id, assigns the result to the local variable `children`, and
|
||
then returns `template` — the `children` list is never used, attached to the returned
|
||
object, or otherwise exposed. The method is therefore behaviourally identical to
|
||
`GetTemplateByIdAsync` but issues an extra database round-trip. Any caller relying on
|
||
the method name to obtain a template with its derived/child templates populated will
|
||
silently receive a template with no children, leading to incorrect template-resolution
|
||
or UI behaviour with no error.
|
||
|
||
**Recommendation**
|
||
|
||
Either populate the children onto the returned aggregate (e.g. project into a result
|
||
type that carries the children, or load them into a navigation collection that is
|
||
actually returned), or remove the dead query and the misleading method if children are
|
||
not in fact needed. If the navigation does not exist on the `Template` entity, add an
|
||
explicit result tuple/DTO so the loaded data reaches the caller.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit `<pending>`). Root cause confirmed against source: the
|
||
method ran a `Where(t => t.ParentTemplateId == id)` query, assigned the result to a
|
||
local `children` variable, and never used it — a misleading no-op that also issued an
|
||
extra database round-trip per call.
|
||
|
||
Triage of the three callers (`FlatteningPipeline.BuildTemplateChainAsync`,
|
||
`ManagementActor.HandleGetTemplate`, `ManagementActor.HandleValidateTemplate`) showed
|
||
none consume derived/sub-templates; they all need the template's *member* collections
|
||
(Attributes/Alarms/Scripts/Compositions), which `GetTemplateByIdAsync` already
|
||
eager-loads. The `Template` entity has no child-templates navigation collection, and
|
||
adding one (plus changing the interface signature) would require editing
|
||
`ScadaLink.Commons`, which is outside this module's scope.
|
||
|
||
Fix applied the recommendation's secondary option: removed the dead query so the
|
||
method no longer misleads or wastes a round-trip, and added an XML doc comment
|
||
clarifying that "children" means the template's member collections. The method now
|
||
honestly delegates to `GetTemplateByIdAsync`. Regression tests added in
|
||
`TemplateEngineRepositoryTests.cs`:
|
||
`GetTemplateWithChildrenAsync_ReturnsTemplateWithAllMemberCollectionsPopulated`,
|
||
`GetTemplateWithChildrenAsync_PreservesParentTemplateId_ForInheritanceChainWalk`, and
|
||
`GetTemplateWithChildrenAsync_ReturnsNull_WhenTemplateDoesNotExist` — pinning the
|
||
template-aggregate contract the callers depend on.
|
||
|
||
### ConfigurationDatabase-002 — Hardcoded `sa` connection string with embedded password literal
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/DesignTimeDbContextFactory.cs:21-22` |
|
||
|
||
**Description**
|
||
|
||
`DesignTimeDbContextFactory` falls back to a literal connection string
|
||
`"Server=localhost,1433;Database=ScadaLink_Config;User Id=sa;Password=YourPassword;TrustServerCertificate=True"`
|
||
when no configured connection string is found. Embedding a credential literal (even a
|
||
placeholder) in source code is a poor pattern: it is committed to version control,
|
||
encourages copy-paste of `sa`/`TrustServerCertificate=True` into real environments, and
|
||
the fallback can mask a genuine misconfiguration during `dotnet ef` operations by
|
||
silently pointing tooling at an unintended database.
|
||
|
||
**Recommendation**
|
||
|
||
Remove the hardcoded fallback. If no connection string is resolved from configuration
|
||
or environment, throw a clear `InvalidOperationException` instructing the developer to
|
||
set `ScadaLink:Database:ConfigurationDb` (or an environment variable). At minimum, read
|
||
the design-time connection string from an environment variable rather than a literal,
|
||
and never use `sa`.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the factory
|
||
fell back to a literal `User Id=sa;Password=YourPassword;...` connection string when no
|
||
configured value was found. Removed the hardcoded fallback entirely. The factory now
|
||
resolves the connection string from the Host's appsettings files or, when those are not
|
||
present, from the `SCADALINK_DESIGNTIME_CONNECTIONSTRING` environment variable, and
|
||
throws a clear `InvalidOperationException` (naming both the config key and the env var)
|
||
when neither yields a value. Also hardened `SetBasePath` to be applied only when the
|
||
`ScadaLink.Host` directory exists, so the factory degrades cleanly instead of throwing
|
||
`DirectoryNotFoundException` when run from a context without a sibling Host folder.
|
||
Regression tests added in `DesignTimeDbContextFactoryTests.cs`:
|
||
`CreateDbContext_NoConnectionStringConfigured_ThrowsClearException`,
|
||
`CreateDbContext_ConnectionStringFromEnvironmentVariable_IsUsed`, and
|
||
`DesignTimeDbContextFactory_SourceContainsNoHardcodedSaCredential`.
|
||
|
||
### ConfigurationDatabase-003 — No-arg `AddConfigurationDatabase()` silently registers nothing
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/ServiceCollectionExtensions.cs:44-49` |
|
||
|
||
**Description**
|
||
|
||
The parameterless `AddConfigurationDatabase()` overload is a deliberate no-op "retained
|
||
for backward compatibility during migration." If a central node is wired up with this
|
||
overload by mistake, no `ScadaLinkDbContext`, repositories, `IAuditService`, or
|
||
`IInstanceLocator` are registered. The failure does not surface at startup; it surfaces
|
||
much later as opaque DI resolution exceptions the first time any consumer requests a
|
||
repository — far from the actual misconfiguration. The XML comment also refers to
|
||
"Phase 0 stubs," which is stale relative to the current state of the module.
|
||
|
||
**Recommendation**
|
||
|
||
Either delete the no-op overload now that the connection-string overload exists, or
|
||
mark it `[Obsolete]` with an error-level message so misuse is a compile-time failure.
|
||
If a true "site node" no-op is genuinely required, give it an explicit, self-documenting
|
||
name (e.g. `AddConfigurationDatabaseNoOp()`), and remove the stale "Phase 0" wording.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
|
||
parameterless `AddConfigurationDatabase()` overload returned `services` unchanged,
|
||
registering no `DbContext`, repositories, `IAuditService`, or `IInstanceLocator`.
|
||
Applied the recommendation's first option: the overload is now marked
|
||
`[Obsolete(..., error: true)]` so any source reference is a compile-time failure, and
|
||
its body throws `InvalidOperationException` with an actionable message as
|
||
defence-in-depth (covering reflection-based invocation or suppressed warnings). The
|
||
stale "Phase 0 stubs / backward compatibility" XML comment was replaced with one
|
||
explaining the obsoletion. The pre-existing
|
||
`ServiceRegistrationTests.AddConfigurationDatabase_NoArgs_DoesNotThrow` test in
|
||
`UnitTest1.cs`, which encoded the old buggy no-op contract, was updated to
|
||
`AddConfigurationDatabase_NoArgs_FailsFast` to assert the corrected behaviour.
|
||
New regression tests added in `ServiceCollectionExtensionsTests.cs`:
|
||
`AddConfigurationDatabase_NoArgOverload_FailsFastWithClearMessage` and
|
||
`AddConfigurationDatabase_NoArgOverload_IsMarkedObsoleteAsError`.
|
||
|
||
### ConfigurationDatabase-004 — Secret-bearing columns stored in plaintext with no protection
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/NotificationConfiguration.cs:56-57`, `src/ScadaLink.ConfigurationDatabase/Configurations/ExternalSystemConfiguration.cs:25-26,75-77` |
|
||
|
||
**Description**
|
||
|
||
`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`, and
|
||
`DatabaseConnectionDefinition.ConnectionString` all hold authentication secrets (SMTP
|
||
OAuth2 client secrets / passwords, external-system API keys or Basic Auth credentials,
|
||
and database passwords respectively). They are mapped as ordinary string columns and
|
||
persisted verbatim. Anyone with read access to the configuration database — including
|
||
audit-log JSON if these entities are serialized into `AfterStateJson` — obtains the
|
||
plaintext secrets. The design doc does not call out encryption-at-rest for these
|
||
fields, so the design is also silent on a real risk.
|
||
|
||
**Recommendation**
|
||
|
||
Apply encryption to these fields, e.g. an EF Core value converter backed by ASP.NET
|
||
Data Protection (the module already configures `IDataProtectionKeyContext`), or rely on
|
||
SQL Server Always Encrypted / column encryption. Separately, ensure `IAuditService`
|
||
callers never pass these secret-bearing entities (or that the serializer redacts the
|
||
fields) so secrets do not leak into `AuditLogEntry.AfterStateJson`. Update the design
|
||
doc to state the chosen at-rest protection.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
|
||
`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`, and
|
||
`DatabaseConnectionDefinition.ConnectionString` were mapped as ordinary `nvarchar(4000)`
|
||
columns and persisted verbatim.
|
||
|
||
Implemented the recommendation's first option — an in-module EF Core value converter
|
||
backed by ASP.NET Data Protection, which the module already uses
|
||
(`IDataProtectionKeyContext`, `AddDataProtection().PersistKeysToDbContext`). Added
|
||
`EncryptedStringConverter` (purpose-scoped `IDataProtector`; `Protect` on write,
|
||
`Unprotect` on read; null-safe; surfaces a clear message on a `CryptographicException`).
|
||
`ScadaLinkDbContext` gained an `(options, IDataProtectionProvider)` constructor and
|
||
applies the converter to the three secret columns in `OnModelCreating`; the DI
|
||
registration in `ServiceCollectionExtensions` now constructs the context with the
|
||
registered provider. The secret columns were widened to `HasMaxLength(8000)` (EF maps
|
||
this to `nvarchar(max)` on SQL Server) so ciphertext expansion cannot truncate the
|
||
value; migration `20260517010521_EncryptSecretColumns` carries the column-type change.
|
||
Regression tests added in `SecretEncryptionTests.cs` verify the raw column value is
|
||
never the plaintext secret and that EF transparently decrypts on read, for all three
|
||
columns plus a null round-trip.
|
||
|
||
The encryption scheme itself is fully in-module; the only remaining cross-cutting item
|
||
is a documentation gap — the design doc does not yet state encryption-at-rest for these
|
||
fields. That doc update is outside this module's editable scope (constraint: edit only
|
||
`src/ScadaLink.ConfigurationDatabase`, the tests, and this file) and is surfaced here
|
||
for a follow-up to `docs/requirements/Component-ConfigurationDatabase.md`. The audit
|
||
secret-leak concern is mitigated separately by CD-007's serializer hardening; whether
|
||
callers should additionally redact secret-bearing entities before passing them to
|
||
`IAuditService` is a caller-side concern in other modules and is also surfaced for
|
||
follow-up. The code fix in this module is complete.
|
||
|
||
### ConfigurationDatabase-005 — Audit `Id` type disagrees with the design doc
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Design-document adherence |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/AuditConfiguration.cs:11` (entity `src/ScadaLink.Commons/Entities/Audit/AuditLogEntry.cs`) |
|
||
|
||
**Description**
|
||
|
||
The design doc's Audit Entry Schema table specifies `Id` as `Long / GUID`, and notes
|
||
the audit table is append-only and retained indefinitely. The actual `AuditLogEntry`
|
||
entity uses an `int` identity key. For a never-purged, append-only table that
|
||
accumulates one row per save operation across the system lifetime, a 32-bit identity
|
||
risks overflow over a long deployment horizon, and the code drifts from the documented
|
||
schema.
|
||
|
||
**Recommendation**
|
||
|
||
Change `AuditLogEntry.Id` to `long` (and the corresponding migration column to
|
||
`bigint`) to match the design doc and remove the overflow risk, or — if `int` is
|
||
intentional — update the design doc's schema table to say `int` and justify it.
|
||
Resolve the discrepancy in one direction.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
|
||
`AuditLogEntry` entity declares `int Id`, while the design doc's Audit Entry Schema
|
||
table said `Long / GUID`. The entity lives in `ScadaLink.Commons`
|
||
(`src/ScadaLink.Commons/Entities/Audit/AuditLogEntry.cs`), which is outside this
|
||
module's editable scope, so the discrepancy was resolved by aligning the design doc to
|
||
the code — the recommendation's second option. The schema table now records `Id` as
|
||
`int (identity)` with an explicit justification: a 32-bit identity matches the key type
|
||
of every other entity in the schema (uniform repository/query code), and at a sustained
|
||
100 rows/second the `int` range is not exhausted for roughly 680 years, so the
|
||
indefinite-retention policy poses no realistic overflow risk; if a future deployment
|
||
ever approaches the limit the column can be widened to `bigint` via a migration without
|
||
a schema redesign. No regression test is meaningful for a documentation alignment; the
|
||
existing `AuditConfiguration` (`HasKey(a => a.Id)`) and the audit repository tests
|
||
already exercise the `int` key end to end.
|
||
|
||
### ConfigurationDatabase-006 — `Site.GrpcNodeAAddress` / `GrpcNodeBAddress` columns are unbounded
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs:24-25` |
|
||
|
||
**Description**
|
||
|
||
`SiteConfiguration` explicitly sets `HasMaxLength(500)` for `NodeAAddress` and
|
||
`NodeBAddress`, but the entity also has `GrpcNodeAAddress` and `GrpcNodeBAddress`
|
||
(added per the gRPC streaming design decision) which are not configured at all. With no
|
||
length set, EF Core maps them to `nvarchar(max)`. This is inconsistent with the sibling
|
||
address columns, wastes the opportunity to constrain input, and `nvarchar(max)` columns
|
||
cannot be indexed and have different storage/performance characteristics.
|
||
|
||
**Recommendation**
|
||
|
||
Add `builder.Property(s => s.GrpcNodeAAddress).HasMaxLength(500);` and the same for
|
||
`GrpcNodeBAddress`, matching the existing `NodeAAddress`/`NodeBAddress` mapping, and
|
||
generate a migration to alter the column types.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
|
||
`SiteConfiguration` configured `NodeAAddress`/`NodeBAddress` with `HasMaxLength(500)` but
|
||
left `GrpcNodeAAddress`/`GrpcNodeBAddress` unconfigured, so EF mapped them to
|
||
`nvarchar(max)` — inconsistent with the sibling columns and non-indexable. Applied the
|
||
recommendation: added `builder.Property(s => s.GrpcNodeAAddress).HasMaxLength(500)` and
|
||
the same for `GrpcNodeBAddress`. Generated migration
|
||
`20260517020720_BoundGrpcNodeAddressLength` altering both columns from `nvarchar(max)`
|
||
to `nvarchar(500)` (the model snapshot was updated to match). Regression tests added in
|
||
`SchemaConfigurationTests.cs`:
|
||
`GrpcNodeAddressColumns_AreLengthBoundedTo500` (theory over both columns, asserting the
|
||
EF model metadata reports `MaxLength == 500`) and
|
||
`GrpcNodeAddressColumns_MatchSiblingNodeAddressBounds` (asserting the gRPC columns share
|
||
the bound of the `NodeAAddress`/`NodeBAddress` siblings).
|
||
|
||
### ConfigurationDatabase-007 — `AuditService` does not handle JSON-serialization failure of arbitrary `afterState`
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Services/AuditService.cs:28-30` |
|
||
|
||
**Description**
|
||
|
||
`LogAsync` serializes the caller-supplied `afterState` object with
|
||
`JsonSerializer.Serialize(afterState)` using default options. EF entity POCOs commonly
|
||
have navigation properties; serializing an entity that has loaded navigations (e.g. a
|
||
`Template` with `Attributes`/`Scripts`, or any entity with a cycle) will throw
|
||
`JsonException` for a reference cycle or produce a very large payload. Because audit
|
||
writes are designed to commit in the same transaction as the change, a serialization
|
||
exception thrown here will roll back the *entire* business operation — a template
|
||
update fails because its audit entry could not be serialized. This couples audit
|
||
robustness to the shape of every entity passed in.
|
||
|
||
**Recommendation**
|
||
|
||
Configure `JsonSerializerOptions` with `ReferenceHandler.IgnoreCycles` (or
|
||
`Preserve`) and a sensible `MaxDepth`, and consider serializing a projected
|
||
DTO/snapshot rather than the live tracked entity. Decide explicitly whether an audit
|
||
serialization failure should fail the operation or be logged and degraded gracefully,
|
||
and document that decision against the design doc's transactional-guarantee section.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: `LogAsync`
|
||
called `JsonSerializer.Serialize(afterState)` with default options, so any `afterState`
|
||
graph containing a reference cycle threw `JsonException` — and because the audit entry
|
||
commits in the same transaction as the change it records, that exception rolled back
|
||
the entire business operation.
|
||
|
||
Fix applied per the recommendation: `AuditService` now serializes via a static
|
||
`JsonSerializerOptions` configured with `ReferenceHandler.IgnoreCycles` and
|
||
`MaxDepth = 32`. The serialization is additionally wrapped in a `SerializeAfterState`
|
||
helper that catches a residual `JsonException`/`NotSupportedException` and substitutes a
|
||
small diagnostic placeholder JSON (`AuditSerializationError` + `StateType`) — an explicit
|
||
decision that an audit-serialization failure must **degrade gracefully** and never roll
|
||
back the audited operation. The audit entry is always recorded; the design doc's
|
||
transactional-guarantee section ("if the change succeeds, the audit entry is always
|
||
recorded") is thereby honoured even for pathological state objects. Regression test
|
||
added in `AuditServiceTests.cs`:
|
||
`LogAsync_AfterStateWithReferenceCycle_DoesNotThrow_AndDoesNotRollBackOperation`.
|
||
|
||
### ConfigurationDatabase-008 — `GetApprovedKeysForMethodAsync` CSV parsing silently drops malformed ids
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/InboundApiRepository.cs:46-58` |
|
||
|
||
**Description**
|
||
|
||
`ApiMethod.ApprovedApiKeyIds` is stored as a comma-separated string of integer ids.
|
||
`GetApprovedKeysForMethodAsync` splits it, maps each token with
|
||
`int.TryParse(...) ? id : -1`, then filters with `id > 0`. Any token that fails to
|
||
parse, or a legitimately negative/zero id, is silently discarded. If `ApprovedApiKeyIds`
|
||
becomes corrupt (e.g. a stray name instead of an id), the method quietly returns fewer
|
||
approved keys than expected, which for an API-key authorization path means a method may
|
||
unexpectedly reject a key that should be approved. Storing a relational many-to-many as
|
||
a CSV string in a column is itself fragile (no FK integrity, no cascade on key delete).
|
||
|
||
**Recommendation**
|
||
|
||
Short term: log a warning when a token fails to parse instead of silently dropping it,
|
||
so corruption is observable. Longer term: replace the CSV column with a proper join
|
||
table (`ApiMethodApprovedKey`) with foreign keys to `ApiMethod` and `ApiKey`, which
|
||
gives referential integrity and correct cascade behaviour when an API key is deleted.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
|
||
`GetApprovedKeysForMethodAsync` mapped each CSV token with
|
||
`int.TryParse(...) ? id : -1` then filtered `id > 0`, so any unparseable (or
|
||
non-positive) token was discarded with no signal — a corrupt `ApprovedApiKeyIds` value
|
||
silently approves fewer keys than intended, an authorization-relevant outcome.
|
||
|
||
Applied the recommendation's short-term fix: the parse loop was rewritten to log a
|
||
warning for every token that fails to parse to a positive integer, naming the method id
|
||
and the offending token, so corruption is observable in logs. Valid ids still resolve
|
||
normally. `InboundApiRepository` gained an optional `ILogger<InboundApiRepository>`
|
||
constructor parameter (defaulting to `NullLogger`, matching the `MigrationHelper`
|
||
pattern) and the project now references `Microsoft.Extensions.Logging.Abstractions`. The
|
||
longer-term join-table redesign would change the `ApiMethod` entity / schema and the
|
||
`IInboundApiRepository` contract (Commons, out of this module's scope) and is left as a
|
||
future schema-design item. Regression tests added in `InboundApiRepositoryTests.cs`:
|
||
`GetApprovedKeysForMethod_WithMalformedCsvToken_LogsWarningAndDropsToken`,
|
||
`GetApprovedKeysForMethod_WithValidCsv_ReturnsAllKeys`, and
|
||
`GetApprovedKeysForMethod_WithNullOrEmptyCsv_ReturnsEmptyWithoutWarning` (using a
|
||
capturing `ILogger` to assert the warning is emitted only on malformed input).
|
||
|
||
### ConfigurationDatabase-009 — Multi-collection eager loads issue cartesian-product queries
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Performance & resource management |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs:43-51,53-61`, `src/ScadaLink.ConfigurationDatabase/Repositories/CentralUiRepository.cs:45-55` |
|
||
|
||
**Description**
|
||
|
||
`GetAllTemplatesAsync`, `GetTemplatesComposingAsync`, and `GetTemplateTreeAsync` each
|
||
`Include` three-to-four sibling collections (`Attributes`, `Alarms`, `Scripts`,
|
||
`Compositions`) in a single query. EF Core's default single-query strategy produces a
|
||
cartesian-product join across those collections, so a template with N attributes, M
|
||
alarms, and K scripts yields N×M×K rows that EF must then de-duplicate. For templates
|
||
with many members this materially inflates the result set and query time.
|
||
`GetInstanceByIdAsync`/`GetAllInstancesAsync` have the same shape with three
|
||
collections.
|
||
|
||
**Recommendation**
|
||
|
||
Add `.AsSplitQuery()` to these multi-collection-include queries (or set
|
||
`UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)` globally in
|
||
`AddConfigurationDatabase`) so each collection is loaded with a separate query and the
|
||
cartesian explosion is avoided.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
|
||
`GetAllTemplatesAsync` and `GetTemplatesComposingAsync` (`TemplateEngineRepository`) and
|
||
`GetTemplateTreeAsync` (`CentralUiRepository`) each `Include` three-to-four sibling
|
||
collections in a single query, producing a cartesian-product join. The same shape was
|
||
also present in `GetTemplateByIdAsync`, `GetInstanceByIdAsync`, `GetAllInstancesAsync`,
|
||
`GetInstancesBySiteIdAsync`, and `GetInstanceByUniqueNameAsync`.
|
||
|
||
Applied the recommendation's per-query option: `.AsSplitQuery()` was added to every
|
||
multi-collection-include query in `TemplateEngineRepository` (eight call sites) and to
|
||
`GetTemplateTreeAsync` in `CentralUiRepository`, so each collection loads with its own
|
||
query and the cartesian explosion is avoided. Per-query `AsSplitQuery()` was preferred
|
||
over a global `UseQuerySplittingBehavior` so single-collection queries elsewhere keep
|
||
the cheaper single-query plan. Split queries change query *shape* only, not results;
|
||
regression tests added in `SchemaConfigurationTests.cs` pin that behaviour:
|
||
`GetAllTemplatesAsync_WithMultipleMembersPerCollection_LoadsAllWithoutDuplication`
|
||
(a template with 3 attributes, 2 alarms, 4 scripts must return exactly those counts —
|
||
not a 24-row cartesian product) and
|
||
`GetTemplateByIdAsync_WithMultipleMembers_LoadsAllCollections`.
|
||
|
||
### ConfigurationDatabase-010 — Several repositories and `InstanceLocator` lack direct test coverage
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs`, `Repositories/DeploymentManagerRepository.cs`, `Repositories/ExternalSystemRepository.cs`, `Repositories/InboundApiRepository.cs`, `Repositories/NotificationRepository.cs`, `Repositories/SiteRepository.cs`, `Services/InstanceLocator.cs` |
|
||
|
||
**Description**
|
||
|
||
The test project covers `SecurityRepository`, `CentralUiRepository`, `AuditService`,
|
||
optimistic concurrency, seed data, and Data Protection persistence. There are no direct
|
||
tests for `TemplateEngineRepository` (the largest repository, and the one with the
|
||
CD-001 bug, which a test would have caught), `DeploymentManagerRepository` (including
|
||
its `Local`-then-stub delete fallback and the `DeleteInstanceAsync`
|
||
restrict-FK-cleanup logic), `ExternalSystemRepository`, `InboundApiRepository` (notably
|
||
`GetApprovedKeysForMethodAsync` CSV parsing — CD-008), `NotificationRepository`,
|
||
`SiteRepository` (including its stub-attach delete path), or `InstanceLocator`.
|
||
|
||
**Recommendation**
|
||
|
||
Add repository-level tests using the existing `SqliteTestHelper` pattern, covering at
|
||
minimum: CRUD round-trips, the stub-attach delete fallbacks in
|
||
`DeploymentManagerRepository`/`SiteRepository`, `DeleteInstanceAsync`'s explicit
|
||
deployment-record cleanup, `GetApprovedKeysForMethodAsync` with valid/malformed CSV,
|
||
and `InstanceLocator.GetSiteIdForInstanceAsync` for found/not-found cases.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Direct repository/service tests were added using
|
||
the existing `SqliteTestHelper` pattern. `InboundApiRepositoryTests.cs` covers
|
||
`InboundApiRepository` (API-key/method CRUD round-trips and the
|
||
`GetApprovedKeysForMethodAsync` valid/malformed/empty-CSV cases — see CD-008).
|
||
`RepositoryCoverageTests.cs` adds `ExternalSystemRepositoryTests` (definition/method CRUD,
|
||
parent-filtered method query, database-connection delete), `NotificationRepositoryTests`
|
||
(notification-list-with-recipients and SMTP-configuration round-trips, list delete),
|
||
`SiteRepositoryTests` (site/identifier round-trip plus the stub-attach delete fallback
|
||
exercised for both `DeleteSiteAsync` and `DeleteDataConnectionAsync` by clearing the
|
||
ChangeTracker, and the site-filtered instance query), `DeploymentManagerRepositoryTests`
|
||
(deployment-record CRUD and `GetCurrentDeploymentStatusAsync` ordering, the stub-attach
|
||
`DeleteDeploymentRecordAsync` fallback, and `DeleteInstanceAsync`'s explicit
|
||
Restrict-FK deployment-record cleanup), and `InstanceLocatorTests`
|
||
(`GetSiteIdForInstanceAsync` for the found and not-found cases). `TemplateEngineRepository`
|
||
gained the CD-001 and CD-009 regression tests
|
||
(`TemplateEngineRepositoryTests.cs`, `SchemaConfigurationTests.cs`). A constructor
|
||
null-guard test was added for each of the five repositories/services covered, doubling
|
||
as the CD-011 regression guard. The full module suite is green.
|
||
|
||
### ConfigurationDatabase-011 — Inconsistent constructor null-guarding across repositories/services
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/ExternalSystemRepository.cs:11-14`, `Repositories/InboundApiRepository.cs:11-14`, `Repositories/NotificationRepository.cs:11-14`, `Services/InstanceLocator.cs:13-16` |
|
||
|
||
**Description**
|
||
|
||
`SecurityRepository`, `CentralUiRepository`, `TemplateEngineRepository`,
|
||
`DeploymentManagerRepository`, `SiteRepository`, and `AuditService` all guard their
|
||
injected `ScadaLinkDbContext` with `?? throw new ArgumentNullException(...)`.
|
||
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`, and
|
||
`InstanceLocator` assign the constructor argument directly with no guard. This is a
|
||
minor consistency/maintainability issue: although the DI container will not normally
|
||
supply null, the divergence makes the codebase look unfinished and means a future
|
||
hand-constructed instance fails with a less informative `NullReferenceException` later.
|
||
|
||
**Recommendation**
|
||
|
||
Apply the same `?? throw new ArgumentNullException(nameof(context))` guard in the four
|
||
inconsistent constructors so all data-access types behave uniformly.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
|
||
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`, and
|
||
`InstanceLocator` assigned the injected `ScadaLinkDbContext` directly with no null
|
||
guard, diverging from `SecurityRepository`/`CentralUiRepository`/`TemplateEngineRepository`/
|
||
`DeploymentManagerRepository`/`SiteRepository`/`AuditService`. Applied the recommendation:
|
||
all four constructors now use `context ?? throw new ArgumentNullException(nameof(context))`
|
||
(`InboundApiRepository`'s guard was added as part of its CD-008 constructor change), so
|
||
every data-access type behaves uniformly and a hand-constructed instance fails with an
|
||
informative exception at construction rather than a later `NullReferenceException`.
|
||
Regression: `Constructor_NullContext_Throws` tests were added for all four affected types
|
||
(`InboundApiRepositoryTests.cs`, `RepositoryCoverageTests.cs`).
|
||
|
||
### ConfigurationDatabase-012 — Inbound-API `ApiKey.KeyValue` bearer credential stored in plaintext
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/InboundApiConfiguration.cs:17-19` |
|
||
|
||
**Description**
|
||
|
||
`ApiKey.KeyValue` is the bearer credential presented in the `X-API-Key` header to
|
||
authenticate Inbound API requests (HighLevelReqs §7.2–7.3). It is mapped as an
|
||
ordinary `nvarchar(500)` column with a unique index and persisted verbatim. Anyone
|
||
with read access to the configuration database — or to any `AuditLogEntry.AfterStateJson`
|
||
into which an `ApiKey` entity is serialized — obtains live API credentials in cleartext.
|
||
|
||
`ConfigurationDatabase-004` introduced encryption-at-rest for the other secret-bearing
|
||
columns (SMTP credentials, external-system auth config, database connection strings)
|
||
but explicitly scoped `ApiKey.KeyValue` out. The omission is genuine: the
|
||
`EncryptedStringConverter` built for CD-004 is backed by ASP.NET Data Protection, which
|
||
is **non-deterministic** — the same plaintext encrypts to different ciphertext each
|
||
time — so it cannot be applied here, because `GetApprovedKeysForMethodAsync` and the
|
||
authentication path resolve a key by its value (`GetApiKeyByValueAsync` does
|
||
`FirstOrDefaultAsync(k => k.KeyValue == keyValue)`). A non-deterministic converter would
|
||
break that equality lookup. The result is that the one credential most exposed to
|
||
external callers is the one credential left unprotected.
|
||
|
||
**Recommendation**
|
||
|
||
Store a salted cryptographic hash of the key value instead of the plaintext (or
|
||
ciphertext): hash on create, and authenticate by hashing the presented key and
|
||
comparing. This keeps the equality lookup working (the hash is deterministic) while
|
||
ensuring the database never holds a usable credential. The plaintext key would then
|
||
only ever be shown once, at creation time, to the Admin who created it. This requires
|
||
a coordinated change with the Inbound API / Security components and the `ApiKey`
|
||
entity in Commons; record the chosen scheme in
|
||
`docs/requirements/Component-ConfigurationDatabase.md` and the Inbound API design doc.
|
||
At minimum, ensure `ApiKey` entities are never passed to `IAuditService` without the
|
||
key value redacted.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-17 (commit `<pending>`) — approved cross-module change spanning
|
||
Commons + ConfigurationDatabase + InboundAPI + ManagementService. The inbound-API
|
||
bearer credential is now persisted as a deterministic HMAC-SHA256 keyed hash, never
|
||
as plaintext, so a configuration-database dump no longer yields usable API keys
|
||
while the constant-time by-value lookup still works.
|
||
|
||
Root cause confirmed against source: `ApiKey.KeyValue` was a plaintext
|
||
`nvarchar(500)` column with a unique index, holding the live `X-API-Key` credential.
|
||
|
||
Design implemented — **deterministic keyed hash** (the recommendation's first option):
|
||
|
||
- **Hash scheme.** HMAC-SHA256 over the UTF-8 key bytes, keyed with a server-side
|
||
*pepper*; the digest is stored Base64-encoded. No per-row random salt — an API key
|
||
is already a high-entropy random token, and a random salt would break the
|
||
deterministic by-value lookup the authentication path relies on. The pepper instead
|
||
binds every hash to the deployment. Implemented as `IApiKeyHasher` / `ApiKeyHasher`
|
||
in `ScadaLink.Commons` (`Types/InboundApi/ApiKeyHasher.cs`); the constructor
|
||
rejects a missing or weak (`< 16`-char) pepper with `ArgumentException` — fail-fast.
|
||
- **Where the pepper lives.** `InboundApiOptions.ApiKeyPepper`, a component-owned
|
||
Options class already bound from the `ScadaLink:InboundApi` configuration section
|
||
(Options pattern); it is never hard-coded. `AddInboundAPI` registers `IApiKeyHasher`
|
||
via a factory that reads the bound options, so a missing/weak pepper fails the
|
||
deployment fast rather than degrading silently. (Operators must supply the pepper
|
||
via configuration / a secret store; `appsettings*.json` is intentionally left
|
||
without a value.)
|
||
- **Commons.** `ApiKey` drops the persisted `KeyValue` property and gains a persisted
|
||
`KeyHash` — the plaintext is never a field on the entity. `ApiKey.FromHash(name,
|
||
keyHash)` creates an entity from a precomputed hash (the production create-path);
|
||
the existing `ApiKey(name, keyValue)` constructor is retained as a convenience that
|
||
hashes its input with the unpeppered default hasher.
|
||
- **ConfigurationDatabase.** EF mapping maps `KeyHash` (`nvarchar(256)`, required,
|
||
unique index `IX_ApiKeys_KeyHash`); migration `20260517073000_HashApiKeyValue`
|
||
drops `KeyValue`/`IX_ApiKeys_KeyValue`, adds `KeyHash`, and deletes existing
|
||
`ApiKeys` rows (a one-way hash cannot recover plaintext). `GetApiKeyByValueAsync`
|
||
now hashes the supplied value and looks up by `KeyHash`.
|
||
- **InboundAPI.** `ApiKeyValidator` takes an `IApiKeyHasher`; `FindKeyConstantTime`
|
||
hashes the presented candidate and compares hashes via
|
||
`CryptographicOperations.FixedTimeEquals` — the comparison stays constant-time.
|
||
- **ManagementService.** `HandleCreateApiKey` generates a random 32-byte key, stores
|
||
only its peppered hash, and returns the plaintext to the caller exactly once;
|
||
`HandleListApiKeys` no longer exposes the hash (returns id/name/enabled only).
|
||
|
||
TDD regression tests (written first, confirmed failing, then passing):
|
||
`ApiKeyHasherTests` + `ApiKeyTests` (Commons — determinism, pepper fail-fast, no
|
||
plaintext property), `ApiKeyHashValidationTests` (InboundAPI — auth succeeds via
|
||
hash, wrong key / wrong-pepper fail), `ApiKeyCreationTests` (ManagementService — only
|
||
the hash is persisted, plaintext returned once), and `SchemaConfigurationTests`
|
||
additions (ConfigurationDatabase — `KeyHash` mapped/uniquely-indexed, no `KeyValue`
|
||
column). `dotnet build` of all five touched modules is clean and their test suites
|
||
are green.
|
||
|
||
**Operational consequence: every existing API key must be re-issued.** A hash is
|
||
one-way, so the `HashApiKeyValue` migration deletes existing key rows; after applying
|
||
it, operators must create new keys (CLI / Management API / Central UI), distribute
|
||
the one-time plaintext to callers, and re-approve the new keys on their methods.
|
||
|
||
Follow-up (outside this task's editable scope): record the hash scheme in
|
||
`docs/requirements/Component-ConfigurationDatabase.md` and the Inbound API design
|
||
doc, and update the Central UI API-keys page, which previously displayed a masked
|
||
`KeyValue`.
|
||
|
||
### ConfigurationDatabase-013 — Secret-column encryption silently falls back to an ephemeral (throwaway) key
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs:107-124` |
|
||
|
||
**Description**
|
||
|
||
`ApplySecretColumnEncryption` resolves the Data Protection provider as
|
||
`_dataProtectionProvider ?? new EphemeralDataProtectionProvider()`. The `??` fallback
|
||
is reached whenever the context is constructed via the single-argument
|
||
`ScadaLinkDbContext(DbContextOptions)` constructor — i.e. whenever no provider was
|
||
injected. An `EphemeralDataProtectionProvider` generates a key ring that lives only in
|
||
process memory and is discarded at process exit.
|
||
|
||
For design-time `dotnet ef` tooling this is harmless (the XML remark correctly notes
|
||
it only emits schema). The risk is on a *runtime write path*. The runtime currently
|
||
gets the provider-bearing context only because `AddConfigurationDatabase` adds an
|
||
`AddScoped` factory registration that overrides EF's activator-based registration.
|
||
That override is the single thing standing between correct behaviour and silent data
|
||
corruption: any future change that resolves a `ScadaLinkDbContext` through a path the
|
||
override does not cover — an `AddPooledDbContextFactory`/`IDbContextFactory<ScadaLinkDbContext>`
|
||
registration, a second `AddDbContext` call, a hand-constructed context in server code —
|
||
would construct the context with the single-arg constructor, encrypt secret columns
|
||
with a throwaway key, and persist ciphertext that becomes **permanently undecryptable
|
||
the moment the process restarts**. There is no exception, no warning; the failure only
|
||
surfaces later as `CryptographicException` on read (mis-attributed by
|
||
`EncryptedStringConverter` to "the stored value was not written by this system").
|
||
|
||
**Recommendation**
|
||
|
||
Do not silently substitute an ephemeral provider for write-capable contexts. Either:
|
||
(a) require the provider unconditionally and have design-time tooling pass an explicit
|
||
ephemeral provider so the intent is visible at the call site; or (b) keep the
|
||
single-arg constructor but mark contexts built without a real provider as
|
||
schema-only — e.g. record a flag and have the encrypting converter throw a clear
|
||
`InvalidOperationException` ("secret columns cannot be written without a configured
|
||
Data Protection key ring") on the first `Protect`, instead of producing throwaway
|
||
ciphertext. Also harden the DI wiring so a `ScadaLinkDbContext` cannot be resolved
|
||
through the EF-activator registration at all (e.g. register only the factory, or use
|
||
`AddDbContextFactory` with the explicit constructor).
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-17 (commit pending). Root cause confirmed against source:
|
||
`ApplySecretColumnEncryption` resolved the provider as
|
||
`_dataProtectionProvider ?? new EphemeralDataProtectionProvider()`, so a context built
|
||
via the single-argument constructor would silently encrypt secret columns with a
|
||
throwaway key and persist permanently undecryptable ciphertext with no error.
|
||
|
||
Applied the recommendation — a combination of options (a) and (b). The silent
|
||
ephemeral substitution was removed: a no-provider context now builds its model with an
|
||
internal `SchemaOnlyDataProtector` that satisfies schema generation but throws a clear
|
||
`InvalidOperationException` if it is ever asked to `Protect`/`Unprotect`. On top of
|
||
that, `SaveChanges`/`SaveChangesAsync` were overridden with a `GuardSecretWritesHaveAKeyRing`
|
||
check that fails fast — before any database round-trip — with a clear
|
||
`InvalidOperationException` (naming the offending `Entity.Property`) whenever a
|
||
no-provider context has a pending non-null write to one of the three secret-bearing
|
||
columns. A custom `SecretAwareModelCacheKeyFactory` (registered via
|
||
`OnConfiguring`/`ReplaceService`) folds the presence of a real provider into EF's model
|
||
cache key, so a write-capable and a schema-only context can no longer share one cached
|
||
model. Design-time `dotnet ef` tooling (which only emits schema) and the test fixture
|
||
both keep working — `SqliteTestDbContext` now passes an explicit
|
||
`EphemeralDataProtectionProvider`, making the test intent visible at the call site as
|
||
recommendation (a) asks. Regression tests added in `EphemeralEncryptionFallbackTests.cs`:
|
||
`SingleArgConstructor_WritingSecretColumn_FailsFast_DoesNotPersistThrowawayCiphertext`,
|
||
`SingleArgConstructor_WritingNonSecretColumn_Succeeds`, and
|
||
`ProviderConstructor_WritingSecretColumn_StillSucceeds`. The DI-hardening sub-point
|
||
(register only the factory) was not adopted — `AddConfigurationDatabase` already
|
||
overrides the EF-activator registration with a provider-bearing factory, and the
|
||
fail-fast guard now closes the residual gap for any other resolution path.
|
||
|
||
### ConfigurationDatabase-014 — Redundant, inconsistent cast on one `HasConversion` call
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs:121-123` |
|
||
|
||
**Description**
|
||
|
||
`ApplySecretColumnEncryption` calls `HasConversion(converter)` three times. The first
|
||
two (`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`)
|
||
pass `converter` directly; the third (`DatabaseConnectionDefinition.ConnectionString`)
|
||
casts it to `(Microsoft.EntityFrameworkCore.Storage.ValueConversion.ValueConverter)`.
|
||
`EncryptedStringConverter` already derives from `ValueConverter<string?, string?>`
|
||
(itself a `ValueConverter`), and the first two call sites compile fine without the
|
||
cast, so the cast is redundant. The inconsistency makes the code look as though the
|
||
third call needs special handling when it does not, and the fully-qualified type name
|
||
inline adds noise.
|
||
|
||
**Recommendation**
|
||
|
||
Remove the cast so all three calls read identically as `HasConversion(converter)`.
|
||
If a `ValueConverter`-typed reference is genuinely wanted, give it a local variable of
|
||
that type once and use it for all three.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-17 (commit pending). Partially re-triaged: the finding is correct
|
||
that the inline fully-qualified cast was inconsistent and noisy, but its premise that
|
||
the cast is *purely redundant* is wrong. `DatabaseConnectionDefinition.ConnectionString`
|
||
is a non-nullable `string`; `EncryptedStringConverter` is `ValueConverter<string?,
|
||
string?>`, so calling `HasConversion(converter)` with `converter` typed as
|
||
`EncryptedStringConverter` on that property raises a real `CS8620` nullability
|
||
error (verified by build). The cast to the non-generic `ValueConverter` base was
|
||
suppressing that — load-bearing, not noise. Applied the recommendation's second
|
||
option: the converter is now held once in a single `ValueConverter`-typed local
|
||
(with an explanatory comment), and all three `HasConversion(converter)` calls read
|
||
identically with no inline cast or fully-qualified name. Behaviour is unchanged, so
|
||
no behavioural regression test is meaningful (cf. CD-005); a forward guard was added
|
||
in `SchemaConfigurationTests.cs` —
|
||
`SecretColumns_AllHaveEncryptedStringConverterApplied` (theory over all three secret
|
||
columns) — asserting each column keeps an `EncryptedStringConverter`.
|
||
|
||
### ConfigurationDatabase-015 — `NotificationOutboxRepository.InsertIfNotExistsAsync` is a check-then-act race with no duplicate-key catch
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Concurrency & thread safety |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` |
|
||
|
||
**Resolution** — rewrote `InsertIfNotExistsAsync` as a single raw-SQL
|
||
`IF NOT EXISTS (...) INSERT` matching the
|
||
`AuditLogRepository.InsertIfNotExistsAsync` and
|
||
`SiteCallAuditRepository.UpsertAsync` patterns, with a `SqlException`
|
||
catch on numbers 2601 (unique-index violation) and 2627
|
||
(primary-key/unique-constraint violation) returning `false`. Concurrent
|
||
losers are logged at Debug and treated as no-ops, eliminating the
|
||
site-retry livelock. Two SQLite-targeted assertions in
|
||
`RepositoryCoverageTests` were migrated to a new MS SQL-fixture file
|
||
`tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/NotificationOutboxRepositoryIntegrationTests.cs`,
|
||
which also adds a 50-way parallel race test verifying exactly one row
|
||
lands and no exception bubbles.
|
||
|
||
**Description**
|
||
|
||
`InsertIfNotExistsAsync` does `AnyAsync(x => x.NotificationId == n.NotificationId)`,
|
||
then — if false — `AddAsync` + `SaveChangesAsync`. There is a check-then-act window
|
||
between the two operations: two sessions can both pass the `AnyAsync` check and both
|
||
attempt the INSERT, and the loser surfaces as a uniqueness violation on the
|
||
`NotificationId` primary key wrapped in a `DbUpdateException` / `SqlException` (error
|
||
2627). The site→central handoff for notifications is documented as **at-least-once
|
||
with ack-after-persist plus insert-if-not-exists**; collisions on the same
|
||
`NotificationId` are therefore not a "should never happen" but the *expected* contention
|
||
mode. As written, the second concurrent ack throws, fails the site→central
|
||
acknowledgement, and the site retries the same row again on its next forward — a
|
||
livelock if the contending pair keeps racing.
|
||
|
||
The sibling raw-SQL `IF NOT EXISTS … INSERT` paths in `AuditLogRepository.InsertIfNotExistsAsync`
|
||
(see SqlErrorUniqueIndexViolation / SqlErrorPrimaryKeyViolation handling at
|
||
`AuditLogRepository.cs:74-89`) and `SiteCallAuditRepository.UpsertAsync`
|
||
(`SiteCallAuditRepository.cs:87-96`) explicitly catch errors 2601/2627 and treat the
|
||
loser as a no-op — exactly the right pattern for "first-write-wins idempotent ingest".
|
||
This repository alone does not.
|
||
|
||
**Recommendation**
|
||
|
||
Either (a) rewrite the body as a single raw-SQL `IF NOT EXISTS … INSERT` and apply the
|
||
same 2601/2627 catch-and-log-Debug pattern the AuditLog and SiteCall repositories use,
|
||
or (b) wrap the existing flow in a try/catch around `SaveChangesAsync` that inspects
|
||
the inner `SqlException.Number` and returns `false` (i.e. "another writer won the race")
|
||
on 2601/2627. Option (a) is preferable because it collapses the two round-trips to one
|
||
and matches the established idempotent-ingest pattern used elsewhere in the module.
|
||
Add a regression test that simulates two concurrent `InsertIfNotExistsAsync` calls
|
||
(using two open contexts) for the same `NotificationId` and asserts neither call
|
||
throws and exactly one row lands.
|
||
|
||
### ConfigurationDatabase-016 — `InboundApiRepository.GetApiKeyByValueAsync` hashes the candidate with the unpeppered `ApiKeyHasher.Default`
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/InboundApiRepository.cs:35-39` |
|
||
|
||
**Resolution (2026-05-28):** Took option (a) — `InboundApiRepository` ctor now
|
||
accepts `Func<IApiKeyHasher>? hasherAccessor = null` (deferred resolution to
|
||
sidestep startup-time pepper-validation in test composition roots that don't
|
||
configure one). `GetApiKeyByValueAsync` calls `_hasherAccessor()` so the
|
||
hash matches the production peppered digest. `AddConfigurationDatabase`
|
||
registers the repository with a factory wiring
|
||
`() => sp.GetService<IApiKeyHasher>() ?? ApiKeyHasher.Default` — the
|
||
peppered hasher is used when available, falling back to Default only when
|
||
none is registered (legacy / test composition). Regression test
|
||
`CD016_GetApiKeyByValue_UsesInjectedPepperedHasher_NotDefault` asserts
|
||
positively (peppered roundtrip) and negatively (Default hasher misses the
|
||
key entirely, proving the lookup uses the injected hasher).
|
||
|
||
**Description**
|
||
|
||
`GetApiKeyByValueAsync` resolves an API key by its presented plaintext value by hashing
|
||
the candidate and looking up `KeyHash`. The hash, however, is computed with the static
|
||
`ApiKeyHasher.Default` (the fixed, deployment-independent unpeppered hasher used for
|
||
tests). Production key creation uses the DI-registered, *peppered* `IApiKeyHasher`
|
||
constructed from `InboundApiOptions.ApiKeyPepper` (see CD-012 resolution and
|
||
`ApiKeyHasher.ctor(string pepper)`), so the stored `KeyHash` of any real key was
|
||
produced under the deployment pepper. Hashing the candidate with the unpeppered
|
||
`Default` yields a different digest, and the `WHERE KeyHash = @hash` lookup will never
|
||
match a real key.
|
||
|
||
The production `ApiKeyValidator` (InboundAPI module) deliberately does NOT call this
|
||
method — it fetches all keys and runs a constant-time comparison via the
|
||
DI-registered hasher (`ApiKeyValidator.cs:53-64`) — so the immediate
|
||
authentication path is unaffected. But `GetApiKeyByValueAsync` remains a publicly
|
||
exposed `IInboundApiRepository` member; any new caller (a future admin tool, a CLI
|
||
command, a test) that uses it under a peppered configuration will silently get a
|
||
`null` result for an existing, valid key, and almost certainly mis-route the failure
|
||
as "key not found".
|
||
|
||
**Recommendation**
|
||
|
||
Either (a) take `IApiKeyHasher` via constructor injection — alongside the existing
|
||
`ScadaLinkDbContext` and optional `ILogger` — and use it here so the repository
|
||
participates in the same peppered scheme as the rest of the system; or (b) delete
|
||
the method from both the implementation and `IInboundApiRepository` (Commons) on the
|
||
grounds that the production authentication path correctly avoids it for timing
|
||
reasons and there is no remaining valid caller. Add a regression test that constructs
|
||
the repository under a real `ApiKeyHasher("a-strong-pepper-value")`, inserts an
|
||
`ApiKey.FromHash(...)` using the same hasher, and asserts `GetApiKeyByValueAsync`
|
||
returns the row — under option (a) it should pass; under option (b) the method no
|
||
longer exists.
|
||
|
||
### ConfigurationDatabase-017 — Stub-attach delete on `DeploymentRecord` bypasses optimistic concurrency
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Concurrency & thread safety |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:83-97` |
|
||
|
||
**Resolution (2026-05-28):**
|
||
`IDeploymentManagerRepository.DeleteDeploymentRecordAsync` now requires a `byte[] expectedRowVersion`
|
||
argument. The repository seeds `entry.OriginalValues["RowVersion"]` on both the Local-tracked and
|
||
stub-attach branches so EF emits `DELETE ... WHERE Id = @id AND RowVersion = @prior` and surfaces a
|
||
concurrent edit as `DbUpdateConcurrencyException`. A new SQLite regression test
|
||
`DeleteDeploymentRecord_StaleRowVersion_ThrowsConcurrencyException` in `ConcurrencyTests`
|
||
(backed by a dedicated `RowVersionConcurrencyTestDbContext` that keeps `RowVersion` as a
|
||
caller-managed concurrency token) asserts the exception fires on a stale token; the existing
|
||
`DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity` was updated to pass the fresh RowVersion.
|
||
|
||
**Description**
|
||
|
||
`DeploymentRecord` carries a SQL Server `rowversion` concurrency token (declared
|
||
in `DeploymentConfiguration` and confirmed by `ConcurrencyTests`), per the design
|
||
doc's "Optimistic concurrency is used on deployment status records". When
|
||
`DeleteDeploymentRecordAsync` falls into its stub-attach branch (no tracked entity
|
||
in `_dbContext.DeploymentRecords.Local` for the given id), it constructs
|
||
`new DeploymentRecord("stub", "stub") { Id = id }`, `Attach`es it, and `Remove`s it.
|
||
The stub's `RowVersion` is left at its default `null` (or `byte[0]`).
|
||
|
||
EF Core's SQL Server provider generates the delete as
|
||
`DELETE FROM DeploymentRecords WHERE Id = @id AND RowVersion = @stubRowVersion` — and
|
||
the stub rowversion is not the row's real rowversion, so on a real SQL Server (with
|
||
`IsRowVersion()` auto-populating the column) the WHERE never matches and `SaveChanges`
|
||
throws `DbUpdateConcurrencyException`. The path is exercised by
|
||
`RepositoryCoverageTests.DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity` —
|
||
but the test fixture remaps `RowVersion` as a nullable `IsConcurrencyToken()` column
|
||
without auto-population (`SqliteTestHelper.ConfigureForTests`), so the stored
|
||
RowVersion is null AND the stub's RowVersion is null AND the SQLite delete matches.
|
||
Production-shape behaviour is the opposite.
|
||
|
||
The same stub-attach pattern is used on `SystemArtifactDeploymentRecord`,
|
||
`Site`, and `DataConnection`. Those entities have no rowversion token, so the
|
||
production behaviour is correct for them — the issue is specific to
|
||
`DeploymentRecord`.
|
||
|
||
**Recommendation**
|
||
|
||
Replace the stub-attach branch in `DeleteDeploymentRecordAsync` with a real lookup —
|
||
`await _dbContext.DeploymentRecords.FindAsync([id], ct)` then `Remove` if non-null —
|
||
mirroring `DeleteInstanceAttributeOverrideAsync` and `DeleteDeployedSnapshotAsync`.
|
||
This loses the "delete by id without a read" micro-optimisation (a real concern only
|
||
in batched-delete loops) but restores the documented concurrency contract. If the
|
||
optimisation is genuinely required, attach a `DeploymentRecord` with the *caller's*
|
||
known RowVersion (the caller had to fetch the row at some point) and accept the
|
||
`DbUpdateConcurrencyException` as the correct concurrency signal. Add a regression
|
||
test under MS SQL (extend `RepositoryCoverageTests` with a SQL-Server-flavoured
|
||
fixture, or use `MsSqlMigrationFixture`) that asserts the stub-attach delete works
|
||
when the real RowVersion is supplied.
|
||
|
||
### ConfigurationDatabase-018 — `DateTime`-typed `*Utc` columns on `AuditEvent` / `SiteCall` carry no `DateTimeKind` enforcement
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Code organization & conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/AuditLogEntityTypeConfiguration.cs`, `Configurations/SiteCallEntityTypeConfiguration.cs` (mappings for `OccurredAtUtc`, `IngestedAtUtc`, `CreatedAtUtc`, `UpdatedAtUtc`, `TerminalAtUtc`) |
|
||
|
||
**Resolution (2026-05-28):** Added two private static `ValueConverter<DateTime, DateTime>` / `ValueConverter<DateTime?, DateTime?>` UTC-enforcing converters to `AuditLogEntityTypeConfiguration` and applied them to `AuditEvent.OccurredAtUtc` and `AuditEvent.IngestedAtUtc` via `HasConversion(...)`. The converter re-tags `DateTimeKind.Utc` on hydrate (where SQL Server's `datetime2` provider strips the Kind flag) and on write (so a producer-supplied `Kind=Unspecified` literal still lands as UTC in the model cache). Coordinates with the sibling `Commons-019` resolution (init-setter on `AuditEvent` re-tags Kind=Utc at construction). Regression test in `tests/ScadaLink.ConfigurationDatabase.Tests/Configurations/AuditLogEntityTypeConfigurationTests.cs::Configure_UtcConverter_HydratesOccurredAtUtcAsKindUtc` inserts an Unspecified-Kind value, re-reads through a cleared change-tracker, and asserts `Kind == Utc` on both columns. The `SiteCall` mapping is out of scope for this close (sibling component task).
|
||
|
||
**Description**
|
||
|
||
`AuditEvent.OccurredAtUtc` / `IngestedAtUtc` and `SiteCall.CreatedAtUtc` /
|
||
`UpdatedAtUtc` / `TerminalAtUtc` / `IngestedAtUtc` are declared as `DateTime` (not
|
||
`DateTimeOffset`) per the Audit Log #23 spec, with a UTC suffix convention. SQL Server's
|
||
`datetime2` provider strips the `Kind` flag on the wire — values inserted with
|
||
`DateTimeKind.Utc` round-trip as `DateTimeKind.Unspecified` on read. The EF mappings
|
||
add no `HasConversion(...)` to normalise the kind. The sibling Commons module just
|
||
flagged the same pattern as `Commons-019`; in this module the consequence is concrete:
|
||
|
||
- `AuditLogPartitionMaintenance.GetMaxBoundaryAsync` already defends with an explicit
|
||
`DateTime.SpecifyKind(dt, DateTimeKind.Utc)` (see `AuditLogPartitionMaintenance.cs:103-104`).
|
||
That defence is necessary precisely because the EF mapping does not enforce it.
|
||
- `AuditLogRepository.GetPartitionBoundariesOlderThanAsync` does NOT defend — it
|
||
returns `reader.GetDateTime(0)` directly with `Kind=Unspecified` (separate finding
|
||
CD-020).
|
||
- Downstream comparisons like `DateTime.UtcNow` (Kind=Utc) against a re-read
|
||
`OccurredAtUtc` (Kind=Unspecified) do not produce a runtime error, but any code
|
||
path that converts via `.ToLocalTime()` or `.ToUniversalTime()` will silently
|
||
interpret an unspecified-kind value as local time and produce wrong results.
|
||
|
||
**Recommendation**
|
||
|
||
Apply a value converter on every `DateTime`-typed `*Utc` column that re-tags the
|
||
`Kind` to `Utc` on read (and asserts/`SpecifyKind` on write to defend against an
|
||
accidental local-kind write). EF Core's built-in
|
||
`UtcValueConverter`-style pattern is a single line per column:
|
||
|
||
```csharp
|
||
builder.Property(e => e.OccurredAtUtc)
|
||
.HasConversion(
|
||
v => v,
|
||
v => DateTime.SpecifyKind(v, DateTimeKind.Utc));
|
||
```
|
||
|
||
Apply uniformly to `AuditEvent` (OccurredAtUtc, IngestedAtUtc), `SiteCall`
|
||
(CreatedAtUtc, UpdatedAtUtc, TerminalAtUtc, IngestedAtUtc), and any other
|
||
`DateTime *Utc` columns added later. Add a regression test that inserts a UTC row,
|
||
re-reads it in a fresh context, and asserts `Kind == DateTimeKind.Utc`. Coordinate
|
||
with the sibling `Commons-019` finding so the resolution is consistent across both
|
||
modules.
|
||
|
||
### ConfigurationDatabase-019 — `EnsureLookaheadAsync` swallows non-idempotent SPLIT failures and continues, creating partition holes
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Maintenance/AuditLogPartitionMaintenance.cs:181-199` |
|
||
|
||
**Resolution (2026-05-28):** Took option (a) — dropped the `try/catch (SqlException)`
|
||
around the per-month SPLIT loop entirely (and the now-unused
|
||
`using Microsoft.Data.SqlClient`). By class-doc construction the catch could never
|
||
fire for "boundary already exists" (the loop pre-reads max-boundary and only issues
|
||
SPLITs for strictly-greater months), so its only effect was to mask real failures
|
||
(permission revoked, deadlock victim, log full, filegroup full) and let the next
|
||
iteration split the following month — leaving a permanent partition hole. Now any
|
||
`SqlException` propagates out of `EnsureLookaheadAsync`, surfaces to the central
|
||
daily-tick hosted service as an Error, and the next tick retries from the same
|
||
boundary (at-least-once, no holes). Replaced the catch block with an inline comment
|
||
explaining the rationale so a future maintainer doesn't reintroduce it.
|
||
|
||
**Description**
|
||
|
||
`EnsureLookaheadAsync` loops one month at a time from `next` up to `horizon` and
|
||
issues `ALTER PARTITION SCHEME … NEXT USED` + `ALTER PARTITION FUNCTION … SPLIT RANGE`
|
||
per month. The class doc says idempotency is guaranteed by reading the max-boundary
|
||
first and only issuing SPLITs for strictly-greater months — so "boundary already
|
||
exists" (SQL Server msg 7708/7711) cannot occur by construction. Yet the loop wraps
|
||
each iteration in `catch (SqlException ex) { _logger.LogWarning(...); }` and
|
||
continues, with the rationale "the desired end state (boundary present) is satisfied
|
||
by either path."
|
||
|
||
That rationale is correct only for an "already-exists" error — which the pre-check
|
||
makes impossible. Any *other* `SqlException` — a permissions failure (the
|
||
`scadalink_audit_purger` role's `ALTER ON SCHEMA::dbo` revoked or not granted), a
|
||
deadlock victim, a transient connection drop, a transaction log full, an underlying
|
||
filegroup full — leaves the boundary genuinely **not** created, logs a Warning
|
||
(quiet by default in most appenders), and the next iteration tries to SPLIT the
|
||
following month. That split *can* succeed (it is a different range value), creating
|
||
a permanent **hole** in the partition layout: month N never had a partition created,
|
||
month N+1 does, so any future row in month N lands in the partition that previously
|
||
spanned both months and partition-switch purge for month N becomes impossible.
|
||
|
||
The class is the central singleton's daily-tick partition roll-forward, so the hole
|
||
persists until an operator notices it and rebuilds manually — by which point months
|
||
of audit retention may be locked behind the unsplit range.
|
||
|
||
**Recommendation**
|
||
|
||
Either (a) drop the `try/catch` entirely so any SPLIT failure aborts the loop and
|
||
surfaces to the hosted service (the next tick retries — at-least-once with no holes),
|
||
or (b) keep the catch but narrow it to ONLY the
|
||
"boundary-already-exists" errors (SQL Server msg 7708 and 7711) and log at Debug,
|
||
mirroring how `AuditLogRepository.InsertIfNotExistsAsync` narrowly catches 2601/2627.
|
||
Option (a) is preferable: by class-doc construction the catch should never fire, so
|
||
its only effect is to mask the real-failure case. Add tests that simulate a SPLIT
|
||
failure (e.g. a permission denial via a constrained test login) and assert the loop
|
||
aborts after the first failure with no further SPLITs.
|
||
|
||
### ConfigurationDatabase-020 — `GetPartitionBoundariesOlderThanAsync` returns `DateTime` with `Kind=Unspecified`
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs:378-387` |
|
||
|
||
**Resolution (2026-05-28):** Wrapped the `reader.GetDateTime(0)` read with `DateTime.SpecifyKind(..., DateTimeKind.Utc)` so each returned boundary now carries `Kind=Utc`, matching the explicit defensive pattern already in `AuditLogPartitionMaintenance.GetMaxBoundaryAsync`. Added an inline comment explaining the rationale (SQL Server `datetime2` strips Kind through ADO.NET; boundary values are stored UTC). With sibling CD-018 also closed, the EF read path now enforces UTC at the column level — the raw-ADO defence here is belt-and-braces for this method, which bypasses EF entirely.
|
||
|
||
**Description**
|
||
|
||
`GetPartitionBoundariesOlderThanAsync` reads `reader.GetDateTime(0)` and adds the
|
||
raw value to the returned list. SQL Server's `datetime2` materialises as
|
||
`DateTimeKind.Unspecified` on the ADO.NET side (see CD-019), so every returned
|
||
boundary has `Kind=Unspecified`. The sibling `AuditLogPartitionMaintenance.GetMaxBoundaryAsync`
|
||
(`AuditLogPartitionMaintenance.cs:103-104`) explicitly defends against this exact
|
||
issue by calling `DateTime.SpecifyKind(dt, DateTimeKind.Utc)` — exactly because EF /
|
||
ADO.NET strips the kind — but the repository method does not. Callers (the
|
||
`AuditLogPurgeActor`) that compare a returned boundary to `DateTime.UtcNow` get a
|
||
silently wrong comparison if they ever serialise to/from a string with a local-kind
|
||
assumption in between.
|
||
|
||
**Recommendation**
|
||
|
||
Wrap the read with `DateTime.SpecifyKind(reader.GetDateTime(0), DateTimeKind.Utc)`,
|
||
matching the explicit defensive pattern already in
|
||
`AuditLogPartitionMaintenance.GetMaxBoundaryAsync`. Better still: fix CD-019 (a value
|
||
converter on the column) so the defence at the read site is no longer required.
|
||
|
||
### ConfigurationDatabase-021 — `SwitchOutPartitionAsync` interpolates `monthBoundary` / staging table name into raw SQL
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs:192-338` |
|
||
|
||
**Resolution (2026-05-28):** Took the targeted (1) part of the recommendation —
|
||
the `monthBoundary` format string is now `"yyyy-MM-dd HH:mm:ss.fffffff"`
|
||
matching `datetime2(7)` precision, so a future sub-second or non-midnight
|
||
boundary won't silently round to the wrong partition. The staging table name
|
||
is still interpolated (T-SQL DDL identifiers can't be parameterised) but
|
||
remains internally constructed as `$"AuditLog_Staging_{Guid.NewGuid():N}"`
|
||
so SQL injection is structurally impossible. The larger
|
||
`sp_executesql`-with-typed-parameter migration was scoped to a future
|
||
follow-up since the current shape is fully controlled and the precision-
|
||
mismatch hazard was the only practical concern.
|
||
|
||
**Description**
|
||
|
||
`SwitchOutPartitionAsync` builds two large SQL batches via interpolated strings
|
||
(`sampleSql` and `sql`) that include `{monthBoundaryStr}` and `{stagingTableName}`
|
||
directly in the SQL text, and executes them via `ExecuteSqlRawAsync` /
|
||
`cmd.ExecuteScalarAsync`. Both values are constructed inside the method —
|
||
`monthBoundaryStr = monthBoundary.ToUniversalTime().ToString("yyyy-MM-dd HH:mm:ss")`
|
||
and `stagingTableName = $"AuditLog_Staging_{Guid.NewGuid():N}"` — and the formats are
|
||
fully controlled. SQL injection is therefore not possible as the code stands.
|
||
|
||
Two related concerns:
|
||
|
||
1. The format string `"yyyy-MM-dd HH:mm:ss"` truncates fractional seconds. The
|
||
partition function is seeded at `T00:00:00` exactly, so truncation happens to
|
||
produce the right boundary value today. A future change that adds a sub-second
|
||
boundary (or invokes `SwitchOutPartitionAsync` with a non-midnight value) would
|
||
silently round to the wrong partition with no error — and SWITCH PARTITION would
|
||
either fail loudly or succeed against the wrong month. Use
|
||
`"yyyy-MM-dd HH:mm:ss.fffffff"` to match the precision the migration seeds at,
|
||
and the rounding ambiguity disappears.
|
||
2. The pattern of "build a multi-statement DDL batch by string concatenation" is
|
||
robust today only by inspection. A code review tripwire — the CLAUDE.md note "the
|
||
data-access layer must not concatenate SQL" — would catch the pattern earlier;
|
||
converting the batch to a parameterised `sp_executesql` invocation (the inner
|
||
`EXEC sp_executesql @sql` already exists for the SWITCH itself) is the textbook
|
||
safe form even when the input is internally controlled.
|
||
|
||
**Recommendation**
|
||
|
||
(1) Switch `monthBoundaryStr`'s format to `"yyyy-MM-dd HH:mm:ss.fffffff"`. (2)
|
||
Optionally migrate the two batches to fully parameterised `sp_executesql` form so
|
||
the `monthBoundary` value flows as a typed `@boundary datetime2(7)` parameter
|
||
rather than as interpolated text — the only piece that genuinely *cannot* be
|
||
parameterised is the staging table identifier (DDL identifiers are not parameterisable
|
||
in T-SQL), but a server-side `QUOTENAME(@stagingTable)` wrapper covers it. Add a
|
||
regression test that supplies a non-midnight `monthBoundary` value and asserts the
|
||
boundary lookup resolves to the expected partition.
|
||
|
||
### ConfigurationDatabase-022 — Stale "WP-24 Stub level sufficient for diff/staleness support" XML comment on `DeploymentManagerRepository`
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Documentation & comments |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:8-14` |
|
||
|
||
**Description**
|
||
|
||
The class-level XML doc on `DeploymentManagerRepository` reads "WP-24: Stub level
|
||
sufficient for diff/staleness support." WP-24 (Deployment Manager work-package) shipped
|
||
long ago; the repository now covers full `DeploymentRecord` CRUD,
|
||
`SystemArtifactDeploymentRecord` CRUD, `DeployedConfigSnapshot` CRUD, and an
|
||
`Instance` deletion path with explicit Restrict-FK cleanup
|
||
(`DeleteInstanceAsync` at line 210-229). The comment misleads a reader into
|
||
thinking the repository is incomplete and tempts them not to investigate further
|
||
before adding new behaviour. The same module-context observation was noted but
|
||
not raised in the prior review.
|
||
|
||
**Recommendation**
|
||
|
||
Remove the WP-24 line and rewrite the class doc to describe what the repository
|
||
actually does today: EF Core implementation of `IDeploymentManagerRepository`
|
||
covering deployment records, system-artifact deployment records, deployed config
|
||
snapshots, and the Restrict-FK-aware `DeleteInstanceAsync` for the
|
||
deployment pipeline. Cross-reference the optimistic-concurrency contract on
|
||
`DeploymentRecord.RowVersion`.
|
||
|
||
**Resolution (2026-05-28):**
|
||
|
||
Replaced the stale "WP-24: Stub level sufficient for diff/staleness support" XML
|
||
doc with an accurate one-paragraph summary covering `DeploymentRecord` CRUD
|
||
(plus the `RowVersion` optimistic-concurrency contract), `SystemArtifactDeploymentRecord`
|
||
CRUD, `DeployedConfigSnapshot` CRUD, and the Restrict-FK-aware `DeleteInstanceAsync`.
|
||
No behaviour change.
|
||
|
||
### ConfigurationDatabase-023 — `AuditLog` correlation-index name drifts from design doc (`IX_AuditLog_CorrelationId` vs `IX_AuditLog_Correlation`)
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Design-document adherence |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/AuditLogEntityTypeConfiguration.cs:99-101`, `Migrations/20260520142214_AddAuditLogTable.cs:103-107` |
|
||
|
||
**Description**
|
||
|
||
The Component-ConfigurationDatabase design doc lists the AuditLog indexes by name —
|
||
including `IX_AuditLog_Correlation (CorrelationId)` for the "drilldown from a single
|
||
operation" use case. The implemented index name is `IX_AuditLog_CorrelationId` (the
|
||
fluent-config `HasDatabaseName` call and the matching DDL in the migration both use
|
||
the `Id`-suffixed form). The names are syntactically valid SQL Server index names and
|
||
the index does the right work; the drift is cosmetic but it breaks scripted
|
||
maintenance ops that grep for the documented name (e.g. a runbook reindex script).
|
||
|
||
The other four documented index names (`IX_AuditLog_OccurredAtUtc`,
|
||
`IX_AuditLog_Site_Occurred`, `IX_AuditLog_Channel_Status_Occurred`,
|
||
`IX_AuditLog_Target_Occurred`, plus the post-design additions
|
||
`IX_AuditLog_Execution`, `IX_AuditLog_ParentExecution`, `IX_AuditLog_Node_Occurred`)
|
||
agree with the code.
|
||
|
||
**Recommendation**
|
||
|
||
Pick one direction. Updating the design doc to match the code is cheap (one word) and
|
||
preserves the existing migration; renaming the index in the database requires a new
|
||
migration that does `sp_rename`. Document-aligning is the lower-cost option and
|
||
matches the resolution pattern used for CD-005.
|
||
|
||
**Resolution (2026-05-28):**
|
||
|
||
Doc-aligns-to-code, preserving the existing migration. `docs/requirements/Component-AuditLog.md`
|
||
already lists the index as `IX_AuditLog_CorrelationId` (line 109). The stale
|
||
`IX_AuditLog_Correlation` reference in `docs/requirements/Component-ConfigurationDatabase.md`
|
||
line 64 (AuditLog table prose) was updated to `IX_AuditLog_CorrelationId`. Code unchanged;
|
||
index name remains `IX_AuditLog_CorrelationId`.
|
||
|
||
### ConfigurationDatabase-024 — Missing test coverage for SPLIT-RANGE failure-continuation and production-shape rowversion delete
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Status | Open |
|
||
| Location | `tests/ScadaLink.ConfigurationDatabase.Tests/Maintenance/AuditLogPartitionMaintenanceTests.cs`, `tests/.../RepositoryCoverageTests.cs:855-869` |
|
||
|
||
**Description**
|
||
|
||
`AuditLogPartitionMaintenanceTests` exercises the happy-path SPLIT-RANGE behaviour
|
||
(no-op, single-month, three-month, already-exists idempotency) but never simulates a
|
||
SPLIT *failure* — so the catch-and-continue behaviour flagged in CD-019 is
|
||
behaviourally untested. The class is a central singleton driving daily audit purge;
|
||
a regression that turned the failure path into a permanent hole would not surface in
|
||
the test suite.
|
||
|
||
Separately, `RepositoryCoverageTests.DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity`
|
||
covers the stub-attach delete path under the SQLite test fixture, but the fixture
|
||
remaps `RowVersion` as a nullable concurrency token (`SqliteTestHelper`), so it does
|
||
not exercise the production-shape `IsRowVersion()` auto-population — the actual
|
||
concurrency-token bug flagged in CD-018 cannot show up. There is an
|
||
`MsSqlMigrationFixture` in the test project already (used by the Audit Log migration
|
||
tests); the stub-attach delete deserves a parallel MS-SQL-flavoured test.
|
||
|
||
**Recommendation**
|
||
|
||
(1) Add an `AuditLogPartitionMaintenanceTests` case that constructs a context against
|
||
a constrained login (no `ALTER ON SCHEMA::dbo`), invokes `EnsureLookaheadAsync` for a
|
||
three-month gap, and asserts: only the partition boundaries created BEFORE the
|
||
permissions failure landed remain, and the call aborts cleanly without continuing to
|
||
later months. This pins down the resolution of CD-019. (2) Add a
|
||
`RepositoryCoverageTests` case that uses `MsSqlMigrationFixture` to insert a
|
||
`DeploymentRecord`, clear the change tracker, call `DeleteDeploymentRecordAsync`,
|
||
and assert the row is gone — pinning the resolution of CD-018. Both tests should be
|
||
`[SkippableFact]` so the suite still passes when no MS SQL Server is available.
|