Files
Joseph Doherty c899cb162c refactor: scrub residual ScadaLink refs → ScadaBridge (env vars, config keys, assembly name, SQL login)
Renames the 13 SCADALINK_* runtime env vars → SCADABRIDGE_*, the ScadaLink__
.NET config keys → ScadaBridge__, the stale ScadaLink.Host.exe assembly name
→ ZB.MOM.WW.ScadaBridge.Host.exe, the scadalink_app SQL login → scadabridge_app,
and residual identifiers/comments/docs. Migration records (prior rename
tooling/design, DB-rename helper, this scrub script) carved out.

Adds tools/scrub-scadalink-refs.sh.
2026-05-31 21:50:38 -04:00

1376 lines
82 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — ConfigurationDatabase
| Field | Value |
|-------|-------|
| Module | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase` |
| Design doc | `docs/requirements/Component-ConfigurationDatabase.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 0 |
## Summary
The ConfigurationDatabase module is a focused, conventional EF Core data-access layer:
a single `ScadaBridgeDbContext`, Fluent API entity configurations, eight repository
implementations of Commons-defined interfaces, an `IAuditService` implementation, an
`IInstanceLocator`, environment-aware migration handling, and design-time tooling
support. Overall structure adheres well to the design doc and the CLAUDE.md "Code
Organization" decisions — POCO entities and interfaces live in Commons, EF mappings and
implementations live here, Fluent API only, and optimistic concurrency is correctly
applied to `DeploymentRecord` via `rowversion`. The module is generally healthy.
The main themes across findings are: (1) a genuine logic bug in
`GetTemplateWithChildrenAsync`, which loads child templates and then discards them, so
the method does not deliver what its name implies; (2) secret-bearing columns (SMTP
credentials, external-system auth config, database connection strings) persisted in
plaintext with no encryption-at-rest; (3) a hardcoded SQL `sa` connection string with a
password literal embedded in `DesignTimeDbContextFactory`; (4) the no-arg
`AddConfigurationDatabase()` overload, which silently registers nothing, making a
misconfigured central node fail late and opaquely; and (5) audit-trail robustness gaps —
`AuditService` can throw on serializing entities with navigation cycles, rolling back
the whole business operation, and the design doc's claim that audit `Id` is `Long/GUID`
disagrees with the `int` entity. Test coverage is good for the repositories that have
tests (Security, CentralUI, audit, concurrency, seed data, data protection) but several
repositories (`TemplateEngineRepository`, `DeploymentManagerRepository`,
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`,
`SiteRepository`, `InstanceLocator`) have little or no direct coverage.
#### Re-review 2026-05-17 (commit `39d737e`)
Re-reviewed the module at commit `39d737e`. All eleven findings from the initial
review (`9c60592`) remain `Resolved` — the secret-column encryption
(`EncryptedStringConverter` + `ApplySecretColumnEncryption`), the fail-fast no-arg
DI overload, the `AsSplitQuery` conversions, the audit cycle-safe serializer, and the
added repository test coverage all verified present and consistent with their
resolutions. Three new findings were recorded. The most material is that the
encryption work done for CD-004 left one bearer credential out of scope:
`ApiKey.KeyValue` — the inbound-API authentication secret — is still persisted in
plaintext (`ConfigurationDatabase-012`); it cannot use the same non-deterministic
Data Protection converter because authentication looks the key up *by value*, so it
needs a hash-based scheme instead. The second is a resilience gap in the encryption
plumbing itself: `ApplySecretColumnEncryption` silently substitutes a throwaway
`EphemeralDataProtectionProvider` whenever no provider is supplied, so any context
constructed via the single-argument constructor on a write path would encrypt
secrets with a key discarded at process exit, yielding permanently undecryptable
ciphertext with no error (`ConfigurationDatabase-013`). The third is a minor
inconsistency — a redundant cast on one of the three `HasConversion` calls
(`ConfigurationDatabase-014`). The module is otherwise healthy and the prior fixes
hold up well.
#### Re-review 2026-05-28 (commit `1eb6e97`)
Re-reviewed the module at commit `1eb6e97`. All fourteen prior findings remain
`Resolved`; their fixes still hold (encryption converter, fail-fast guard,
peppered API-key hash, ephemeral-fallback hardening, etc.). The module has
grown since the last review — new code includes Audit Log (#23) raw-SQL
paths in `AuditLogRepository` (partition-switch purge, recursive
execution-tree CTE, KPI snapshot, partition-boundary discovery), the
`AuditLogPartitionMaintenance` SPLIT-RANGE roll-forward implementation, the
`AuditCorrelationContext` scoped service that stamps `BundleImportId`, the
`SiteCallAuditRepository` monotonic-rank upsert, and the
`NotificationOutboxRepository` per-site KPI surface — and most of the new
findings are concentrated in those raw-SQL paths and in latent gaps left
behind by the CD-012 hash migration.
Ten new findings were recorded. The most material is
`ConfigurationDatabase-015`: a check-then-act race in
`NotificationOutboxRepository.InsertIfNotExistsAsync` with no duplicate-key
catch — unlike the sibling Audit Log / Site Call ingest paths, a concurrent
ack-after-persist on the same `NotificationId` will surface as an
unhandled `DbUpdateException` and break the at-least-once site→central
handoff. `ConfigurationDatabase-016` flags that
`InboundApiRepository.GetApiKeyByValueAsync` hashes the candidate with
`ApiKeyHasher.Default` (unpeppered) while the production create-path uses
the configured peppered hasher — any future caller (or test that exercises
the method) will silently fail to find a real key; the production
`ApiKeyValidator` happens not to call it, but the method is a publicly
exposed `IInboundApiRepository` member and a latent bug.
`ConfigurationDatabase-017` records that the `DeleteDeploymentRecordAsync`
stub-attach delete bypasses the documented optimistic-concurrency rule on
`DeploymentRecord.RowVersion` — the SQLite tests pass because the test
fixture re-maps `RowVersion` as a nullable concurrency token, but in
production this is likely to throw `DbUpdateConcurrencyException`.
`ConfigurationDatabase-018` records the `DateTime`-typed `*Utc` columns on
`AuditEvent` and `SiteCall` re-emerge as `Kind=Unspecified` on read; the
sibling Commons module flagged the same pattern as Commons-019, and
`AuditLogPartitionMaintenance.GetMaxBoundaryAsync` already defends against
it with an explicit `SpecifyKind(Utc)` — but `GetPartitionBoundariesOlderThanAsync`
does not (`ConfigurationDatabase-020`). `ConfigurationDatabase-019` is the
SPLIT-RANGE loop in `AuditLogPartitionMaintenance.EnsureLookaheadAsync`
swallowing every `SqlException` as a Warning and continuing — a genuine
failure (permissions, deadlock, transient) leaves a missing boundary and
the next iteration cheerfully splits the following month, creating a hole.
`ConfigurationDatabase-021` is a low-severity hardening concern around
`SwitchOutPartitionAsync`'s raw-SQL interpolation of `monthBoundaryStr` /
`stagingTableName` (currently safe by construction, but truncates fractional
seconds). `ConfigurationDatabase-022` is the stale "WP-24 stub" XML comment
on `DeploymentManagerRepository`. `ConfigurationDatabase-023` is a
design-doc-adherence drift on `IX_AuditLog_CorrelationId` (design says
`IX_AuditLog_Correlation`). `ConfigurationDatabase-024` is missing test
coverage for the SPLIT-RANGE failure-continuation behaviour and for the
production-shape stub-attach delete with a real rowversion.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | `GetTemplateWithChildrenAsync` discards loaded children (CD-001); `GetApprovedKeysForMethodAsync` CSV parsing is brittle (CD-008). |
| 2 | Akka.NET conventions | ✓ | No actors in this module; data-access layer only. No issues found. |
| 3 | Concurrency & thread safety | ✓ | DbContext correctly scoped; optimistic concurrency on `DeploymentRecord` correct. Repositories hold no shared mutable state. No issues found. |
| 4 | Error handling & resilience | ✓ | `WaitForDatabaseReadyAsync` is sound. No-arg DI overload fails late and silently (CD-003, resolved); audit JSON serialization failure handling (CD-007, resolved). Re-review: ephemeral Data Protection fallback can silently produce undecryptable ciphertext (CD-013). |
| 5 | Security | ✓ | Hardcoded `sa` credential literal (CD-002, resolved); SMTP/DB-connection/auth secrets unencrypted (CD-004, resolved). Re-review: `ApiKey.KeyValue` bearer credential still stored in plaintext (CD-012). |
| 6 | Performance & resource management | ✓ | `GetAllTemplatesAsync` / `GetTemplateTreeAsync` eager-load multiple collections without `AsSplitQuery` (CD-009, resolved). No N+1 in audited paths. Re-review: no new issues. |
| 7 | Design-document adherence | ✓ | Audit `Id` type mismatch vs design doc (CD-005, resolved); seed data uses `HasData` consistent with design. Re-review: no new issues. |
| 8 | Code organization & conventions | ✓ | Mostly clean. `Grpc*` address columns unbounded (CD-006, resolved); inconsistent null-guard on injected context (CD-011, resolved). Re-review: redundant/inconsistent cast on one `HasConversion` call (CD-014). |
| 9 | Testing coverage | ✓ | Several repositories and `InstanceLocator` lack direct tests (CD-010). |
| 10 | Documentation & comments | ✓ | `DeploymentManagerRepository` "WP-24 stub" XML comment is stale; noted in module context but not raised as a standalone finding. No issues found beyond items above. |
_Re-review (2026-05-28, `1eb6e97`):_
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | `GetPartitionBoundariesOlderThanAsync` returns `DateTimeKind.Unspecified` (CD-020). `GetApiKeyByValueAsync` hashes with the unpeppered default (CD-016). |
| 2 | Akka.NET conventions | ✓ | No actors in this module. No issues found. |
| 3 | Concurrency & thread safety | ✓ | `NotificationOutboxRepository.InsertIfNotExistsAsync` check-then-act has no duplicate-key catch (CD-015). Stub-attach delete bypasses documented optimistic concurrency on `DeploymentRecord.RowVersion` (CD-017). |
| 4 | Error handling & resilience | ✓ | `AuditLogPartitionMaintenance.EnsureLookaheadAsync` swallows non-idempotent SPLIT failures and continues (CD-019). |
| 5 | Security | ✓ | `SwitchOutPartitionAsync` interpolates a `DateTime` string and a GUID-suffixed identifier into raw SQL — safe by construction but pattern is risky (CD-021). |
| 6 | Performance & resource management | ✓ | No new issues found. |
| 7 | Design-document adherence | ✓ | Index name drift: design says `IX_AuditLog_Correlation`, code uses `IX_AuditLog_CorrelationId` (CD-023). |
| 8 | Code organization & conventions | ✓ | `DateTime *Utc` columns on `AuditEvent` / `SiteCall` carry no `DateTimeKind` enforcement (CD-018). |
| 9 | Testing coverage | ✓ | No tests for SPLIT failure continuation and no production-shape rowversion stub-attach test (CD-024). |
| 10 | Documentation & comments | ✓ | Stale "WP-24 stub" XML comment on `DeploymentManagerRepository` (CD-022). |
## Findings
### ConfigurationDatabase-001 — `GetTemplateWithChildrenAsync` loads child templates then discards them
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs:30-41` |
**Description**
`GetTemplateWithChildrenAsync` queries for all templates whose `ParentTemplateId`
equals the requested id, assigns the result to the local variable `children`, and
then returns `template` — the `children` list is never used, attached to the returned
object, or otherwise exposed. The method is therefore behaviourally identical to
`GetTemplateByIdAsync` but issues an extra database round-trip. Any caller relying on
the method name to obtain a template with its derived/child templates populated will
silently receive a template with no children, leading to incorrect template-resolution
or UI behaviour with no error.
**Recommendation**
Either populate the children onto the returned aggregate (e.g. project into a result
type that carries the children, or load them into a navigation collection that is
actually returned), or remove the dead query and the misleading method if children are
not in fact needed. If the navigation does not exist on the `Template` entity, add an
explicit result tuple/DTO so the loaded data reaches the caller.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). Root cause confirmed against source: the
method ran a `Where(t => t.ParentTemplateId == id)` query, assigned the result to a
local `children` variable, and never used it — a misleading no-op that also issued an
extra database round-trip per call.
Triage of the three callers (`FlatteningPipeline.BuildTemplateChainAsync`,
`ManagementActor.HandleGetTemplate`, `ManagementActor.HandleValidateTemplate`) showed
none consume derived/sub-templates; they all need the template's *member* collections
(Attributes/Alarms/Scripts/Compositions), which `GetTemplateByIdAsync` already
eager-loads. The `Template` entity has no child-templates navigation collection, and
adding one (plus changing the interface signature) would require editing
`ZB.MOM.WW.ScadaBridge.Commons`, which is outside this module's scope.
Fix applied the recommendation's secondary option: removed the dead query so the
method no longer misleads or wastes a round-trip, and added an XML doc comment
clarifying that "children" means the template's member collections. The method now
honestly delegates to `GetTemplateByIdAsync`. Regression tests added in
`TemplateEngineRepositoryTests.cs`:
`GetTemplateWithChildrenAsync_ReturnsTemplateWithAllMemberCollectionsPopulated`,
`GetTemplateWithChildrenAsync_PreservesParentTemplateId_ForInheritanceChainWalk`, and
`GetTemplateWithChildrenAsync_ReturnsNull_WhenTemplateDoesNotExist` — pinning the
template-aggregate contract the callers depend on.
### ConfigurationDatabase-002 — Hardcoded `sa` connection string with embedded password literal
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/DesignTimeDbContextFactory.cs:21-22` |
**Description**
`DesignTimeDbContextFactory` falls back to a literal connection string
`"Server=localhost,1433;Database=ScadaBridge_Config;User Id=sa;Password=YourPassword;TrustServerCertificate=True"`
when no configured connection string is found. Embedding a credential literal (even a
placeholder) in source code is a poor pattern: it is committed to version control,
encourages copy-paste of `sa`/`TrustServerCertificate=True` into real environments, and
the fallback can mask a genuine misconfiguration during `dotnet ef` operations by
silently pointing tooling at an unintended database.
**Recommendation**
Remove the hardcoded fallback. If no connection string is resolved from configuration
or environment, throw a clear `InvalidOperationException` instructing the developer to
set `ScadaBridge:Database:ConfigurationDb` (or an environment variable). At minimum, read
the design-time connection string from an environment variable rather than a literal,
and never use `sa`.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the factory
fell back to a literal `User Id=sa;Password=YourPassword;...` connection string when no
configured value was found. Removed the hardcoded fallback entirely. The factory now
resolves the connection string from the Host's appsettings files or, when those are not
present, from the `SCADABRIDGE_DESIGNTIME_CONNECTIONSTRING` environment variable, and
throws a clear `InvalidOperationException` (naming both the config key and the env var)
when neither yields a value. Also hardened `SetBasePath` to be applied only when the
`ZB.MOM.WW.ScadaBridge.Host` directory exists, so the factory degrades cleanly instead of throwing
`DirectoryNotFoundException` when run from a context without a sibling Host folder.
Regression tests added in `DesignTimeDbContextFactoryTests.cs`:
`CreateDbContext_NoConnectionStringConfigured_ThrowsClearException`,
`CreateDbContext_ConnectionStringFromEnvironmentVariable_IsUsed`, and
`DesignTimeDbContextFactory_SourceContainsNoHardcodedSaCredential`.
### ConfigurationDatabase-003 — No-arg `AddConfigurationDatabase()` silently registers nothing
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/ServiceCollectionExtensions.cs:44-49` |
**Description**
The parameterless `AddConfigurationDatabase()` overload is a deliberate no-op "retained
for backward compatibility during migration." If a central node is wired up with this
overload by mistake, no `ScadaBridgeDbContext`, repositories, `IAuditService`, or
`IInstanceLocator` are registered. The failure does not surface at startup; it surfaces
much later as opaque DI resolution exceptions the first time any consumer requests a
repository — far from the actual misconfiguration. The XML comment also refers to
"Phase 0 stubs," which is stale relative to the current state of the module.
**Recommendation**
Either delete the no-op overload now that the connection-string overload exists, or
mark it `[Obsolete]` with an error-level message so misuse is a compile-time failure.
If a true "site node" no-op is genuinely required, give it an explicit, self-documenting
name (e.g. `AddConfigurationDatabaseNoOp()`), and remove the stale "Phase 0" wording.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
parameterless `AddConfigurationDatabase()` overload returned `services` unchanged,
registering no `DbContext`, repositories, `IAuditService`, or `IInstanceLocator`.
Applied the recommendation's first option: the overload is now marked
`[Obsolete(..., error: true)]` so any source reference is a compile-time failure, and
its body throws `InvalidOperationException` with an actionable message as
defence-in-depth (covering reflection-based invocation or suppressed warnings). The
stale "Phase 0 stubs / backward compatibility" XML comment was replaced with one
explaining the obsoletion. The pre-existing
`ServiceRegistrationTests.AddConfigurationDatabase_NoArgs_DoesNotThrow` test in
`UnitTest1.cs`, which encoded the old buggy no-op contract, was updated to
`AddConfigurationDatabase_NoArgs_FailsFast` to assert the corrected behaviour.
New regression tests added in `ServiceCollectionExtensionsTests.cs`:
`AddConfigurationDatabase_NoArgOverload_FailsFastWithClearMessage` and
`AddConfigurationDatabase_NoArgOverload_IsMarkedObsoleteAsError`.
### ConfigurationDatabase-004 — Secret-bearing columns stored in plaintext with no protection
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/NotificationConfiguration.cs:56-57`, `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/ExternalSystemConfiguration.cs:25-26,75-77` |
**Description**
`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`, and
`DatabaseConnectionDefinition.ConnectionString` all hold authentication secrets (SMTP
OAuth2 client secrets / passwords, external-system API keys or Basic Auth credentials,
and database passwords respectively). They are mapped as ordinary string columns and
persisted verbatim. Anyone with read access to the configuration database — including
audit-log JSON if these entities are serialized into `AfterStateJson` — obtains the
plaintext secrets. The design doc does not call out encryption-at-rest for these
fields, so the design is also silent on a real risk.
**Recommendation**
Apply encryption to these fields, e.g. an EF Core value converter backed by ASP.NET
Data Protection (the module already configures `IDataProtectionKeyContext`), or rely on
SQL Server Always Encrypted / column encryption. Separately, ensure `IAuditService`
callers never pass these secret-bearing entities (or that the serializer redacts the
fields) so secrets do not leak into `AuditLogEntry.AfterStateJson`. Update the design
doc to state the chosen at-rest protection.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`, and
`DatabaseConnectionDefinition.ConnectionString` were mapped as ordinary `nvarchar(4000)`
columns and persisted verbatim.
Implemented the recommendation's first option — an in-module EF Core value converter
backed by ASP.NET Data Protection, which the module already uses
(`IDataProtectionKeyContext`, `AddDataProtection().PersistKeysToDbContext`). Added
`EncryptedStringConverter` (purpose-scoped `IDataProtector`; `Protect` on write,
`Unprotect` on read; null-safe; surfaces a clear message on a `CryptographicException`).
`ScadaBridgeDbContext` gained an `(options, IDataProtectionProvider)` constructor and
applies the converter to the three secret columns in `OnModelCreating`; the DI
registration in `ServiceCollectionExtensions` now constructs the context with the
registered provider. The secret columns were widened to `HasMaxLength(8000)` (EF maps
this to `nvarchar(max)` on SQL Server) so ciphertext expansion cannot truncate the
value; migration `20260517010521_EncryptSecretColumns` carries the column-type change.
Regression tests added in `SecretEncryptionTests.cs` verify the raw column value is
never the plaintext secret and that EF transparently decrypts on read, for all three
columns plus a null round-trip.
The encryption scheme itself is fully in-module; the only remaining cross-cutting item
is a documentation gap — the design doc does not yet state encryption-at-rest for these
fields. That doc update is outside this module's editable scope (constraint: edit only
`src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase`, the tests, and this file) and is surfaced here
for a follow-up to `docs/requirements/Component-ConfigurationDatabase.md`. The audit
secret-leak concern is mitigated separately by CD-007's serializer hardening; whether
callers should additionally redact secret-bearing entities before passing them to
`IAuditService` is a caller-side concern in other modules and is also surfaced for
follow-up. The code fix in this module is complete.
### ConfigurationDatabase-005 — Audit `Id` type disagrees with the design doc
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/AuditConfiguration.cs:11` (entity `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditLogEntry.cs`) |
**Description**
The design doc's Audit Entry Schema table specifies `Id` as `Long / GUID`, and notes
the audit table is append-only and retained indefinitely. The actual `AuditLogEntry`
entity uses an `int` identity key. For a never-purged, append-only table that
accumulates one row per save operation across the system lifetime, a 32-bit identity
risks overflow over a long deployment horizon, and the code drifts from the documented
schema.
**Recommendation**
Change `AuditLogEntry.Id` to `long` (and the corresponding migration column to
`bigint`) to match the design doc and remove the overflow risk, or — if `int` is
intentional — update the design doc's schema table to say `int` and justify it.
Resolve the discrepancy in one direction.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: the
`AuditLogEntry` entity declares `int Id`, while the design doc's Audit Entry Schema
table said `Long / GUID`. The entity lives in `ZB.MOM.WW.ScadaBridge.Commons`
(`src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditLogEntry.cs`), which is outside this
module's editable scope, so the discrepancy was resolved by aligning the design doc to
the code — the recommendation's second option. The schema table now records `Id` as
`int (identity)` with an explicit justification: a 32-bit identity matches the key type
of every other entity in the schema (uniform repository/query code), and at a sustained
100 rows/second the `int` range is not exhausted for roughly 680 years, so the
indefinite-retention policy poses no realistic overflow risk; if a future deployment
ever approaches the limit the column can be widened to `bigint` via a migration without
a schema redesign. No regression test is meaningful for a documentation alignment; the
existing `AuditConfiguration` (`HasKey(a => a.Id)`) and the audit repository tests
already exercise the `int` key end to end.
### ConfigurationDatabase-006 — `Site.GrpcNodeAAddress` / `GrpcNodeBAddress` columns are unbounded
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/SiteConfiguration.cs:24-25` |
**Description**
`SiteConfiguration` explicitly sets `HasMaxLength(500)` for `NodeAAddress` and
`NodeBAddress`, but the entity also has `GrpcNodeAAddress` and `GrpcNodeBAddress`
(added per the gRPC streaming design decision) which are not configured at all. With no
length set, EF Core maps them to `nvarchar(max)`. This is inconsistent with the sibling
address columns, wastes the opportunity to constrain input, and `nvarchar(max)` columns
cannot be indexed and have different storage/performance characteristics.
**Recommendation**
Add `builder.Property(s => s.GrpcNodeAAddress).HasMaxLength(500);` and the same for
`GrpcNodeBAddress`, matching the existing `NodeAAddress`/`NodeBAddress` mapping, and
generate a migration to alter the column types.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
`SiteConfiguration` configured `NodeAAddress`/`NodeBAddress` with `HasMaxLength(500)` but
left `GrpcNodeAAddress`/`GrpcNodeBAddress` unconfigured, so EF mapped them to
`nvarchar(max)` — inconsistent with the sibling columns and non-indexable. Applied the
recommendation: added `builder.Property(s => s.GrpcNodeAAddress).HasMaxLength(500)` and
the same for `GrpcNodeBAddress`. Generated migration
`20260517020720_BoundGrpcNodeAddressLength` altering both columns from `nvarchar(max)`
to `nvarchar(500)` (the model snapshot was updated to match). Regression tests added in
`SchemaConfigurationTests.cs`:
`GrpcNodeAddressColumns_AreLengthBoundedTo500` (theory over both columns, asserting the
EF model metadata reports `MaxLength == 500`) and
`GrpcNodeAddressColumns_MatchSiblingNodeAddressBounds` (asserting the gRPC columns share
the bound of the `NodeAAddress`/`NodeBAddress` siblings).
### ConfigurationDatabase-007 — `AuditService` does not handle JSON-serialization failure of arbitrary `afterState`
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Services/AuditService.cs:28-30` |
**Description**
`LogAsync` serializes the caller-supplied `afterState` object with
`JsonSerializer.Serialize(afterState)` using default options. EF entity POCOs commonly
have navigation properties; serializing an entity that has loaded navigations (e.g. a
`Template` with `Attributes`/`Scripts`, or any entity with a cycle) will throw
`JsonException` for a reference cycle or produce a very large payload. Because audit
writes are designed to commit in the same transaction as the change, a serialization
exception thrown here will roll back the *entire* business operation — a template
update fails because its audit entry could not be serialized. This couples audit
robustness to the shape of every entity passed in.
**Recommendation**
Configure `JsonSerializerOptions` with `ReferenceHandler.IgnoreCycles` (or
`Preserve`) and a sensible `MaxDepth`, and consider serializing a projected
DTO/snapshot rather than the live tracked entity. Decide explicitly whether an audit
serialization failure should fail the operation or be logged and degraded gracefully,
and document that decision against the design doc's transactional-guarantee section.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source: `LogAsync`
called `JsonSerializer.Serialize(afterState)` with default options, so any `afterState`
graph containing a reference cycle threw `JsonException` — and because the audit entry
commits in the same transaction as the change it records, that exception rolled back
the entire business operation.
Fix applied per the recommendation: `AuditService` now serializes via a static
`JsonSerializerOptions` configured with `ReferenceHandler.IgnoreCycles` and
`MaxDepth = 32`. The serialization is additionally wrapped in a `SerializeAfterState`
helper that catches a residual `JsonException`/`NotSupportedException` and substitutes a
small diagnostic placeholder JSON (`AuditSerializationError` + `StateType`) — an explicit
decision that an audit-serialization failure must **degrade gracefully** and never roll
back the audited operation. The audit entry is always recorded; the design doc's
transactional-guarantee section ("if the change succeeds, the audit entry is always
recorded") is thereby honoured even for pathological state objects. Regression test
added in `AuditServiceTests.cs`:
`LogAsync_AfterStateWithReferenceCycle_DoesNotThrow_AndDoesNotRollBackOperation`.
### ConfigurationDatabase-008 — `GetApprovedKeysForMethodAsync` CSV parsing silently drops malformed ids
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/InboundApiRepository.cs:46-58` |
**Description**
`ApiMethod.ApprovedApiKeyIds` is stored as a comma-separated string of integer ids.
`GetApprovedKeysForMethodAsync` splits it, maps each token with
`int.TryParse(...) ? id : -1`, then filters with `id > 0`. Any token that fails to
parse, or a legitimately negative/zero id, is silently discarded. If `ApprovedApiKeyIds`
becomes corrupt (e.g. a stray name instead of an id), the method quietly returns fewer
approved keys than expected, which for an API-key authorization path means a method may
unexpectedly reject a key that should be approved. Storing a relational many-to-many as
a CSV string in a column is itself fragile (no FK integrity, no cascade on key delete).
**Recommendation**
Short term: log a warning when a token fails to parse instead of silently dropping it,
so corruption is observable. Longer term: replace the CSV column with a proper join
table (`ApiMethodApprovedKey`) with foreign keys to `ApiMethod` and `ApiKey`, which
gives referential integrity and correct cascade behaviour when an API key is deleted.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
`GetApprovedKeysForMethodAsync` mapped each CSV token with
`int.TryParse(...) ? id : -1` then filtered `id > 0`, so any unparseable (or
non-positive) token was discarded with no signal — a corrupt `ApprovedApiKeyIds` value
silently approves fewer keys than intended, an authorization-relevant outcome.
Applied the recommendation's short-term fix: the parse loop was rewritten to log a
warning for every token that fails to parse to a positive integer, naming the method id
and the offending token, so corruption is observable in logs. Valid ids still resolve
normally. `InboundApiRepository` gained an optional `ILogger<InboundApiRepository>`
constructor parameter (defaulting to `NullLogger`, matching the `MigrationHelper`
pattern) and the project now references `Microsoft.Extensions.Logging.Abstractions`. The
longer-term join-table redesign would change the `ApiMethod` entity / schema and the
`IInboundApiRepository` contract (Commons, out of this module's scope) and is left as a
future schema-design item. Regression tests added in `InboundApiRepositoryTests.cs`:
`GetApprovedKeysForMethod_WithMalformedCsvToken_LogsWarningAndDropsToken`,
`GetApprovedKeysForMethod_WithValidCsv_ReturnsAllKeys`, and
`GetApprovedKeysForMethod_WithNullOrEmptyCsv_ReturnsEmptyWithoutWarning` (using a
capturing `ILogger` to assert the warning is emitted only on malformed input).
### ConfigurationDatabase-009 — Multi-collection eager loads issue cartesian-product queries
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs:43-51,53-61`, `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/CentralUiRepository.cs:45-55` |
**Description**
`GetAllTemplatesAsync`, `GetTemplatesComposingAsync`, and `GetTemplateTreeAsync` each
`Include` three-to-four sibling collections (`Attributes`, `Alarms`, `Scripts`,
`Compositions`) in a single query. EF Core's default single-query strategy produces a
cartesian-product join across those collections, so a template with N attributes, M
alarms, and K scripts yields N×M×K rows that EF must then de-duplicate. For templates
with many members this materially inflates the result set and query time.
`GetInstanceByIdAsync`/`GetAllInstancesAsync` have the same shape with three
collections.
**Recommendation**
Add `.AsSplitQuery()` to these multi-collection-include queries (or set
`UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)` globally in
`AddConfigurationDatabase`) so each collection is loaded with a separate query and the
cartesian explosion is avoided.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
`GetAllTemplatesAsync` and `GetTemplatesComposingAsync` (`TemplateEngineRepository`) and
`GetTemplateTreeAsync` (`CentralUiRepository`) each `Include` three-to-four sibling
collections in a single query, producing a cartesian-product join. The same shape was
also present in `GetTemplateByIdAsync`, `GetInstanceByIdAsync`, `GetAllInstancesAsync`,
`GetInstancesBySiteIdAsync`, and `GetInstanceByUniqueNameAsync`.
Applied the recommendation's per-query option: `.AsSplitQuery()` was added to every
multi-collection-include query in `TemplateEngineRepository` (eight call sites) and to
`GetTemplateTreeAsync` in `CentralUiRepository`, so each collection loads with its own
query and the cartesian explosion is avoided. Per-query `AsSplitQuery()` was preferred
over a global `UseQuerySplittingBehavior` so single-collection queries elsewhere keep
the cheaper single-query plan. Split queries change query *shape* only, not results;
regression tests added in `SchemaConfigurationTests.cs` pin that behaviour:
`GetAllTemplatesAsync_WithMultipleMembersPerCollection_LoadsAllWithoutDuplication`
(a template with 3 attributes, 2 alarms, 4 scripts must return exactly those counts —
not a 24-row cartesian product) and
`GetTemplateByIdAsync_WithMultipleMembers_LoadsAllCollections`.
### ConfigurationDatabase-010 — Several repositories and `InstanceLocator` lack direct test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/TemplateEngineRepository.cs`, `Repositories/DeploymentManagerRepository.cs`, `Repositories/ExternalSystemRepository.cs`, `Repositories/InboundApiRepository.cs`, `Repositories/NotificationRepository.cs`, `Repositories/SiteRepository.cs`, `Services/InstanceLocator.cs` |
**Description**
The test project covers `SecurityRepository`, `CentralUiRepository`, `AuditService`,
optimistic concurrency, seed data, and Data Protection persistence. There are no direct
tests for `TemplateEngineRepository` (the largest repository, and the one with the
CD-001 bug, which a test would have caught), `DeploymentManagerRepository` (including
its `Local`-then-stub delete fallback and the `DeleteInstanceAsync`
restrict-FK-cleanup logic), `ExternalSystemRepository`, `InboundApiRepository` (notably
`GetApprovedKeysForMethodAsync` CSV parsing — CD-008), `NotificationRepository`,
`SiteRepository` (including its stub-attach delete path), or `InstanceLocator`.
**Recommendation**
Add repository-level tests using the existing `SqliteTestHelper` pattern, covering at
minimum: CRUD round-trips, the stub-attach delete fallbacks in
`DeploymentManagerRepository`/`SiteRepository`, `DeleteInstanceAsync`'s explicit
deployment-record cleanup, `GetApprovedKeysForMethodAsync` with valid/malformed CSV,
and `InstanceLocator.GetSiteIdForInstanceAsync` for found/not-found cases.
**Resolution**
Resolved 2026-05-16 (commit pending). Direct repository/service tests were added using
the existing `SqliteTestHelper` pattern. `InboundApiRepositoryTests.cs` covers
`InboundApiRepository` (API-key/method CRUD round-trips and the
`GetApprovedKeysForMethodAsync` valid/malformed/empty-CSV cases — see CD-008).
`RepositoryCoverageTests.cs` adds `ExternalSystemRepositoryTests` (definition/method CRUD,
parent-filtered method query, database-connection delete), `NotificationRepositoryTests`
(notification-list-with-recipients and SMTP-configuration round-trips, list delete),
`SiteRepositoryTests` (site/identifier round-trip plus the stub-attach delete fallback
exercised for both `DeleteSiteAsync` and `DeleteDataConnectionAsync` by clearing the
ChangeTracker, and the site-filtered instance query), `DeploymentManagerRepositoryTests`
(deployment-record CRUD and `GetCurrentDeploymentStatusAsync` ordering, the stub-attach
`DeleteDeploymentRecordAsync` fallback, and `DeleteInstanceAsync`'s explicit
Restrict-FK deployment-record cleanup), and `InstanceLocatorTests`
(`GetSiteIdForInstanceAsync` for the found and not-found cases). `TemplateEngineRepository`
gained the CD-001 and CD-009 regression tests
(`TemplateEngineRepositoryTests.cs`, `SchemaConfigurationTests.cs`). A constructor
null-guard test was added for each of the five repositories/services covered, doubling
as the CD-011 regression guard. The full module suite is green.
### ConfigurationDatabase-011 — Inconsistent constructor null-guarding across repositories/services
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/ExternalSystemRepository.cs:11-14`, `Repositories/InboundApiRepository.cs:11-14`, `Repositories/NotificationRepository.cs:11-14`, `Services/InstanceLocator.cs:13-16` |
**Description**
`SecurityRepository`, `CentralUiRepository`, `TemplateEngineRepository`,
`DeploymentManagerRepository`, `SiteRepository`, and `AuditService` all guard their
injected `ScadaBridgeDbContext` with `?? throw new ArgumentNullException(...)`.
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`, and
`InstanceLocator` assign the constructor argument directly with no guard. This is a
minor consistency/maintainability issue: although the DI container will not normally
supply null, the divergence makes the codebase look unfinished and means a future
hand-constructed instance fails with a less informative `NullReferenceException` later.
**Recommendation**
Apply the same `?? throw new ArgumentNullException(nameof(context))` guard in the four
inconsistent constructors so all data-access types behave uniformly.
**Resolution**
Resolved 2026-05-16 (commit pending). Root cause confirmed against source:
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`, and
`InstanceLocator` assigned the injected `ScadaBridgeDbContext` directly with no null
guard, diverging from `SecurityRepository`/`CentralUiRepository`/`TemplateEngineRepository`/
`DeploymentManagerRepository`/`SiteRepository`/`AuditService`. Applied the recommendation:
all four constructors now use `context ?? throw new ArgumentNullException(nameof(context))`
(`InboundApiRepository`'s guard was added as part of its CD-008 constructor change), so
every data-access type behaves uniformly and a hand-constructed instance fails with an
informative exception at construction rather than a later `NullReferenceException`.
Regression: `Constructor_NullContext_Throws` tests were added for all four affected types
(`InboundApiRepositoryTests.cs`, `RepositoryCoverageTests.cs`).
### ConfigurationDatabase-012 — Inbound-API `ApiKey.KeyValue` bearer credential stored in plaintext
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/InboundApiConfiguration.cs:17-19` |
**Description**
`ApiKey.KeyValue` is the bearer credential presented in the `X-API-Key` header to
authenticate Inbound API requests (HighLevelReqs §7.27.3). It is mapped as an
ordinary `nvarchar(500)` column with a unique index and persisted verbatim. Anyone
with read access to the configuration database — or to any `AuditLogEntry.AfterStateJson`
into which an `ApiKey` entity is serialized — obtains live API credentials in cleartext.
`ConfigurationDatabase-004` introduced encryption-at-rest for the other secret-bearing
columns (SMTP credentials, external-system auth config, database connection strings)
but explicitly scoped `ApiKey.KeyValue` out. The omission is genuine: the
`EncryptedStringConverter` built for CD-004 is backed by ASP.NET Data Protection, which
is **non-deterministic** — the same plaintext encrypts to different ciphertext each
time — so it cannot be applied here, because `GetApprovedKeysForMethodAsync` and the
authentication path resolve a key by its value (`GetApiKeyByValueAsync` does
`FirstOrDefaultAsync(k => k.KeyValue == keyValue)`). A non-deterministic converter would
break that equality lookup. The result is that the one credential most exposed to
external callers is the one credential left unprotected.
**Recommendation**
Store a salted cryptographic hash of the key value instead of the plaintext (or
ciphertext): hash on create, and authenticate by hashing the presented key and
comparing. This keeps the equality lookup working (the hash is deterministic) while
ensuring the database never holds a usable credential. The plaintext key would then
only ever be shown once, at creation time, to the Admin who created it. This requires
a coordinated change with the Inbound API / Security components and the `ApiKey`
entity in Commons; record the chosen scheme in
`docs/requirements/Component-ConfigurationDatabase.md` and the Inbound API design doc.
At minimum, ensure `ApiKey` entities are never passed to `IAuditService` without the
key value redacted.
**Resolution**
Resolved 2026-05-17 (commit `<pending>`) — approved cross-module change spanning
Commons + ConfigurationDatabase + InboundAPI + ManagementService. The inbound-API
bearer credential is now persisted as a deterministic HMAC-SHA256 keyed hash, never
as plaintext, so a configuration-database dump no longer yields usable API keys
while the constant-time by-value lookup still works.
Root cause confirmed against source: `ApiKey.KeyValue` was a plaintext
`nvarchar(500)` column with a unique index, holding the live `X-API-Key` credential.
Design implemented — **deterministic keyed hash** (the recommendation's first option):
- **Hash scheme.** HMAC-SHA256 over the UTF-8 key bytes, keyed with a server-side
*pepper*; the digest is stored Base64-encoded. No per-row random salt — an API key
is already a high-entropy random token, and a random salt would break the
deterministic by-value lookup the authentication path relies on. The pepper instead
binds every hash to the deployment. Implemented as `IApiKeyHasher` / `ApiKeyHasher`
in `ZB.MOM.WW.ScadaBridge.Commons` (`Types/InboundApi/ApiKeyHasher.cs`); the constructor
rejects a missing or weak (`< 16`-char) pepper with `ArgumentException` — fail-fast.
- **Where the pepper lives.** `InboundApiOptions.ApiKeyPepper`, a component-owned
Options class already bound from the `ScadaBridge:InboundApi` configuration section
(Options pattern); it is never hard-coded. `AddInboundAPI` registers `IApiKeyHasher`
via a factory that reads the bound options, so a missing/weak pepper fails the
deployment fast rather than degrading silently. (Operators must supply the pepper
via configuration / a secret store; `appsettings*.json` is intentionally left
without a value.)
- **Commons.** `ApiKey` drops the persisted `KeyValue` property and gains a persisted
`KeyHash` — the plaintext is never a field on the entity. `ApiKey.FromHash(name,
keyHash)` creates an entity from a precomputed hash (the production create-path);
the existing `ApiKey(name, keyValue)` constructor is retained as a convenience that
hashes its input with the unpeppered default hasher.
- **ConfigurationDatabase.** EF mapping maps `KeyHash` (`nvarchar(256)`, required,
unique index `IX_ApiKeys_KeyHash`); migration `20260517073000_HashApiKeyValue`
drops `KeyValue`/`IX_ApiKeys_KeyValue`, adds `KeyHash`, and deletes existing
`ApiKeys` rows (a one-way hash cannot recover plaintext). `GetApiKeyByValueAsync`
now hashes the supplied value and looks up by `KeyHash`.
- **InboundAPI.** `ApiKeyValidator` takes an `IApiKeyHasher`; `FindKeyConstantTime`
hashes the presented candidate and compares hashes via
`CryptographicOperations.FixedTimeEquals` — the comparison stays constant-time.
- **ManagementService.** `HandleCreateApiKey` generates a random 32-byte key, stores
only its peppered hash, and returns the plaintext to the caller exactly once;
`HandleListApiKeys` no longer exposes the hash (returns id/name/enabled only).
TDD regression tests (written first, confirmed failing, then passing):
`ApiKeyHasherTests` + `ApiKeyTests` (Commons — determinism, pepper fail-fast, no
plaintext property), `ApiKeyHashValidationTests` (InboundAPI — auth succeeds via
hash, wrong key / wrong-pepper fail), `ApiKeyCreationTests` (ManagementService — only
the hash is persisted, plaintext returned once), and `SchemaConfigurationTests`
additions (ConfigurationDatabase — `KeyHash` mapped/uniquely-indexed, no `KeyValue`
column). `dotnet build` of all five touched modules is clean and their test suites
are green.
**Operational consequence: every existing API key must be re-issued.** A hash is
one-way, so the `HashApiKeyValue` migration deletes existing key rows; after applying
it, operators must create new keys (CLI / Management API / Central UI), distribute
the one-time plaintext to callers, and re-approve the new keys on their methods.
Follow-up (outside this task's editable scope): record the hash scheme in
`docs/requirements/Component-ConfigurationDatabase.md` and the Inbound API design
doc, and update the Central UI API-keys page, which previously displayed a masked
`KeyValue`.
### ConfigurationDatabase-013 — Secret-column encryption silently falls back to an ephemeral (throwaway) key
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/ScadaBridgeDbContext.cs:107-124` |
**Description**
`ApplySecretColumnEncryption` resolves the Data Protection provider as
`_dataProtectionProvider ?? new EphemeralDataProtectionProvider()`. The `??` fallback
is reached whenever the context is constructed via the single-argument
`ScadaBridgeDbContext(DbContextOptions)` constructor — i.e. whenever no provider was
injected. An `EphemeralDataProtectionProvider` generates a key ring that lives only in
process memory and is discarded at process exit.
For design-time `dotnet ef` tooling this is harmless (the XML remark correctly notes
it only emits schema). The risk is on a *runtime write path*. The runtime currently
gets the provider-bearing context only because `AddConfigurationDatabase` adds an
`AddScoped` factory registration that overrides EF's activator-based registration.
That override is the single thing standing between correct behaviour and silent data
corruption: any future change that resolves a `ScadaBridgeDbContext` through a path the
override does not cover — an `AddPooledDbContextFactory`/`IDbContextFactory<ScadaBridgeDbContext>`
registration, a second `AddDbContext` call, a hand-constructed context in server code —
would construct the context with the single-arg constructor, encrypt secret columns
with a throwaway key, and persist ciphertext that becomes **permanently undecryptable
the moment the process restarts**. There is no exception, no warning; the failure only
surfaces later as `CryptographicException` on read (mis-attributed by
`EncryptedStringConverter` to "the stored value was not written by this system").
**Recommendation**
Do not silently substitute an ephemeral provider for write-capable contexts. Either:
(a) require the provider unconditionally and have design-time tooling pass an explicit
ephemeral provider so the intent is visible at the call site; or (b) keep the
single-arg constructor but mark contexts built without a real provider as
schema-only — e.g. record a flag and have the encrypting converter throw a clear
`InvalidOperationException` ("secret columns cannot be written without a configured
Data Protection key ring") on the first `Protect`, instead of producing throwaway
ciphertext. Also harden the DI wiring so a `ScadaBridgeDbContext` cannot be resolved
through the EF-activator registration at all (e.g. register only the factory, or use
`AddDbContextFactory` with the explicit constructor).
**Resolution**
Resolved 2026-05-17 (commit pending). Root cause confirmed against source:
`ApplySecretColumnEncryption` resolved the provider as
`_dataProtectionProvider ?? new EphemeralDataProtectionProvider()`, so a context built
via the single-argument constructor would silently encrypt secret columns with a
throwaway key and persist permanently undecryptable ciphertext with no error.
Applied the recommendation — a combination of options (a) and (b). The silent
ephemeral substitution was removed: a no-provider context now builds its model with an
internal `SchemaOnlyDataProtector` that satisfies schema generation but throws a clear
`InvalidOperationException` if it is ever asked to `Protect`/`Unprotect`. On top of
that, `SaveChanges`/`SaveChangesAsync` were overridden with a `GuardSecretWritesHaveAKeyRing`
check that fails fast — before any database round-trip — with a clear
`InvalidOperationException` (naming the offending `Entity.Property`) whenever a
no-provider context has a pending non-null write to one of the three secret-bearing
columns. A custom `SecretAwareModelCacheKeyFactory` (registered via
`OnConfiguring`/`ReplaceService`) folds the presence of a real provider into EF's model
cache key, so a write-capable and a schema-only context can no longer share one cached
model. Design-time `dotnet ef` tooling (which only emits schema) and the test fixture
both keep working — `SqliteTestDbContext` now passes an explicit
`EphemeralDataProtectionProvider`, making the test intent visible at the call site as
recommendation (a) asks. Regression tests added in `EphemeralEncryptionFallbackTests.cs`:
`SingleArgConstructor_WritingSecretColumn_FailsFast_DoesNotPersistThrowawayCiphertext`,
`SingleArgConstructor_WritingNonSecretColumn_Succeeds`, and
`ProviderConstructor_WritingSecretColumn_StillSucceeds`. The DI-hardening sub-point
(register only the factory) was not adopted — `AddConfigurationDatabase` already
overrides the EF-activator registration with a provider-bearing factory, and the
fail-fast guard now closes the residual gap for any other resolution path.
### ConfigurationDatabase-014 — Redundant, inconsistent cast on one `HasConversion` call
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/ScadaBridgeDbContext.cs:121-123` |
**Description**
`ApplySecretColumnEncryption` calls `HasConversion(converter)` three times. The first
two (`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`)
pass `converter` directly; the third (`DatabaseConnectionDefinition.ConnectionString`)
casts it to `(Microsoft.EntityFrameworkCore.Storage.ValueConversion.ValueConverter)`.
`EncryptedStringConverter` already derives from `ValueConverter<string?, string?>`
(itself a `ValueConverter`), and the first two call sites compile fine without the
cast, so the cast is redundant. The inconsistency makes the code look as though the
third call needs special handling when it does not, and the fully-qualified type name
inline adds noise.
**Recommendation**
Remove the cast so all three calls read identically as `HasConversion(converter)`.
If a `ValueConverter`-typed reference is genuinely wanted, give it a local variable of
that type once and use it for all three.
**Resolution**
Resolved 2026-05-17 (commit pending). Partially re-triaged: the finding is correct
that the inline fully-qualified cast was inconsistent and noisy, but its premise that
the cast is *purely redundant* is wrong. `DatabaseConnectionDefinition.ConnectionString`
is a non-nullable `string`; `EncryptedStringConverter` is `ValueConverter<string?,
string?>`, so calling `HasConversion(converter)` with `converter` typed as
`EncryptedStringConverter` on that property raises a real `CS8620` nullability
error (verified by build). The cast to the non-generic `ValueConverter` base was
suppressing that — load-bearing, not noise. Applied the recommendation's second
option: the converter is now held once in a single `ValueConverter`-typed local
(with an explanatory comment), and all three `HasConversion(converter)` calls read
identically with no inline cast or fully-qualified name. Behaviour is unchanged, so
no behavioural regression test is meaningful (cf. CD-005); a forward guard was added
in `SchemaConfigurationTests.cs`
`SecretColumns_AllHaveEncryptedStringConverterApplied` (theory over all three secret
columns) — asserting each column keeps an `EncryptedStringConverter`.
### ConfigurationDatabase-015 — `NotificationOutboxRepository.InsertIfNotExistsAsync` is a check-then-act race with no duplicate-key catch
| | |
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` |
**Resolution** — rewrote `InsertIfNotExistsAsync` as a single raw-SQL
`IF NOT EXISTS (...) INSERT` matching the
`AuditLogRepository.InsertIfNotExistsAsync` and
`SiteCallAuditRepository.UpsertAsync` patterns, with a `SqlException`
catch on numbers 2601 (unique-index violation) and 2627
(primary-key/unique-constraint violation) returning `false`. Concurrent
losers are logged at Debug and treated as no-ops, eliminating the
site-retry livelock. Two SQLite-targeted assertions in
`RepositoryCoverageTests` were migrated to a new MS SQL-fixture file
`tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/Repositories/NotificationOutboxRepositoryIntegrationTests.cs`,
which also adds a 50-way parallel race test verifying exactly one row
lands and no exception bubbles.
**Description**
`InsertIfNotExistsAsync` does `AnyAsync(x => x.NotificationId == n.NotificationId)`,
then — if false — `AddAsync` + `SaveChangesAsync`. There is a check-then-act window
between the two operations: two sessions can both pass the `AnyAsync` check and both
attempt the INSERT, and the loser surfaces as a uniqueness violation on the
`NotificationId` primary key wrapped in a `DbUpdateException` / `SqlException` (error
2627). The site→central handoff for notifications is documented as **at-least-once
with ack-after-persist plus insert-if-not-exists**; collisions on the same
`NotificationId` are therefore not a "should never happen" but the *expected* contention
mode. As written, the second concurrent ack throws, fails the site→central
acknowledgement, and the site retries the same row again on its next forward — a
livelock if the contending pair keeps racing.
The sibling raw-SQL `IF NOT EXISTS … INSERT` paths in `AuditLogRepository.InsertIfNotExistsAsync`
(see SqlErrorUniqueIndexViolation / SqlErrorPrimaryKeyViolation handling at
`AuditLogRepository.cs:74-89`) and `SiteCallAuditRepository.UpsertAsync`
(`SiteCallAuditRepository.cs:87-96`) explicitly catch errors 2601/2627 and treat the
loser as a no-op — exactly the right pattern for "first-write-wins idempotent ingest".
This repository alone does not.
**Recommendation**
Either (a) rewrite the body as a single raw-SQL `IF NOT EXISTS … INSERT` and apply the
same 2601/2627 catch-and-log-Debug pattern the AuditLog and SiteCall repositories use,
or (b) wrap the existing flow in a try/catch around `SaveChangesAsync` that inspects
the inner `SqlException.Number` and returns `false` (i.e. "another writer won the race")
on 2601/2627. Option (a) is preferable because it collapses the two round-trips to one
and matches the established idempotent-ingest pattern used elsewhere in the module.
Add a regression test that simulates two concurrent `InsertIfNotExistsAsync` calls
(using two open contexts) for the same `NotificationId` and asserts neither call
throws and exactly one row lands.
### ConfigurationDatabase-016 — `InboundApiRepository.GetApiKeyByValueAsync` hashes the candidate with the unpeppered `ApiKeyHasher.Default`
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/InboundApiRepository.cs:35-39` |
**Resolution (2026-05-28):** Took option (a) — `InboundApiRepository` ctor now
accepts `Func<IApiKeyHasher>? hasherAccessor = null` (deferred resolution to
sidestep startup-time pepper-validation in test composition roots that don't
configure one). `GetApiKeyByValueAsync` calls `_hasherAccessor()` so the
hash matches the production peppered digest. `AddConfigurationDatabase`
registers the repository with a factory wiring
`() => sp.GetService<IApiKeyHasher>() ?? ApiKeyHasher.Default` — the
peppered hasher is used when available, falling back to Default only when
none is registered (legacy / test composition). Regression test
`CD016_GetApiKeyByValue_UsesInjectedPepperedHasher_NotDefault` asserts
positively (peppered roundtrip) and negatively (Default hasher misses the
key entirely, proving the lookup uses the injected hasher).
**Description**
`GetApiKeyByValueAsync` resolves an API key by its presented plaintext value by hashing
the candidate and looking up `KeyHash`. The hash, however, is computed with the static
`ApiKeyHasher.Default` (the fixed, deployment-independent unpeppered hasher used for
tests). Production key creation uses the DI-registered, *peppered* `IApiKeyHasher`
constructed from `InboundApiOptions.ApiKeyPepper` (see CD-012 resolution and
`ApiKeyHasher.ctor(string pepper)`), so the stored `KeyHash` of any real key was
produced under the deployment pepper. Hashing the candidate with the unpeppered
`Default` yields a different digest, and the `WHERE KeyHash = @hash` lookup will never
match a real key.
The production `ApiKeyValidator` (InboundAPI module) deliberately does NOT call this
method — it fetches all keys and runs a constant-time comparison via the
DI-registered hasher (`ApiKeyValidator.cs:53-64`) — so the immediate
authentication path is unaffected. But `GetApiKeyByValueAsync` remains a publicly
exposed `IInboundApiRepository` member; any new caller (a future admin tool, a CLI
command, a test) that uses it under a peppered configuration will silently get a
`null` result for an existing, valid key, and almost certainly mis-route the failure
as "key not found".
**Recommendation**
Either (a) take `IApiKeyHasher` via constructor injection — alongside the existing
`ScadaBridgeDbContext` and optional `ILogger` — and use it here so the repository
participates in the same peppered scheme as the rest of the system; or (b) delete
the method from both the implementation and `IInboundApiRepository` (Commons) on the
grounds that the production authentication path correctly avoids it for timing
reasons and there is no remaining valid caller. Add a regression test that constructs
the repository under a real `ApiKeyHasher("a-strong-pepper-value")`, inserts an
`ApiKey.FromHash(...)` using the same hasher, and asserts `GetApiKeyByValueAsync`
returns the row — under option (a) it should pass; under option (b) the method no
longer exists.
### ConfigurationDatabase-017 — Stub-attach delete on `DeploymentRecord` bypasses optimistic concurrency
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:83-97` |
**Resolution (2026-05-28):**
`IDeploymentManagerRepository.DeleteDeploymentRecordAsync` now requires a `byte[] expectedRowVersion`
argument. The repository seeds `entry.OriginalValues["RowVersion"]` on both the Local-tracked and
stub-attach branches so EF emits `DELETE ... WHERE Id = @id AND RowVersion = @prior` and surfaces a
concurrent edit as `DbUpdateConcurrencyException`. A new SQLite regression test
`DeleteDeploymentRecord_StaleRowVersion_ThrowsConcurrencyException` in `ConcurrencyTests`
(backed by a dedicated `RowVersionConcurrencyTestDbContext` that keeps `RowVersion` as a
caller-managed concurrency token) asserts the exception fires on a stale token; the existing
`DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity` was updated to pass the fresh RowVersion.
**Description**
`DeploymentRecord` carries a SQL Server `rowversion` concurrency token (declared
in `DeploymentConfiguration` and confirmed by `ConcurrencyTests`), per the design
doc's "Optimistic concurrency is used on deployment status records". When
`DeleteDeploymentRecordAsync` falls into its stub-attach branch (no tracked entity
in `_dbContext.DeploymentRecords.Local` for the given id), it constructs
`new DeploymentRecord("stub", "stub") { Id = id }`, `Attach`es it, and `Remove`s it.
The stub's `RowVersion` is left at its default `null` (or `byte[0]`).
EF Core's SQL Server provider generates the delete as
`DELETE FROM DeploymentRecords WHERE Id = @id AND RowVersion = @stubRowVersion` — and
the stub rowversion is not the row's real rowversion, so on a real SQL Server (with
`IsRowVersion()` auto-populating the column) the WHERE never matches and `SaveChanges`
throws `DbUpdateConcurrencyException`. The path is exercised by
`RepositoryCoverageTests.DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity`
but the test fixture remaps `RowVersion` as a nullable `IsConcurrencyToken()` column
without auto-population (`SqliteTestHelper.ConfigureForTests`), so the stored
RowVersion is null AND the stub's RowVersion is null AND the SQLite delete matches.
Production-shape behaviour is the opposite.
The same stub-attach pattern is used on `SystemArtifactDeploymentRecord`,
`Site`, and `DataConnection`. Those entities have no rowversion token, so the
production behaviour is correct for them — the issue is specific to
`DeploymentRecord`.
**Recommendation**
Replace the stub-attach branch in `DeleteDeploymentRecordAsync` with a real lookup —
`await _dbContext.DeploymentRecords.FindAsync([id], ct)` then `Remove` if non-null —
mirroring `DeleteInstanceAttributeOverrideAsync` and `DeleteDeployedSnapshotAsync`.
This loses the "delete by id without a read" micro-optimisation (a real concern only
in batched-delete loops) but restores the documented concurrency contract. If the
optimisation is genuinely required, attach a `DeploymentRecord` with the *caller's*
known RowVersion (the caller had to fetch the row at some point) and accept the
`DbUpdateConcurrencyException` as the correct concurrency signal. Add a regression
test under MS SQL (extend `RepositoryCoverageTests` with a SQL-Server-flavoured
fixture, or use `MsSqlMigrationFixture`) that asserts the stub-attach delete works
when the real RowVersion is supplied.
### ConfigurationDatabase-018 — `DateTime`-typed `*Utc` columns on `AuditEvent` / `SiteCall` carry no `DateTimeKind` enforcement
| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/AuditLogEntityTypeConfiguration.cs`, `Configurations/SiteCallEntityTypeConfiguration.cs` (mappings for `OccurredAtUtc`, `IngestedAtUtc`, `CreatedAtUtc`, `UpdatedAtUtc`, `TerminalAtUtc`) |
**Resolution (2026-05-28):** Added two private static `ValueConverter<DateTime, DateTime>` / `ValueConverter<DateTime?, DateTime?>` UTC-enforcing converters to `AuditLogEntityTypeConfiguration` and applied them to `AuditEvent.OccurredAtUtc` and `AuditEvent.IngestedAtUtc` via `HasConversion(...)`. The converter re-tags `DateTimeKind.Utc` on hydrate (where SQL Server's `datetime2` provider strips the Kind flag) and on write (so a producer-supplied `Kind=Unspecified` literal still lands as UTC in the model cache). Coordinates with the sibling `Commons-019` resolution (init-setter on `AuditEvent` re-tags Kind=Utc at construction). Regression test in `tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/Configurations/AuditLogEntityTypeConfigurationTests.cs::Configure_UtcConverter_HydratesOccurredAtUtcAsKindUtc` inserts an Unspecified-Kind value, re-reads through a cleared change-tracker, and asserts `Kind == Utc` on both columns. The `SiteCall` mapping is out of scope for this close (sibling component task).
**Description**
`AuditEvent.OccurredAtUtc` / `IngestedAtUtc` and `SiteCall.CreatedAtUtc` /
`UpdatedAtUtc` / `TerminalAtUtc` / `IngestedAtUtc` are declared as `DateTime` (not
`DateTimeOffset`) per the Audit Log #23 spec, with a UTC suffix convention. SQL Server's
`datetime2` provider strips the `Kind` flag on the wire — values inserted with
`DateTimeKind.Utc` round-trip as `DateTimeKind.Unspecified` on read. The EF mappings
add no `HasConversion(...)` to normalise the kind. The sibling Commons module just
flagged the same pattern as `Commons-019`; in this module the consequence is concrete:
- `AuditLogPartitionMaintenance.GetMaxBoundaryAsync` already defends with an explicit
`DateTime.SpecifyKind(dt, DateTimeKind.Utc)` (see `AuditLogPartitionMaintenance.cs:103-104`).
That defence is necessary precisely because the EF mapping does not enforce it.
- `AuditLogRepository.GetPartitionBoundariesOlderThanAsync` does NOT defend — it
returns `reader.GetDateTime(0)` directly with `Kind=Unspecified` (separate finding
CD-020).
- Downstream comparisons like `DateTime.UtcNow` (Kind=Utc) against a re-read
`OccurredAtUtc` (Kind=Unspecified) do not produce a runtime error, but any code
path that converts via `.ToLocalTime()` or `.ToUniversalTime()` will silently
interpret an unspecified-kind value as local time and produce wrong results.
**Recommendation**
Apply a value converter on every `DateTime`-typed `*Utc` column that re-tags the
`Kind` to `Utc` on read (and asserts/`SpecifyKind` on write to defend against an
accidental local-kind write). EF Core's built-in
`UtcValueConverter`-style pattern is a single line per column:
```csharp
builder.Property(e => e.OccurredAtUtc)
.HasConversion(
v => v,
v => DateTime.SpecifyKind(v, DateTimeKind.Utc));
```
Apply uniformly to `AuditEvent` (OccurredAtUtc, IngestedAtUtc), `SiteCall`
(CreatedAtUtc, UpdatedAtUtc, TerminalAtUtc, IngestedAtUtc), and any other
`DateTime *Utc` columns added later. Add a regression test that inserts a UTC row,
re-reads it in a fresh context, and asserts `Kind == DateTimeKind.Utc`. Coordinate
with the sibling `Commons-019` finding so the resolution is consistent across both
modules.
### ConfigurationDatabase-019 — `EnsureLookaheadAsync` swallows non-idempotent SPLIT failures and continues, creating partition holes
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Maintenance/AuditLogPartitionMaintenance.cs:181-199` |
**Resolution (2026-05-28):** Took option (a) — dropped the `try/catch (SqlException)`
around the per-month SPLIT loop entirely (and the now-unused
`using Microsoft.Data.SqlClient`). By class-doc construction the catch could never
fire for "boundary already exists" (the loop pre-reads max-boundary and only issues
SPLITs for strictly-greater months), so its only effect was to mask real failures
(permission revoked, deadlock victim, log full, filegroup full) and let the next
iteration split the following month — leaving a permanent partition hole. Now any
`SqlException` propagates out of `EnsureLookaheadAsync`, surfaces to the central
daily-tick hosted service as an Error, and the next tick retries from the same
boundary (at-least-once, no holes). Replaced the catch block with an inline comment
explaining the rationale so a future maintainer doesn't reintroduce it.
**Description**
`EnsureLookaheadAsync` loops one month at a time from `next` up to `horizon` and
issues `ALTER PARTITION SCHEME … NEXT USED` + `ALTER PARTITION FUNCTION … SPLIT RANGE`
per month. The class doc says idempotency is guaranteed by reading the max-boundary
first and only issuing SPLITs for strictly-greater months — so "boundary already
exists" (SQL Server msg 7708/7711) cannot occur by construction. Yet the loop wraps
each iteration in `catch (SqlException ex) { _logger.LogWarning(...); }` and
continues, with the rationale "the desired end state (boundary present) is satisfied
by either path."
That rationale is correct only for an "already-exists" error — which the pre-check
makes impossible. Any *other* `SqlException` — a permissions failure (the
`scadabridge_audit_purger` role's `ALTER ON SCHEMA::dbo` revoked or not granted), a
deadlock victim, a transient connection drop, a transaction log full, an underlying
filegroup full — leaves the boundary genuinely **not** created, logs a Warning
(quiet by default in most appenders), and the next iteration tries to SPLIT the
following month. That split *can* succeed (it is a different range value), creating
a permanent **hole** in the partition layout: month N never had a partition created,
month N+1 does, so any future row in month N lands in the partition that previously
spanned both months and partition-switch purge for month N becomes impossible.
The class is the central singleton's daily-tick partition roll-forward, so the hole
persists until an operator notices it and rebuilds manually — by which point months
of audit retention may be locked behind the unsplit range.
**Recommendation**
Either (a) drop the `try/catch` entirely so any SPLIT failure aborts the loop and
surfaces to the hosted service (the next tick retries — at-least-once with no holes),
or (b) keep the catch but narrow it to ONLY the
"boundary-already-exists" errors (SQL Server msg 7708 and 7711) and log at Debug,
mirroring how `AuditLogRepository.InsertIfNotExistsAsync` narrowly catches 2601/2627.
Option (a) is preferable: by class-doc construction the catch should never fire, so
its only effect is to mask the real-failure case. Add tests that simulate a SPLIT
failure (e.g. a permission denial via a constrained test login) and assert the loop
aborts after the first failure with no further SPLITs.
### ConfigurationDatabase-020 — `GetPartitionBoundariesOlderThanAsync` returns `DateTime` with `Kind=Unspecified`
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/AuditLogRepository.cs:378-387` |
**Resolution (2026-05-28):** Wrapped the `reader.GetDateTime(0)` read with `DateTime.SpecifyKind(..., DateTimeKind.Utc)` so each returned boundary now carries `Kind=Utc`, matching the explicit defensive pattern already in `AuditLogPartitionMaintenance.GetMaxBoundaryAsync`. Added an inline comment explaining the rationale (SQL Server `datetime2` strips Kind through ADO.NET; boundary values are stored UTC). With sibling CD-018 also closed, the EF read path now enforces UTC at the column level — the raw-ADO defence here is belt-and-braces for this method, which bypasses EF entirely.
**Description**
`GetPartitionBoundariesOlderThanAsync` reads `reader.GetDateTime(0)` and adds the
raw value to the returned list. SQL Server's `datetime2` materialises as
`DateTimeKind.Unspecified` on the ADO.NET side (see CD-019), so every returned
boundary has `Kind=Unspecified`. The sibling `AuditLogPartitionMaintenance.GetMaxBoundaryAsync`
(`AuditLogPartitionMaintenance.cs:103-104`) explicitly defends against this exact
issue by calling `DateTime.SpecifyKind(dt, DateTimeKind.Utc)` — exactly because EF /
ADO.NET strips the kind — but the repository method does not. Callers (the
`AuditLogPurgeActor`) that compare a returned boundary to `DateTime.UtcNow` get a
silently wrong comparison if they ever serialise to/from a string with a local-kind
assumption in between.
**Recommendation**
Wrap the read with `DateTime.SpecifyKind(reader.GetDateTime(0), DateTimeKind.Utc)`,
matching the explicit defensive pattern already in
`AuditLogPartitionMaintenance.GetMaxBoundaryAsync`. Better still: fix CD-019 (a value
converter on the column) so the defence at the read site is no longer required.
### ConfigurationDatabase-021 — `SwitchOutPartitionAsync` interpolates `monthBoundary` / staging table name into raw SQL
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/AuditLogRepository.cs:192-338` |
**Resolution (2026-05-28):** Took the targeted (1) part of the recommendation —
the `monthBoundary` format string is now `"yyyy-MM-dd HH:mm:ss.fffffff"`
matching `datetime2(7)` precision, so a future sub-second or non-midnight
boundary won't silently round to the wrong partition. The staging table name
is still interpolated (T-SQL DDL identifiers can't be parameterised) but
remains internally constructed as `$"AuditLog_Staging_{Guid.NewGuid():N}"`
so SQL injection is structurally impossible. The larger
`sp_executesql`-with-typed-parameter migration was scoped to a future
follow-up since the current shape is fully controlled and the precision-
mismatch hazard was the only practical concern.
**Description**
`SwitchOutPartitionAsync` builds two large SQL batches via interpolated strings
(`sampleSql` and `sql`) that include `{monthBoundaryStr}` and `{stagingTableName}`
directly in the SQL text, and executes them via `ExecuteSqlRawAsync` /
`cmd.ExecuteScalarAsync`. Both values are constructed inside the method —
`monthBoundaryStr = monthBoundary.ToUniversalTime().ToString("yyyy-MM-dd HH:mm:ss")`
and `stagingTableName = $"AuditLog_Staging_{Guid.NewGuid():N}"` — and the formats are
fully controlled. SQL injection is therefore not possible as the code stands.
Two related concerns:
1. The format string `"yyyy-MM-dd HH:mm:ss"` truncates fractional seconds. The
partition function is seeded at `T00:00:00` exactly, so truncation happens to
produce the right boundary value today. A future change that adds a sub-second
boundary (or invokes `SwitchOutPartitionAsync` with a non-midnight value) would
silently round to the wrong partition with no error — and SWITCH PARTITION would
either fail loudly or succeed against the wrong month. Use
`"yyyy-MM-dd HH:mm:ss.fffffff"` to match the precision the migration seeds at,
and the rounding ambiguity disappears.
2. The pattern of "build a multi-statement DDL batch by string concatenation" is
robust today only by inspection. A code review tripwire — the CLAUDE.md note "the
data-access layer must not concatenate SQL" — would catch the pattern earlier;
converting the batch to a parameterised `sp_executesql` invocation (the inner
`EXEC sp_executesql @sql` already exists for the SWITCH itself) is the textbook
safe form even when the input is internally controlled.
**Recommendation**
(1) Switch `monthBoundaryStr`'s format to `"yyyy-MM-dd HH:mm:ss.fffffff"`. (2)
Optionally migrate the two batches to fully parameterised `sp_executesql` form so
the `monthBoundary` value flows as a typed `@boundary datetime2(7)` parameter
rather than as interpolated text — the only piece that genuinely *cannot* be
parameterised is the staging table identifier (DDL identifiers are not parameterisable
in T-SQL), but a server-side `QUOTENAME(@stagingTable)` wrapper covers it. Add a
regression test that supplies a non-midnight `monthBoundary` value and asserts the
boundary lookup resolves to the expected partition.
### ConfigurationDatabase-022 — Stale "WP-24 Stub level sufficient for diff/staleness support" XML comment on `DeploymentManagerRepository`
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:8-14` |
**Description**
The class-level XML doc on `DeploymentManagerRepository` reads "WP-24: Stub level
sufficient for diff/staleness support." WP-24 (Deployment Manager work-package) shipped
long ago; the repository now covers full `DeploymentRecord` CRUD,
`SystemArtifactDeploymentRecord` CRUD, `DeployedConfigSnapshot` CRUD, and an
`Instance` deletion path with explicit Restrict-FK cleanup
(`DeleteInstanceAsync` at line 210-229). The comment misleads a reader into
thinking the repository is incomplete and tempts them not to investigate further
before adding new behaviour. The same module-context observation was noted but
not raised in the prior review.
**Recommendation**
Remove the WP-24 line and rewrite the class doc to describe what the repository
actually does today: EF Core implementation of `IDeploymentManagerRepository`
covering deployment records, system-artifact deployment records, deployed config
snapshots, and the Restrict-FK-aware `DeleteInstanceAsync` for the
deployment pipeline. Cross-reference the optimistic-concurrency contract on
`DeploymentRecord.RowVersion`.
**Resolution (2026-05-28):**
Replaced the stale "WP-24: Stub level sufficient for diff/staleness support" XML
doc with an accurate one-paragraph summary covering `DeploymentRecord` CRUD
(plus the `RowVersion` optimistic-concurrency contract), `SystemArtifactDeploymentRecord`
CRUD, `DeployedConfigSnapshot` CRUD, and the Restrict-FK-aware `DeleteInstanceAsync`.
No behaviour change.
### ConfigurationDatabase-023 — `AuditLog` correlation-index name drifts from design doc (`IX_AuditLog_CorrelationId` vs `IX_AuditLog_Correlation`)
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/AuditLogEntityTypeConfiguration.cs:99-101`, `Migrations/20260520142214_AddAuditLogTable.cs:103-107` |
**Description**
The Component-ConfigurationDatabase design doc lists the AuditLog indexes by name —
including `IX_AuditLog_Correlation (CorrelationId)` for the "drilldown from a single
operation" use case. The implemented index name is `IX_AuditLog_CorrelationId` (the
fluent-config `HasDatabaseName` call and the matching DDL in the migration both use
the `Id`-suffixed form). The names are syntactically valid SQL Server index names and
the index does the right work; the drift is cosmetic but it breaks scripted
maintenance ops that grep for the documented name (e.g. a runbook reindex script).
The other four documented index names (`IX_AuditLog_OccurredAtUtc`,
`IX_AuditLog_Site_Occurred`, `IX_AuditLog_Channel_Status_Occurred`,
`IX_AuditLog_Target_Occurred`, plus the post-design additions
`IX_AuditLog_Execution`, `IX_AuditLog_ParentExecution`, `IX_AuditLog_Node_Occurred`)
agree with the code.
**Recommendation**
Pick one direction. Updating the design doc to match the code is cheap (one word) and
preserves the existing migration; renaming the index in the database requires a new
migration that does `sp_rename`. Document-aligning is the lower-cost option and
matches the resolution pattern used for CD-005.
**Resolution (2026-05-28):**
Doc-aligns-to-code, preserving the existing migration. `docs/requirements/Component-AuditLog.md`
already lists the index as `IX_AuditLog_CorrelationId` (line 109). The stale
`IX_AuditLog_Correlation` reference in `docs/requirements/Component-ConfigurationDatabase.md`
line 64 (AuditLog table prose) was updated to `IX_AuditLog_CorrelationId`. Code unchanged;
index name remains `IX_AuditLog_CorrelationId`.
### ConfigurationDatabase-024 — Missing test coverage for SPLIT-RANGE failure-continuation and production-shape rowversion delete
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/Maintenance/AuditLogPartitionMaintenanceTests.cs`, `tests/.../RepositoryCoverageTests.cs:855-869` |
**Resolution (2026-05-28):** (1) Added `AuditLogPartitionMaintenanceTests.EnsureLookahead_SecondSplitThrows_LoopAborts_FirstBoundaryStillCommitted` (Skippable, MS SQL fixture) — installs a `DbCommandInterceptor` that lets the 1st `ALTER PARTITION FUNCTION pf_AuditLog_Month() SPLIT RANGE` through and throws on the 2nd, asserts the exception propagates (CD-019's no-try/catch behaviour), counts exactly one successful split, and verifies the first boundary IS now persisted in `pf_AuditLog_Month` so the next tick resumes from N+1 with no holes. (2) Added `DeploymentManagerRepositoryTests.DeleteDeploymentRecord_CurrentRowVersion_StubAttachPath_DeleteSucceeds` — production-shape happy path: caller holds the current RowVersion, change-tracker cleared, delete completes without throwing `DbUpdateConcurrencyException` and the row is gone (1 row affected).
**Description**
`AuditLogPartitionMaintenanceTests` exercises the happy-path SPLIT-RANGE behaviour
(no-op, single-month, three-month, already-exists idempotency) but never simulates a
SPLIT *failure* — so the catch-and-continue behaviour flagged in CD-019 is
behaviourally untested. The class is a central singleton driving daily audit purge;
a regression that turned the failure path into a permanent hole would not surface in
the test suite.
Separately, `RepositoryCoverageTests.DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity`
covers the stub-attach delete path under the SQLite test fixture, but the fixture
remaps `RowVersion` as a nullable concurrency token (`SqliteTestHelper`), so it does
not exercise the production-shape `IsRowVersion()` auto-population — the actual
concurrency-token bug flagged in CD-018 cannot show up. There is an
`MsSqlMigrationFixture` in the test project already (used by the Audit Log migration
tests); the stub-attach delete deserves a parallel MS-SQL-flavoured test.
**Recommendation**
(1) Add an `AuditLogPartitionMaintenanceTests` case that constructs a context against
a constrained login (no `ALTER ON SCHEMA::dbo`), invokes `EnsureLookaheadAsync` for a
three-month gap, and asserts: only the partition boundaries created BEFORE the
permissions failure landed remain, and the call aborts cleanly without continuing to
later months. This pins down the resolution of CD-019. (2) Add a
`RepositoryCoverageTests` case that uses `MsSqlMigrationFixture` to insert a
`DeploymentRecord`, clear the change tracker, call `DeleteDeploymentRecordAsync`,
and assert the row is gone — pinning the resolution of CD-018. Both tests should be
`[SkippableFact]` so the suite still passes when no MS SQL Server is available.