Files
lmxopcua/code-reviews/Configuration/findings.md
Joseph Doherty ee51878c08 fix(configuration): resolve High code-review findings (Configuration-001, Configuration-008)
Configuration-001: wrap the EXEC dbo.sp_ValidateDraft call in
sp_PublishGeneration in a BEGIN TRY/CATCH ROLLBACK; THROW block so a
validation RAISERROR aborts the publish instead of being ignored.

Configuration-008: route caller-supplied strings interpolated into
ConfigAuditLog.DetailsJson through STRING_ESCAPE(@x, 'json') and emit
sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number,
closing the JSON-injection / denial-of-operation vector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:12:00 -04:00

193 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — Configuration
| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 9 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Configuration-001, Configuration-002, Configuration-003 |
| 2 | OtOpcUa conventions | Configuration-004 |
| 3 | Concurrency & thread safety | Configuration-005 |
| 4 | Error handling & resilience | Configuration-006, Configuration-007 |
| 5 | Security | Configuration-008, Configuration-009, Configuration-010 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Configuration-011 |
| 10 | Documentation & comments | No issues found |
## Findings
### Configuration-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` |
| Status | Resolved |
**Description:** `sp_PublishGeneration` invokes `EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;` and then continues unconditionally to the reservation MERGE and the `Status='Published'` update. `sp_ValidateDraft` signals every failure with `RAISERROR(..., 16, 1)` followed by `RETURN`. A severity-16 `RAISERROR` is not a batch-aborting error and `SET XACT_ABORT ON` does not abort the transaction for it, so control returns to `sp_PublishGeneration`, which publishes the draft even though validation rejected it (cross-cluster namespace binding, dangling tag FKs, duplicate external identifiers, EquipmentUuid immutability all pass through). Pre-publish validation is effectively bypassed.
**Recommendation:** Wrap the `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH` so the validation `RAISERROR` propagates and aborts the publish, or have `sp_ValidateDraft` return a result-set/output parameter that `sp_PublishGeneration` inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays `Draft`.
**Resolution:** Resolved 2026-05-22 — wrapped the `EXEC dbo.sp_ValidateDraft` call in `sp_PublishGeneration` in a `BEGIN TRY ... BEGIN CATCH ROLLBACK; THROW; END CATCH` block so a validation `RAISERROR` rolls back the publish transaction and re-raises instead of being silently ignored; added DB-backed regression test `Publish_aborts_when_ValidateDraft_rejects_the_draft`.
### Configuration-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
| Status | Open |
**Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
**Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
**Resolution:** _(open)_
### Configuration-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
| Status | Open |
**Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked.
**Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
**Resolution:** _(open)_
### Configuration-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
| Status | Open |
**Description:** `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint``int` conversion. Only bits 011 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a `uint` value that overflows `int` and the conversion round-trip would corrupt it.
**Recommendation:** Change the enum underlying type to `int` (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
**Resolution:** _(open)_
### Configuration-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
| Status | Open |
**Description:** `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructor's `EnsureIndex` calls are non-unique, so the storage layer does not prevent the duplicate, and `PruneOldGenerationsAsync`'s `keepLatest` accounting is then off.
**Recommendation:** Declare a unique index on `(ClusterId, GenerationId)` and treat the duplicate-key exception as an idempotent no-op, or guard `PutAsync` with an instance `SemaphoreSlim`/lock. Document the concurrency contract on `ILocalConfigCache`.
**Resolution:** _(open)_
### Configuration-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
| Status | Open |
**Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently.
**Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs.
**Resolution:** _(open)_
### Configuration-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
| Status | Open |
**Description:** `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
**Recommendation:** Rethrow `OperationCanceledException` rather than recording it as an entity error; call `ct.ThrowIfCancellationRequested()` between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
**Resolution:** _(open)_
### Configuration-008
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` |
| Status | Resolved |
**Description:** Three stored procedures build `ConfigAuditLog.DetailsJson` by raw string concatenation of caller-supplied `nvarchar` parameters: `sp_RegisterNodeGenerationApplied` (`@Status`), `sp_RollbackToGeneration` (`@TargetGenerationId`), `sp_ReleaseExternalIdReservation` (`@Kind`, `@Value`). A value with a double-quote or backslash produces malformed JSON; combined with the `CK_ConfigAuditLog_DetailsJson_IsJson` check constraint, the `INSERT` fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.
**Recommendation:** Build the JSON with a safe constructor (`FOR JSON PATH, WITHOUT_ARRAY_WRAPPER` or `JSON_OBJECT(...)` on SQL Server 2022+) so values are properly escaped, or run each interpolated value through `STRING_ESCAPE(@Value, 'json')`. Add tests with quote/backslash-containing inputs.
**Resolution:** Resolved 2026-05-22 — routed every caller-supplied string interpolated into `DetailsJson` through `STRING_ESCAPE(@x, 'json')` (`@Status` in `sp_RegisterNodeGenerationApplied`; `@Kind`/`@Value` in `sp_ReleaseExternalIdReservation`) and emitted `sp_RollbackToGeneration`'s `@TargetGenerationId` as a bare JSON number via explicit `CONVERT(nvarchar(20), CONVERT(bigint, ...))`; added DB-backed regression tests `RegisterNodeGenerationApplied_escapes_quotes_in_audit_DetailsJson` and `ReleaseReservation_escapes_quotes_in_audit_DetailsJson` that round-trip quote/backslash inputs through `ISJSON`/`JSON_VALUE`.
### Configuration-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
| Status | Open |
**Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection.
**Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live.
**Resolution:** _(open)_
### Configuration-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
| Status | Open |
**Description:** On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying server/credential context) is logged verbatim. There is no scrubbing of connection-string fragments before logging, against the project's no-secret-logging rule.
**Recommendation:** Log `ex.GetType().Name` and `ex.Message` for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
**Resolution:** _(open)_
### Configuration-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
| Status | Open |
**Description:** The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) `ValidatePathLength`'s constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
**Recommendation:** Add `GenerationApplierTests` cases for a throwing callback (assert error recorded, assert cancellation propagates) and a `DraftValidatorTests` path-length boundary case. Add a `StoredProceduresTests` case that publishes an invalid draft and asserts it stays `Draft`.
**Resolution:** _(open)_