Files
lmxopcua/code-reviews/Configuration/findings.md
Joseph Doherty c126fc7a7d fix(configuration): resolve Medium code-review findings (Configuration-002, -003, -006, -009)
Configuration-002: sp_PublishGeneration is transaction-nesting aware
(BEGIN TRANSACTION vs SAVE TRANSACTION on @@TRANCOUNT) so a caller's outer
transaction survives a publish failure; sp_ValidateDraft wrapped in TRY/CATCH.
Configuration-003: ValidatePathLength uses the cluster's actual Enterprise/Site
lengths when available, falling back to the conservative approximation.
Configuration-006: ResilientConfigReader treats a command-timeout
TaskCanceledException as a fault (not caller cancellation) and falls back.
Configuration-009: removed the checked-in plaintext sa connection string;
CreateDbContext now requires OTOPCUA_CONFIG_CONNECTION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:13:27 -04:00

15 KiB
Raw Blame History

Code Review — Configuration

Field Value
Module src/Core/ZB.MOM.WW.OtOpcUa.Configuration
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 5

Checklist coverage

# Category Result
1 Correctness & logic bugs Configuration-001, Configuration-002, Configuration-003
2 OtOpcUa conventions Configuration-004
3 Concurrency & thread safety Configuration-005
4 Error handling & resilience Configuration-006, Configuration-007
5 Security Configuration-008, Configuration-009, Configuration-010
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions No issues found
9 Testing coverage Configuration-011
10 Documentation & comments No issues found

Findings

Configuration-001

Field Value
Severity High
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282
Status Resolved

Description: sp_PublishGeneration invokes EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId; and then continues unconditionally to the reservation MERGE and the Status='Published' update. sp_ValidateDraft signals every failure with RAISERROR(..., 16, 1) followed by RETURN. A severity-16 RAISERROR is not a batch-aborting error and SET XACT_ABORT ON does not abort the transaction for it, so control returns to sp_PublishGeneration, which publishes the draft even though validation rejected it (cross-cluster namespace binding, dangling tag FKs, duplicate external identifiers, EquipmentUuid immutability all pass through). Pre-publish validation is effectively bypassed.

Recommendation: Wrap the EXEC dbo.sp_ValidateDraft in BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH so the validation RAISERROR propagates and aborts the publish, or have sp_ValidateDraft return a result-set/output parameter that sp_PublishGeneration inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays Draft.

Resolution: Resolved 2026-05-22 — wrapped the EXEC dbo.sp_ValidateDraft call in sp_PublishGeneration in a BEGIN TRY ... BEGIN CATCH ROLLBACK; THROW; END CATCH block so a validation RAISERROR rolls back the publish transaction and re-raises instead of being silently ignored; added DB-backed regression test Publish_aborts_when_ValidateDraft_rejects_the_draft.

Configuration-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325
Status Resolved

Description: sp_RollbackToGeneration opens its own BEGIN TRANSACTION, clones rows into a new Draft, then EXEC dbo.sp_PublishGeneration, which itself runs BEGIN TRANSACTION (nesting @@TRANCOUNT to 2) and on its failure paths executes a bare ROLLBACK. A bare ROLLBACK rolls back to the outermost transaction and sets @@TRANCOUNT to 0, so when sp_RollbackToGeneration later reaches its own COMMIT it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.

Recommendation: Make sp_PublishGeneration transaction-nesting aware: capture @@TRANCOUNT on entry, only BEGIN TRANSACTION when zero (otherwise SAVE TRANSACTION), and only COMMIT/ROLLBACK the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.

Resolution: Resolved 2026-05-22 — made sp_PublishGeneration transaction-nesting aware: captures @@TRANCOUNT on entry, issues BEGIN TRANSACTION when zero or SAVE TRANSACTION sp_PublishGeneration when nested, and uses ROLLBACK TRANSACTION sp_PublishGeneration (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped EXEC dbo.sp_ValidateDraft in BEGIN TRY ... END CATCH so validation errors propagate correctly.

Configuration-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73
Status Resolved

Description: ValidatePathLength computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (32 + 32 + ...) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail PathTooLong. The check also silently continues when an equipment's UnsLineId/UnsAreaId does not resolve, so an orphaned-line path is never length-checked.

Recommendation: Pass the actual Enterprise and Site strings into the validator (e.g. on DraftSnapshot, or as parameters alongside ValidateClusterTopology) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.

Resolution: Resolved 2026-05-22 — added nullable Enterprise and Site properties to DraftSnapshot; ValidatePathLength uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; DraftValidationService now loads the cluster row and populates both properties; added PathLength_uses_actual_Enterprise_Site_when_provided and PathLength_conservative_fallback_when_Enterprise_Site_absent unit tests.

Configuration-004

Field Value
Severity Low
Category OtOpcUa conventions
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8, src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417
Status Open

Description: NodePermissions is declared [Flags] enum ... : uint, while its XML doc and NodeAcl.PermissionFlags' doc both say "stored as int", and ConfigureNodeAcl uses HasConversion<int>() — a uintint conversion. Only bits 011 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a uint value that overflows int and the conversion round-trip would corrupt it.

Recommendation: Change the enum underlying type to int (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.

Resolution: (open)

Configuration-005

Field Value
Severity Low
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50
Status Open

Description: PutAsync performs a non-atomic find-then-insert/update. Two concurrent PutAsync calls for the same (ClusterId, GenerationId) can both observe existing is null and both Insert, producing two rows for one generation. The constructor's EnsureIndex calls are non-unique, so the storage layer does not prevent the duplicate, and PruneOldGenerationsAsync's keepLatest accounting is then off.

Recommendation: Declare a unique index on (ClusterId, GenerationId) and treat the duplicate-key exception as an idempotent no-op, or guard PutAsync with an instance SemaphoreSlim/lock. Document the concurrency contract on ILocalConfigCache.

Resolution: (open)

Configuration-006

Field Value
Severity Medium
Category Error handling & resilience
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79
Status Resolved

Description: The fallback catch filters on ex is not OperationCanceledException. A SQL command timeout surfaced by ADO.NET as a TaskCanceledException (derives from OperationCanceledException) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry ShouldHandle predicate has the same shape, so command-timeout cancellations are also not retried consistently.

Recommendation: Distinguish caller cancellation from command-timeout cancellation explicitly: inspect cancellationToken.IsCancellationRequested to decide whether an OperationCanceledException is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a TimeoutRejectedException path and a command-timeout TaskCanceledException path asserting cache fallback occurs.

Resolution: Resolved 2026-05-22 — changed the fallback catch filter to ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested so a command-timeout TaskCanceledException (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry ShouldHandle predicate to Handle<Exception>() (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: CommandTimeout_TaskCanceledException_FallsBackToCache, PollyTimeout_TimeoutRejectedException_FallsBackToCache, and CallerCancellation_Propagates_NotFallback.

Configuration-007

Field Value
Severity Low
Category Error handling & resilience
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44
Status Open

Description: ApplyPass wraps each callback in catch (Exception ex). This swallows OperationCanceledException — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.

Recommendation: Rethrow OperationCanceledException rather than recording it as an entity error; call ct.ThrowIfCancellationRequested() between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.

Resolution: (open)

Configuration-008

Field Value
Severity High
Category Security
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150, :373, :468
Status Resolved

Description: Three stored procedures build ConfigAuditLog.DetailsJson by raw string concatenation of caller-supplied nvarchar parameters: sp_RegisterNodeGenerationApplied (@Status), sp_RollbackToGeneration (@TargetGenerationId), sp_ReleaseExternalIdReservation (@Kind, @Value). A value with a double-quote or backslash produces malformed JSON; combined with the CK_ConfigAuditLog_DetailsJson_IsJson check constraint, the INSERT fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.

Recommendation: Build the JSON with a safe constructor (FOR JSON PATH, WITHOUT_ARRAY_WRAPPER or JSON_OBJECT(...) on SQL Server 2022+) so values are properly escaped, or run each interpolated value through STRING_ESCAPE(@Value, 'json'). Add tests with quote/backslash-containing inputs.

Resolution: Resolved 2026-05-22 — routed every caller-supplied string interpolated into DetailsJson through STRING_ESCAPE(@x, 'json') (@Status in sp_RegisterNodeGenerationApplied; @Kind/@Value in sp_ReleaseExternalIdReservation) and emitted sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number via explicit CONVERT(nvarchar(20), CONVERT(bigint, ...)); added DB-backed regression tests RegisterNodeGenerationApplied_escapes_quotes_in_audit_DetailsJson and ReleaseReservation_escapes_quotes_in_audit_DetailsJson that round-trip quote/backslash inputs through ISJSON/JSON_VALUE.

Configuration-009

Field Value
Severity Medium
Category Security
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14
Status Resolved

Description: DefaultConnectionString embeds a plaintext sa password with User Id=sa directly in source, checked into the repository. Although used only at design time (dotnet ef), a checked-in sa credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants sa to anyone with repo access. TrustServerCertificate=True plus Encrypt=False additionally disables transport protection for that connection.

Recommendation: Drop the embedded credential. Fall back to integrated auth (Trusted_Connection=True) or fail fast with a message instructing the developer to set OTOPCUA_CONFIG_CONNECTION. Rotate the dev sa password if this value is live.

Resolution: Resolved 2026-05-22 — removed the embedded sa password and DefaultConnectionString constant entirely; CreateDbContext now throws InvalidOperationException with a clear setup message when OTOPCUA_CONFIG_CONNECTION is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string.

Configuration-010

Field Value
Severity Low
Category Security
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81
Status Open

Description: On central-DB read failure the warning log records the full exception object. Callers pass arbitrary centralFetch delegates; if any delegate closes over a connection string, an exception thrown from it (or a SqlException carrying server/credential context) is logged verbatim. There is no scrubbing of connection-string fragments before logging, against the project's no-secret-logging rule.

Recommendation: Log ex.GetType().Name and ex.Message for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.

Resolution: (open)

Configuration-011

Field Value
Severity Low
Category Testing coverage
Location src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7, src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60
Status Open

Description: The companion test project covers the cache, schema compliance, stored procedures, and DraftValidator well, but two flagged behaviours are not pinned: (a) GenerationApplier ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) ValidatePathLength's constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.

Recommendation: Add GenerationApplierTests cases for a throwing callback (assert error recorded, assert cancellation propagates) and a DraftValidatorTests path-length boundary case. Add a StoredProceduresTests case that publishes an invalid draft and asserts it stays Draft.

Resolution: (open)