- Configuration-004: NodePermissions stored as int to match the EF HasConversion<int>() in OtOpcUaConfigDbContext.ConfigureNodeAcl. - Configuration-005: serialise LiteDbConfigCache.PutAsync so concurrent Put for the same (ClusterId, GenerationId) cannot duplicate rows. - Configuration-007: rethrow OperationCanceledException from GenerationApplier.ApplyPass when the caller's token is cancelled. - Configuration-010: scrub secrets and drop the full exception object from the ResilientConfigReader fallback warning log. - Configuration-011: pin the previously-uncovered GenerationApplier cancellation and path-length / publish-validation paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
18 KiB
Code Review — Configuration
| Field | Value |
|---|---|
| Module | src/Core/ZB.MOM.WW.OtOpcUa.Configuration |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | 76d35d1 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Configuration-001, Configuration-002, Configuration-003 |
| 2 | OtOpcUa conventions | Configuration-004 |
| 3 | Concurrency & thread safety | Configuration-005 |
| 4 | Error handling & resilience | Configuration-006, Configuration-007 |
| 5 | Security | Configuration-008, Configuration-009, Configuration-010 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Configuration-011 |
| 10 | Documentation & comments | No issues found |
Findings
Configuration-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282 |
| Status | Resolved |
Description: sp_PublishGeneration invokes EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId; and then continues unconditionally to the reservation MERGE and the Status='Published' update. sp_ValidateDraft signals every failure with RAISERROR(..., 16, 1) followed by RETURN. A severity-16 RAISERROR is not a batch-aborting error and SET XACT_ABORT ON does not abort the transaction for it, so control returns to sp_PublishGeneration, which publishes the draft even though validation rejected it (cross-cluster namespace binding, dangling tag FKs, duplicate external identifiers, EquipmentUuid immutability all pass through). Pre-publish validation is effectively bypassed.
Recommendation: Wrap the EXEC dbo.sp_ValidateDraft in BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH so the validation RAISERROR propagates and aborts the publish, or have sp_ValidateDraft return a result-set/output parameter that sp_PublishGeneration inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays Draft.
Resolution: Resolved 2026-05-22 — wrapped the EXEC dbo.sp_ValidateDraft call in sp_PublishGeneration in a BEGIN TRY ... BEGIN CATCH ROLLBACK; THROW; END CATCH block so a validation RAISERROR rolls back the publish transaction and re-raises instead of being silently ignored; added DB-backed regression test Publish_aborts_when_ValidateDraft_rejects_the_draft.
Configuration-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325 |
| Status | Resolved |
Description: sp_RollbackToGeneration opens its own BEGIN TRANSACTION, clones rows into a new Draft, then EXEC dbo.sp_PublishGeneration, which itself runs BEGIN TRANSACTION (nesting @@TRANCOUNT to 2) and on its failure paths executes a bare ROLLBACK. A bare ROLLBACK rolls back to the outermost transaction and sets @@TRANCOUNT to 0, so when sp_RollbackToGeneration later reaches its own COMMIT it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
Recommendation: Make sp_PublishGeneration transaction-nesting aware: capture @@TRANCOUNT on entry, only BEGIN TRANSACTION when zero (otherwise SAVE TRANSACTION), and only COMMIT/ROLLBACK the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
Resolution: Resolved 2026-05-22 — made sp_PublishGeneration transaction-nesting aware: captures @@TRANCOUNT on entry, issues BEGIN TRANSACTION when zero or SAVE TRANSACTION sp_PublishGeneration when nested, and uses ROLLBACK TRANSACTION sp_PublishGeneration (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped EXEC dbo.sp_ValidateDraft in BEGIN TRY ... END CATCH so validation errors propagate correctly.
Configuration-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73 |
| Status | Resolved |
Description: ValidatePathLength computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (32 + 32 + ...) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail PathTooLong. The check also silently continues when an equipment's UnsLineId/UnsAreaId does not resolve, so an orphaned-line path is never length-checked.
Recommendation: Pass the actual Enterprise and Site strings into the validator (e.g. on DraftSnapshot, or as parameters alongside ValidateClusterTopology) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
Resolution: Resolved 2026-05-22 — added nullable Enterprise and Site properties to DraftSnapshot; ValidatePathLength uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; DraftValidationService now loads the cluster row and populates both properties; added PathLength_uses_actual_Enterprise_Site_when_provided and PathLength_conservative_fallback_when_Enterprise_Site_absent unit tests.
Configuration-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8, src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417 |
| Status | Resolved |
Description: NodePermissions is declared [Flags] enum ... : uint, while its XML doc and NodeAcl.PermissionFlags' doc both say "stored as int", and ConfigureNodeAcl uses HasConversion<int>() — a uint→int conversion. Only bits 0–11 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a uint value that overflows int and the conversion round-trip would corrupt it.
Recommendation: Change the enum underlying type to int (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
Resolution: Resolved 2026-05-23 — changed NodePermissions underlying type from uint to int so it matches the documented "stored as int" semantics and the HasConversion<int>() mapping in OtOpcUaConfigDbContext.ConfigureNodeAcl. Added regression test NodePermissionsTests pinning the underlying type and round-trip safety through int storage.
Configuration-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50 |
| Status | Resolved |
Description: PutAsync performs a non-atomic find-then-insert/update. Two concurrent PutAsync calls for the same (ClusterId, GenerationId) can both observe existing is null and both Insert, producing two rows for one generation. The constructor's EnsureIndex calls are non-unique, so the storage layer does not prevent the duplicate, and PruneOldGenerationsAsync's keepLatest accounting is then off.
Recommendation: Declare a unique index on (ClusterId, GenerationId) and treat the duplicate-key exception as an idempotent no-op, or guard PutAsync with an instance SemaphoreSlim/lock. Document the concurrency contract on ILocalConfigCache.
Resolution: Resolved 2026-05-23 — guarded PutAsync with an instance-level SemaphoreSlim so the find-then-insert/update window runs atomically for a given cache instance; documented the concurrency contract on ILocalConfigCache. Added regression test PutAsync_concurrent_for_same_cluster_and_generation_does_not_duplicate that runs 64 concurrent puts and inspects the LiteDB file directly to confirm exactly one row per (ClusterId, GenerationId) survives.
Configuration-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79 |
| Status | Resolved |
Description: The fallback catch filters on ex is not OperationCanceledException. A SQL command timeout surfaced by ADO.NET as a TaskCanceledException (derives from OperationCanceledException) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry ShouldHandle predicate has the same shape, so command-timeout cancellations are also not retried consistently.
Recommendation: Distinguish caller cancellation from command-timeout cancellation explicitly: inspect cancellationToken.IsCancellationRequested to decide whether an OperationCanceledException is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a TimeoutRejectedException path and a command-timeout TaskCanceledException path asserting cache fallback occurs.
Resolution: Resolved 2026-05-22 — changed the fallback catch filter to ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested so a command-timeout TaskCanceledException (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry ShouldHandle predicate to Handle<Exception>() (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: CommandTimeout_TaskCanceledException_FallsBackToCache, PollyTimeout_TimeoutRejectedException_FallsBackToCache, and CallerCancellation_Propagates_NotFallback.
Configuration-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44 |
| Status | Resolved |
Description: ApplyPass wraps each callback in catch (Exception ex). This swallows OperationCanceledException — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
Recommendation: Rethrow OperationCanceledException rather than recording it as an entity error; call ct.ThrowIfCancellationRequested() between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
Resolution: Resolved 2026-05-23 — added catch (OperationCanceledException) when (ct.IsCancellationRequested) { throw; } ahead of the generic catch in ApplyPass so genuine caller cancellation propagates rather than being recorded as an entity error, and added a ct.ThrowIfCancellationRequested() at the top of each Added/Modified pass iteration. The "failed Removed pass keeps walking Added/Modified" behaviour was confirmed as the intended contract (cascades must settle) and pinned by Apply_continues_to_Added_pass_when_a_Removed_callback_throws. New regression tests: Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled, Apply_stops_between_passes_when_cancellation_requested.
Configuration-008
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150, :373, :468 |
| Status | Resolved |
Description: Three stored procedures build ConfigAuditLog.DetailsJson by raw string concatenation of caller-supplied nvarchar parameters: sp_RegisterNodeGenerationApplied (@Status), sp_RollbackToGeneration (@TargetGenerationId), sp_ReleaseExternalIdReservation (@Kind, @Value). A value with a double-quote or backslash produces malformed JSON; combined with the CK_ConfigAuditLog_DetailsJson_IsJson check constraint, the INSERT fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.
Recommendation: Build the JSON with a safe constructor (FOR JSON PATH, WITHOUT_ARRAY_WRAPPER or JSON_OBJECT(...) on SQL Server 2022+) so values are properly escaped, or run each interpolated value through STRING_ESCAPE(@Value, 'json'). Add tests with quote/backslash-containing inputs.
Resolution: Resolved 2026-05-22 — routed every caller-supplied string interpolated into DetailsJson through STRING_ESCAPE(@x, 'json') (@Status in sp_RegisterNodeGenerationApplied; @Kind/@Value in sp_ReleaseExternalIdReservation) and emitted sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number via explicit CONVERT(nvarchar(20), CONVERT(bigint, ...)); added DB-backed regression tests RegisterNodeGenerationApplied_escapes_quotes_in_audit_DetailsJson and ReleaseReservation_escapes_quotes_in_audit_DetailsJson that round-trip quote/backslash inputs through ISJSON/JSON_VALUE.
Configuration-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14 |
| Status | Resolved |
Description: DefaultConnectionString embeds a plaintext sa password with User Id=sa directly in source, checked into the repository. Although used only at design time (dotnet ef), a checked-in sa credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants sa to anyone with repo access. TrustServerCertificate=True plus Encrypt=False additionally disables transport protection for that connection.
Recommendation: Drop the embedded credential. Fall back to integrated auth (Trusted_Connection=True) or fail fast with a message instructing the developer to set OTOPCUA_CONFIG_CONNECTION. Rotate the dev sa password if this value is live.
Resolution: Resolved 2026-05-22 — removed the embedded sa password and DefaultConnectionString constant entirely; CreateDbContext now throws InvalidOperationException with a clear setup message when OTOPCUA_CONFIG_CONNECTION is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string.
Configuration-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81 |
| Status | Resolved |
Description: On central-DB read failure the warning log records the full exception object. Callers pass arbitrary centralFetch delegates; if any delegate closes over a connection string, an exception thrown from it (or a SqlException carrying server/credential context) is logged verbatim. There is no scrubbing of connection-string fragments before logging, against the project's no-secret-logging rule.
Recommendation: Log ex.GetType().Name and ex.Message for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
Resolution: Resolved 2026-05-23 — stopped passing the raw exception object to LogWarning; the fallback log now records only ex.GetType().Name and a ScrubSecrets-redacted ex.Message so connection-string fragments (Password, User Id, Pwd, Uid, AccessToken, Authorization, ApiKey/Api-Key) are stripped before reaching any sink. Added regression test FallbackWarning_does_not_log_full_exception_object_or_password_fragment that captures emitted log records and asserts no raw exception attached and no credential keyword present in the rendered message.
Configuration-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7, src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60 |
| Status | Resolved |
Description: The companion test project covers the cache, schema compliance, stored procedures, and DraftValidator well, but two flagged behaviours are not pinned: (a) GenerationApplier ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) ValidatePathLength's constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
Recommendation: Add GenerationApplierTests cases for a throwing callback (assert error recorded, assert cancellation propagates) and a DraftValidatorTests path-length boundary case. Add a StoredProceduresTests case that publishes an invalid draft and asserts it stays Draft.
Resolution: Resolved 2026-05-23 — all three gaps now covered. (a) GenerationApplierTests.Apply_continues_to_Added_pass_when_a_Removed_callback_throws pins ordering; Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled and Apply_stops_between_passes_when_cancellation_requested (added under Configuration-007) pin cancellation. (b) DraftValidatorTests.PathLength_uses_actual_Enterprise_Site_when_provided and PathLength_conservative_fallback_when_Enterprise_Site_absent (added under Configuration-003) pin the path-length boundary. (c) StoredProceduresTests.Publish_aborts_when_ValidateDraft_rejects_the_draft (added under Configuration-001) pins the publish-bypasses-validation regression against the live SQL fixture.