Files
lmxopcua/code-reviews/Configuration/findings.md
T
Joseph Doherty c3d148e396 review(Configuration): fix LiteDB global BsonMapper cross-instance race (High)
Re-review at 7286d320. Configuration-012 (High): LiteDbConfigCache/GenerationSealedCache built
LiteDatabase on the process-wide BsonMapper.Global whose lazy member resolution races across
concurrently-constructed DBs (NotSupportedException/duplicate-key under contention; also caused
intermittent suite flakiness). Fix: per-cache fresh BsonMapper + pre-registered entity + TDD.
-013 (dead ValidateClusterTopology, ControlPlane) / -014 (collation case-sensitivity, needs
migration) deferred. No migration touched.
2026-06-19 11:06:56 -04:00

257 lines
26 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — Configuration
| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration` |
| Reviewer | Claude Code |
| Review date | 2026-06-19 (re-review; first reviewed 2026-05-22) |
| Commit reviewed | `7286d320` (re-review; was `76d35d1`) |
| Status | Reviewed |
| Open findings | 2 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Configuration-001, Configuration-002, Configuration-003 |
| 2 | OtOpcUa conventions | Configuration-004 |
| 3 | Concurrency & thread safety | Configuration-005 |
| 4 | Error handling & resilience | Configuration-006, Configuration-007 |
| 5 | Security | Configuration-008, Configuration-009, Configuration-010 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Configuration-011 |
| 10 | Documentation & comments | No issues found |
## Findings
### Configuration-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` |
| Status | Resolved |
**Description:** `sp_PublishGeneration` invokes `EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;` and then continues unconditionally to the reservation MERGE and the `Status='Published'` update. `sp_ValidateDraft` signals every failure with `RAISERROR(..., 16, 1)` followed by `RETURN`. A severity-16 `RAISERROR` is not a batch-aborting error and `SET XACT_ABORT ON` does not abort the transaction for it, so control returns to `sp_PublishGeneration`, which publishes the draft even though validation rejected it (cross-cluster namespace binding, dangling tag FKs, duplicate external identifiers, EquipmentUuid immutability all pass through). Pre-publish validation is effectively bypassed.
**Recommendation:** Wrap the `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH` so the validation `RAISERROR` propagates and aborts the publish, or have `sp_ValidateDraft` return a result-set/output parameter that `sp_PublishGeneration` inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays `Draft`.
**Resolution:** Resolved 2026-05-22 — wrapped the `EXEC dbo.sp_ValidateDraft` call in `sp_PublishGeneration` in a `BEGIN TRY ... BEGIN CATCH ROLLBACK; THROW; END CATCH` block so a validation `RAISERROR` rolls back the publish transaction and re-raises instead of being silently ignored; added DB-backed regression test `Publish_aborts_when_ValidateDraft_rejects_the_draft`.
### Configuration-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
| Status | Resolved |
**Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
**Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
**Resolution:** Resolved 2026-05-22 — made `sp_PublishGeneration` transaction-nesting aware: captures `@@TRANCOUNT` on entry, issues `BEGIN TRANSACTION` when zero or `SAVE TRANSACTION sp_PublishGeneration` when nested, and uses `ROLLBACK TRANSACTION sp_PublishGeneration` (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END CATCH` so validation errors propagate correctly.
### Configuration-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
| Status | Resolved |
**Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked.
**Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
**Resolution:** Resolved 2026-05-22 — added nullable `Enterprise` and `Site` properties to `DraftSnapshot`; `ValidatePathLength` uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; `DraftValidationService` now loads the cluster row and populates both properties; added `PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` unit tests.
### Configuration-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
| Status | Resolved |
**Description:** `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint``int` conversion. Only bits 011 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a `uint` value that overflows `int` and the conversion round-trip would corrupt it.
**Recommendation:** Change the enum underlying type to `int` (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
**Resolution:** Resolved 2026-05-23 — changed `NodePermissions` underlying type from `uint` to `int` so it matches the documented "stored as int" semantics and the `HasConversion<int>()` mapping in `OtOpcUaConfigDbContext.ConfigureNodeAcl`. Added regression test `NodePermissionsTests` pinning the underlying type and round-trip safety through `int` storage.
### Configuration-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
| Status | Resolved |
**Description:** `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructor's `EnsureIndex` calls are non-unique, so the storage layer does not prevent the duplicate, and `PruneOldGenerationsAsync`'s `keepLatest` accounting is then off.
**Recommendation:** Declare a unique index on `(ClusterId, GenerationId)` and treat the duplicate-key exception as an idempotent no-op, or guard `PutAsync` with an instance `SemaphoreSlim`/lock. Document the concurrency contract on `ILocalConfigCache`.
**Resolution:** Resolved 2026-05-23 — guarded `PutAsync` with an instance-level `SemaphoreSlim` so the find-then-insert/update window runs atomically for a given cache instance; documented the concurrency contract on `ILocalConfigCache`. Added regression test `PutAsync_concurrent_for_same_cluster_and_generation_does_not_duplicate` that runs 64 concurrent puts and inspects the LiteDB file directly to confirm exactly one row per `(ClusterId, GenerationId)` survives.
### Configuration-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
| Status | Resolved |
**Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently.
**Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs.
**Resolution:** Resolved 2026-05-22 — changed the fallback `catch` filter to `ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested` so a command-timeout `TaskCanceledException` (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry `ShouldHandle` predicate to `Handle<Exception>()` (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: `CommandTimeout_TaskCanceledException_FallsBackToCache`, `PollyTimeout_TimeoutRejectedException_FallsBackToCache`, and `CallerCancellation_Propagates_NotFallback`.
### Configuration-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
| Status | Resolved |
**Description:** `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
**Recommendation:** Rethrow `OperationCanceledException` rather than recording it as an entity error; call `ct.ThrowIfCancellationRequested()` between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
**Resolution:** Resolved 2026-05-23 — added `catch (OperationCanceledException) when (ct.IsCancellationRequested) { throw; }` ahead of the generic catch in `ApplyPass` so genuine caller cancellation propagates rather than being recorded as an entity error, and added a `ct.ThrowIfCancellationRequested()` at the top of each Added/Modified pass iteration. The "failed Removed pass keeps walking Added/Modified" behaviour was confirmed as the intended contract (cascades must settle) and pinned by `Apply_continues_to_Added_pass_when_a_Removed_callback_throws`. New regression tests: `Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled`, `Apply_stops_between_passes_when_cancellation_requested`.
### Configuration-008
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` |
| Status | Resolved |
**Description:** Three stored procedures build `ConfigAuditLog.DetailsJson` by raw string concatenation of caller-supplied `nvarchar` parameters: `sp_RegisterNodeGenerationApplied` (`@Status`), `sp_RollbackToGeneration` (`@TargetGenerationId`), `sp_ReleaseExternalIdReservation` (`@Kind`, `@Value`). A value with a double-quote or backslash produces malformed JSON; combined with the `CK_ConfigAuditLog_DetailsJson_IsJson` check constraint, the `INSERT` fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.
**Recommendation:** Build the JSON with a safe constructor (`FOR JSON PATH, WITHOUT_ARRAY_WRAPPER` or `JSON_OBJECT(...)` on SQL Server 2022+) so values are properly escaped, or run each interpolated value through `STRING_ESCAPE(@Value, 'json')`. Add tests with quote/backslash-containing inputs.
**Resolution:** Resolved 2026-05-22 — routed every caller-supplied string interpolated into `DetailsJson` through `STRING_ESCAPE(@x, 'json')` (`@Status` in `sp_RegisterNodeGenerationApplied`; `@Kind`/`@Value` in `sp_ReleaseExternalIdReservation`) and emitted `sp_RollbackToGeneration`'s `@TargetGenerationId` as a bare JSON number via explicit `CONVERT(nvarchar(20), CONVERT(bigint, ...))`; added DB-backed regression tests `RegisterNodeGenerationApplied_escapes_quotes_in_audit_DetailsJson` and `ReleaseReservation_escapes_quotes_in_audit_DetailsJson` that round-trip quote/backslash inputs through `ISJSON`/`JSON_VALUE`.
### Configuration-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
| Status | Resolved |
**Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection.
**Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live.
**Resolution:** Resolved 2026-05-22 — removed the embedded `sa` password and `DefaultConnectionString` constant entirely; `CreateDbContext` now throws `InvalidOperationException` with a clear setup message when `OTOPCUA_CONFIG_CONNECTION` is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string.
### Configuration-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
| Status | Resolved |
**Description:** On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying server/credential context) is logged verbatim. There is no scrubbing of connection-string fragments before logging, against the project's no-secret-logging rule.
**Recommendation:** Log `ex.GetType().Name` and `ex.Message` for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
**Resolution:** Resolved 2026-05-23 — stopped passing the raw exception object to `LogWarning`; the fallback log now records only `ex.GetType().Name` and a `ScrubSecrets`-redacted `ex.Message` so connection-string fragments (Password, User Id, Pwd, Uid, AccessToken, Authorization, ApiKey/Api-Key) are stripped before reaching any sink. Added regression test `FallbackWarning_does_not_log_full_exception_object_or_password_fragment` that captures emitted log records and asserts no raw exception attached and no credential keyword present in the rendered message.
### Configuration-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
| Status | Resolved |
**Description:** The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) `ValidatePathLength`'s constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
**Recommendation:** Add `GenerationApplierTests` cases for a throwing callback (assert error recorded, assert cancellation propagates) and a `DraftValidatorTests` path-length boundary case. Add a `StoredProceduresTests` case that publishes an invalid draft and asserts it stays `Draft`.
**Resolution:** Resolved 2026-05-23 — all three gaps now covered. (a) `GenerationApplierTests.Apply_continues_to_Added_pass_when_a_Removed_callback_throws` pins ordering; `Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled` and `Apply_stops_between_passes_when_cancellation_requested` (added under Configuration-007) pin cancellation. (b) `DraftValidatorTests.PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` (added under Configuration-003) pin the path-length boundary. (c) `StoredProceduresTests.Publish_aborts_when_ValidateDraft_rejects_the_draft` (added under Configuration-001) pins the publish-bypasses-validation regression against the live SQL fixture.
## Re-review 2026-06-19 (commit 7286d320)
Re-reviewed the full non-migration C# data layer at `7286d320` (the Configuration source is byte-identical between `7286d320` and the working HEAD `8ac5a2db``git diff 7286d320 HEAD` shows no non-migration Configuration changes). Migration files remain off-limits and were not touched.
Prior findings Configuration-001…011 remain Resolved. Notable since the first review: the v1 generation model and its stored-proc publish path are retired; `DraftValidationService` (named in the Configuration-003 resolution) is replaced by `DraftSnapshotFactory`, which deliberately leaves `Enterprise`/`Site` null so `ValidatePathLength` uses its conservative upper bound — an intentional, documented fallback, not a regression. The DB-backed `StoredProceduresTests` referenced by the Configuration-001/008/011 resolutions no longer exist in the test project (consolidated out when the generation model dropped); the live SQL `SchemaComplianceTests`, `DriverHostStatusTests`, and `AuthorizationTests` still run against the Docker SQL fixture and pass.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Configuration-012 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Configuration-012 |
| 4 | Error handling & resilience | No issues found (Configuration-006/007 fixes verified still in place) |
| 5 | Security | No issues found (Configuration-008/009/010 fixes verified still in place; `DesignTimeDbContextFactory` carries no credential; `ResilientConfigReader.ScrubSecrets` intact) |
| 6 | Performance & resource management | No issues found (factory + query paths use `AsNoTracking`; cluster audit query is a bounded two-step, not N+1) |
| 7 | Design-document adherence | Configuration-013 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Configuration-013 (dead-code path untested in prod wiring), Configuration-014 |
| 10 | Documentation & comments | Configuration-014 |
### Configuration-012
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:31`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs:71`, `:129` |
| Status | Resolved |
**Description:** Both LiteDB-backed caches constructed their `LiteDatabase` with the default constructor, which binds the process-wide singleton `BsonMapper.Global`. LiteDB resolves a POCO's members into that mapper **lazily on first use and without thread-safety across concurrently-constructed `LiteDatabase` instances**. When two caches initialise or run in parallel — e.g. a `GenerationSealedCache.SealAsync` racing a `ReadCurrentAsync`, or `LiteDbConfigCache` racing the sealed cache, or simply the test suite running both classes' fixtures in parallel — the global mapper races, surfacing as `NotSupportedException: Member ClusterId not found on BsonMapper for type GenerationSnapshot` (the query predicate can't resolve the member) or `LiteException: Cannot insert duplicate key in unique index '_id'. The duplicate value is '0'` (the `int Id` auto-id mapping was lost, so `Insert` writes a literal `0` twice). The per-instance `_writeGate` added under Configuration-005 serialises one instance's find-then-insert but does nothing about this cross-instance global-mapper hazard. Reproduced deterministically under thread-pool contention (the Configuration-005 regression test `PutAsync_concurrent_…` was intermittently failing in the full suite for exactly this reason). This is a real runtime fault, not test-only: the resilient-read fallback path and the seal path can run concurrently in a live node.
**Recommendation:** Give each `LiteDatabase` a private `BsonMapper` with the entity pre-registered (`mapper.Entity<GenerationSnapshot>()`) so member resolution happens once, single-threaded, at construction and never touches the global mapper.
**Resolution:** Resolved 2026-06-19 — added a private `BuildMapper()` to both `LiteDbConfigCache` and `GenerationSealedCache` that constructs a fresh `BsonMapper`, calls `mapper.Entity<GenerationSnapshot>()`, and passes it to every `new LiteDatabase(...)` (construction, seal-write, sealed-read). Added regression test `LiteDbConfigCacheTests.Concurrent_cache_instances_do_not_race_the_shared_bson_mapper` (24 cache instances × 16 concurrent puts each); confirmed it fails on the unfixed code (`Member ClusterId not found on BsonMapper`) and passes on the fixed code, and the previously-flaky `PutAsync_concurrent_…` is now stable across repeated full-suite runs.
### Configuration-013
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:243` (`ValidateClusterTopology`) |
| Status | Open |
**Description:** `DraftValidator.ValidateClusterTopology` is documented as the managed pre-publish guard that catches cluster-topology drift the SQL `CK_ServerCluster_RedundancyMode_NodeCount` check cannot see — specifically an operator disabling a `ClusterNode` (effective enabled-count = 1) while `RedundancyMode` stays `Hot`/`Warm`, which would boot the runtime into an invalid-topology band. It is fully unit-tested (`DraftValidatorTests` §"ValidateClusterTopology") but **no production code calls it.** The deploy gate in `AdminOperationsActor.StartDeployment` runs `DraftValidator.Validate(...)` (the snapshot rules) but never `ValidateClusterTopology(...)`, so the documented enabled-node-count guard is inert at deploy time — the only thing standing is the row-level SQL CHECK, which the doc explicitly says is insufficient.
**Recommendation:** Wire `ValidateClusterTopology` into the deploy/publish path — load the `ServerCluster` row(s) + their `ClusterNode`s and run it alongside `Validate`, folding its errors into the same reject summary. The fix belongs in `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (a different module), so it is **deferred from this module's edit scope** and recorded here against the now-dead Configuration-layer method. Cross-module: ControlPlane.
**Resolution:** _(open — fix is in the ControlPlane module's `AdminOperationsActor`, outside Configuration's edit scope)_
### Configuration-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Services/ILdapGroupRoleMappingService.cs:23`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Services/LdapGroupRoleMappingService.cs:24` |
| Status | Open |
**Description:** `ILdapGroupRoleMappingService.GetByGroupsAsync`'s XML doc asserts "Case-insensitive per LDAP conventions", but the implementation is `db.LdapGroupRoleMappings.Where(m => groupSet.Contains(m.LdapGroup))`, which translates to a SQL `IN (…)` whose case-sensitivity is entirely determined by the column's collation. On the default case-insensitive server collation (the dev/Docker SQL) the doc holds; on a case-sensitive-collation deployment the lookup would silently miss rows that differ only in case — and the in-memory EF unit tests (which match exact case) would not catch it. This is a hot path: it fires on every Admin-UI sign-in, so a silent miss denies the user their role grant. The doc overstates a guarantee the code does not enforce.
**Recommendation:** Either (a) soften the doc to state the match is collation-dependent and document the required CI collation as a deployment constraint, or (b) pin the `LdapGroup` column to an explicit `_CI_` collation in the model so the guarantee is enforced regardless of server default. Option (b) is a schema/migration change and must be deferred (no migration edits in this review); option (a) is a one-line doc change but only papers over the gap. Left **Open** pending a decision on which guarantee to commit to — no code change applied because the documented behaviour currently matches the deployment reality (CI server collation).
**Resolution:** _(open — preferred fix (b) pins the column collation, which needs an EF migration; deferred per the no-migration rule)_