c3d148e396
Re-review at 7286d320. Configuration-012 (High): LiteDbConfigCache/GenerationSealedCache built
LiteDatabase on the process-wide BsonMapper.Global whose lazy member resolution races across
concurrently-constructed DBs (NotSupportedException/duplicate-key under contention; also caused
intermittent suite flakiness). Fix: per-cache fresh BsonMapper + pre-registered entity + TDD.
-013 (dead ValidateClusterTopology, ControlPlane) / -014 (collation case-sensitivity, needs
migration) deferred. No migration touched.
257 lines
26 KiB
Markdown
257 lines
26 KiB
Markdown
# Code Review — Configuration
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration` |
|
||
| Reviewer | Claude Code |
|
||
| Review date | 2026-06-19 (re-review; first reviewed 2026-05-22) |
|
||
| Commit reviewed | `7286d320` (re-review; was `76d35d1`) |
|
||
| Status | Reviewed |
|
||
| Open findings | 2 |
|
||
|
||
## Checklist coverage
|
||
|
||
| # | Category | Result |
|
||
|---|---|---|
|
||
| 1 | Correctness & logic bugs | Configuration-001, Configuration-002, Configuration-003 |
|
||
| 2 | OtOpcUa conventions | Configuration-004 |
|
||
| 3 | Concurrency & thread safety | Configuration-005 |
|
||
| 4 | Error handling & resilience | Configuration-006, Configuration-007 |
|
||
| 5 | Security | Configuration-008, Configuration-009, Configuration-010 |
|
||
| 6 | Performance & resource management | No issues found |
|
||
| 7 | Design-document adherence | No issues found |
|
||
| 8 | Code organization & conventions | No issues found |
|
||
| 9 | Testing coverage | Configuration-011 |
|
||
| 10 | Documentation & comments | No issues found |
|
||
|
||
## Findings
|
||
|
||
### Configuration-001
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | High |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `sp_PublishGeneration` invokes `EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;` and then continues unconditionally to the reservation MERGE and the `Status='Published'` update. `sp_ValidateDraft` signals every failure with `RAISERROR(..., 16, 1)` followed by `RETURN`. A severity-16 `RAISERROR` is not a batch-aborting error and `SET XACT_ABORT ON` does not abort the transaction for it, so control returns to `sp_PublishGeneration`, which publishes the draft even though validation rejected it (cross-cluster namespace binding, dangling tag FKs, duplicate external identifiers, EquipmentUuid immutability all pass through). Pre-publish validation is effectively bypassed.
|
||
|
||
**Recommendation:** Wrap the `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH` so the validation `RAISERROR` propagates and aborts the publish, or have `sp_ValidateDraft` return a result-set/output parameter that `sp_PublishGeneration` inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays `Draft`.
|
||
|
||
**Resolution:** Resolved 2026-05-22 — wrapped the `EXEC dbo.sp_ValidateDraft` call in `sp_PublishGeneration` in a `BEGIN TRY ... BEGIN CATCH ROLLBACK; THROW; END CATCH` block so a validation `RAISERROR` rolls back the publish transaction and re-raises instead of being silently ignored; added DB-backed regression test `Publish_aborts_when_ValidateDraft_rejects_the_draft`.
|
||
|
||
### Configuration-002
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
|
||
|
||
**Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
|
||
|
||
**Resolution:** Resolved 2026-05-22 — made `sp_PublishGeneration` transaction-nesting aware: captures `@@TRANCOUNT` on entry, issues `BEGIN TRANSACTION` when zero or `SAVE TRANSACTION sp_PublishGeneration` when nested, and uses `ROLLBACK TRANSACTION sp_PublishGeneration` (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END CATCH` so validation errors propagate correctly.
|
||
|
||
### Configuration-003
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked.
|
||
|
||
**Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
|
||
|
||
**Resolution:** Resolved 2026-05-22 — added nullable `Enterprise` and `Site` properties to `DraftSnapshot`; `ValidatePathLength` uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; `DraftValidationService` now loads the cluster row and populates both properties; added `PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` unit tests.
|
||
|
||
### Configuration-004
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | OtOpcUa conventions |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint`→`int` conversion. Only bits 0–11 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a `uint` value that overflows `int` and the conversion round-trip would corrupt it.
|
||
|
||
**Recommendation:** Change the enum underlying type to `int` (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
|
||
|
||
**Resolution:** Resolved 2026-05-23 — changed `NodePermissions` underlying type from `uint` to `int` so it matches the documented "stored as int" semantics and the `HasConversion<int>()` mapping in `OtOpcUaConfigDbContext.ConfigureNodeAcl`. Added regression test `NodePermissionsTests` pinning the underlying type and round-trip safety through `int` storage.
|
||
|
||
### Configuration-005
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Concurrency & thread safety |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructor's `EnsureIndex` calls are non-unique, so the storage layer does not prevent the duplicate, and `PruneOldGenerationsAsync`'s `keepLatest` accounting is then off.
|
||
|
||
**Recommendation:** Declare a unique index on `(ClusterId, GenerationId)` and treat the duplicate-key exception as an idempotent no-op, or guard `PutAsync` with an instance `SemaphoreSlim`/lock. Document the concurrency contract on `ILocalConfigCache`.
|
||
|
||
**Resolution:** Resolved 2026-05-23 — guarded `PutAsync` with an instance-level `SemaphoreSlim` so the find-then-insert/update window runs atomically for a given cache instance; documented the concurrency contract on `ILocalConfigCache`. Added regression test `PutAsync_concurrent_for_same_cluster_and_generation_does_not_duplicate` that runs 64 concurrent puts and inspects the LiteDB file directly to confirm exactly one row per `(ClusterId, GenerationId)` survives.
|
||
|
||
### Configuration-006
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently.
|
||
|
||
**Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs.
|
||
|
||
**Resolution:** Resolved 2026-05-22 — changed the fallback `catch` filter to `ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested` so a command-timeout `TaskCanceledException` (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry `ShouldHandle` predicate to `Handle<Exception>()` (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: `CommandTimeout_TaskCanceledException_FallsBackToCache`, `PollyTimeout_TimeoutRejectedException_FallsBackToCache`, and `CallerCancellation_Propagates_NotFallback`.
|
||
|
||
### Configuration-007
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Error handling & resilience |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
|
||
|
||
**Recommendation:** Rethrow `OperationCanceledException` rather than recording it as an entity error; call `ct.ThrowIfCancellationRequested()` between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
|
||
|
||
**Resolution:** Resolved 2026-05-23 — added `catch (OperationCanceledException) when (ct.IsCancellationRequested) { throw; }` ahead of the generic catch in `ApplyPass` so genuine caller cancellation propagates rather than being recorded as an entity error, and added a `ct.ThrowIfCancellationRequested()` at the top of each Added/Modified pass iteration. The "failed Removed pass keeps walking Added/Modified" behaviour was confirmed as the intended contract (cascades must settle) and pinned by `Apply_continues_to_Added_pass_when_a_Removed_callback_throws`. New regression tests: `Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled`, `Apply_stops_between_passes_when_cancellation_requested`.
|
||
|
||
### Configuration-008
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | High |
|
||
| Category | Security |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Three stored procedures build `ConfigAuditLog.DetailsJson` by raw string concatenation of caller-supplied `nvarchar` parameters: `sp_RegisterNodeGenerationApplied` (`@Status`), `sp_RollbackToGeneration` (`@TargetGenerationId`), `sp_ReleaseExternalIdReservation` (`@Kind`, `@Value`). A value with a double-quote or backslash produces malformed JSON; combined with the `CK_ConfigAuditLog_DetailsJson_IsJson` check constraint, the `INSERT` fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.
|
||
|
||
**Recommendation:** Build the JSON with a safe constructor (`FOR JSON PATH, WITHOUT_ARRAY_WRAPPER` or `JSON_OBJECT(...)` on SQL Server 2022+) so values are properly escaped, or run each interpolated value through `STRING_ESCAPE(@Value, 'json')`. Add tests with quote/backslash-containing inputs.
|
||
|
||
**Resolution:** Resolved 2026-05-22 — routed every caller-supplied string interpolated into `DetailsJson` through `STRING_ESCAPE(@x, 'json')` (`@Status` in `sp_RegisterNodeGenerationApplied`; `@Kind`/`@Value` in `sp_ReleaseExternalIdReservation`) and emitted `sp_RollbackToGeneration`'s `@TargetGenerationId` as a bare JSON number via explicit `CONVERT(nvarchar(20), CONVERT(bigint, ...))`; added DB-backed regression tests `RegisterNodeGenerationApplied_escapes_quotes_in_audit_DetailsJson` and `ReleaseReservation_escapes_quotes_in_audit_DetailsJson` that round-trip quote/backslash inputs through `ISJSON`/`JSON_VALUE`.
|
||
|
||
### Configuration-009
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection.
|
||
|
||
**Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live.
|
||
|
||
**Resolution:** Resolved 2026-05-22 — removed the embedded `sa` password and `DefaultConnectionString` constant entirely; `CreateDbContext` now throws `InvalidOperationException` with a clear setup message when `OTOPCUA_CONFIG_CONNECTION` is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string.
|
||
|
||
### Configuration-010
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Security |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying server/credential context) is logged verbatim. There is no scrubbing of connection-string fragments before logging, against the project's no-secret-logging rule.
|
||
|
||
**Recommendation:** Log `ex.GetType().Name` and `ex.Message` for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
|
||
|
||
**Resolution:** Resolved 2026-05-23 — stopped passing the raw exception object to `LogWarning`; the fallback log now records only `ex.GetType().Name` and a `ScrubSecrets`-redacted `ex.Message` so connection-string fragments (Password, User Id, Pwd, Uid, AccessToken, Authorization, ApiKey/Api-Key) are stripped before reaching any sink. Added regression test `FallbackWarning_does_not_log_full_exception_object_or_password_fragment` that captures emitted log records and asserts no raw exception attached and no credential keyword present in the rendered message.
|
||
|
||
### Configuration-011
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) `ValidatePathLength`'s constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
|
||
|
||
**Recommendation:** Add `GenerationApplierTests` cases for a throwing callback (assert error recorded, assert cancellation propagates) and a `DraftValidatorTests` path-length boundary case. Add a `StoredProceduresTests` case that publishes an invalid draft and asserts it stays `Draft`.
|
||
|
||
**Resolution:** Resolved 2026-05-23 — all three gaps now covered. (a) `GenerationApplierTests.Apply_continues_to_Added_pass_when_a_Removed_callback_throws` pins ordering; `Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled` and `Apply_stops_between_passes_when_cancellation_requested` (added under Configuration-007) pin cancellation. (b) `DraftValidatorTests.PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` (added under Configuration-003) pin the path-length boundary. (c) `StoredProceduresTests.Publish_aborts_when_ValidateDraft_rejects_the_draft` (added under Configuration-001) pins the publish-bypasses-validation regression against the live SQL fixture.
|
||
|
||
## Re-review 2026-06-19 (commit 7286d320)
|
||
|
||
Re-reviewed the full non-migration C# data layer at `7286d320` (the Configuration source is byte-identical between `7286d320` and the working HEAD `8ac5a2db` — `git diff 7286d320 HEAD` shows no non-migration Configuration changes). Migration files remain off-limits and were not touched.
|
||
|
||
Prior findings Configuration-001…011 remain Resolved. Notable since the first review: the v1 generation model and its stored-proc publish path are retired; `DraftValidationService` (named in the Configuration-003 resolution) is replaced by `DraftSnapshotFactory`, which deliberately leaves `Enterprise`/`Site` null so `ValidatePathLength` uses its conservative upper bound — an intentional, documented fallback, not a regression. The DB-backed `StoredProceduresTests` referenced by the Configuration-001/008/011 resolutions no longer exist in the test project (consolidated out when the generation model dropped); the live SQL `SchemaComplianceTests`, `DriverHostStatusTests`, and `AuthorizationTests` still run against the Docker SQL fixture and pass.
|
||
|
||
| # | Category | Result |
|
||
|---|---|---|
|
||
| 1 | Correctness & logic bugs | Configuration-012 |
|
||
| 2 | OtOpcUa conventions | No issues found |
|
||
| 3 | Concurrency & thread safety | Configuration-012 |
|
||
| 4 | Error handling & resilience | No issues found (Configuration-006/007 fixes verified still in place) |
|
||
| 5 | Security | No issues found (Configuration-008/009/010 fixes verified still in place; `DesignTimeDbContextFactory` carries no credential; `ResilientConfigReader.ScrubSecrets` intact) |
|
||
| 6 | Performance & resource management | No issues found (factory + query paths use `AsNoTracking`; cluster audit query is a bounded two-step, not N+1) |
|
||
| 7 | Design-document adherence | Configuration-013 |
|
||
| 8 | Code organization & conventions | No issues found |
|
||
| 9 | Testing coverage | Configuration-013 (dead-code path untested in prod wiring), Configuration-014 |
|
||
| 10 | Documentation & comments | Configuration-014 |
|
||
|
||
### Configuration-012
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | High |
|
||
| Category | Concurrency & thread safety |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:31`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs:71`, `:129` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Both LiteDB-backed caches constructed their `LiteDatabase` with the default constructor, which binds the process-wide singleton `BsonMapper.Global`. LiteDB resolves a POCO's members into that mapper **lazily on first use and without thread-safety across concurrently-constructed `LiteDatabase` instances**. When two caches initialise or run in parallel — e.g. a `GenerationSealedCache.SealAsync` racing a `ReadCurrentAsync`, or `LiteDbConfigCache` racing the sealed cache, or simply the test suite running both classes' fixtures in parallel — the global mapper races, surfacing as `NotSupportedException: Member ClusterId not found on BsonMapper for type GenerationSnapshot` (the query predicate can't resolve the member) or `LiteException: Cannot insert duplicate key in unique index '_id'. The duplicate value is '0'` (the `int Id` auto-id mapping was lost, so `Insert` writes a literal `0` twice). The per-instance `_writeGate` added under Configuration-005 serialises one instance's find-then-insert but does nothing about this cross-instance global-mapper hazard. Reproduced deterministically under thread-pool contention (the Configuration-005 regression test `PutAsync_concurrent_…` was intermittently failing in the full suite for exactly this reason). This is a real runtime fault, not test-only: the resilient-read fallback path and the seal path can run concurrently in a live node.
|
||
|
||
**Recommendation:** Give each `LiteDatabase` a private `BsonMapper` with the entity pre-registered (`mapper.Entity<GenerationSnapshot>()`) so member resolution happens once, single-threaded, at construction and never touches the global mapper.
|
||
|
||
**Resolution:** Resolved 2026-06-19 — added a private `BuildMapper()` to both `LiteDbConfigCache` and `GenerationSealedCache` that constructs a fresh `BsonMapper`, calls `mapper.Entity<GenerationSnapshot>()`, and passes it to every `new LiteDatabase(...)` (construction, seal-write, sealed-read). Added regression test `LiteDbConfigCacheTests.Concurrent_cache_instances_do_not_race_the_shared_bson_mapper` (24 cache instances × 16 concurrent puts each); confirmed it fails on the unfixed code (`Member ClusterId not found on BsonMapper`) and passes on the fixed code, and the previously-flaky `PutAsync_concurrent_…` is now stable across repeated full-suite runs.
|
||
|
||
### Configuration-013
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Design-document adherence |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:243` (`ValidateClusterTopology`) |
|
||
| Status | Open |
|
||
|
||
**Description:** `DraftValidator.ValidateClusterTopology` is documented as the managed pre-publish guard that catches cluster-topology drift the SQL `CK_ServerCluster_RedundancyMode_NodeCount` check cannot see — specifically an operator disabling a `ClusterNode` (effective enabled-count = 1) while `RedundancyMode` stays `Hot`/`Warm`, which would boot the runtime into an invalid-topology band. It is fully unit-tested (`DraftValidatorTests` §"ValidateClusterTopology") but **no production code calls it.** The deploy gate in `AdminOperationsActor.StartDeployment` runs `DraftValidator.Validate(...)` (the snapshot rules) but never `ValidateClusterTopology(...)`, so the documented enabled-node-count guard is inert at deploy time — the only thing standing is the row-level SQL CHECK, which the doc explicitly says is insufficient.
|
||
|
||
**Recommendation:** Wire `ValidateClusterTopology` into the deploy/publish path — load the `ServerCluster` row(s) + their `ClusterNode`s and run it alongside `Validate`, folding its errors into the same reject summary. The fix belongs in `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (a different module), so it is **deferred from this module's edit scope** and recorded here against the now-dead Configuration-layer method. Cross-module: ControlPlane.
|
||
|
||
**Resolution:** _(open — fix is in the ControlPlane module's `AdminOperationsActor`, outside Configuration's edit scope)_
|
||
|
||
### Configuration-014
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Documentation & comments |
|
||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Services/ILdapGroupRoleMappingService.cs:23`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Services/LdapGroupRoleMappingService.cs:24` |
|
||
| Status | Open |
|
||
|
||
**Description:** `ILdapGroupRoleMappingService.GetByGroupsAsync`'s XML doc asserts "Case-insensitive per LDAP conventions", but the implementation is `db.LdapGroupRoleMappings.Where(m => groupSet.Contains(m.LdapGroup))`, which translates to a SQL `IN (…)` whose case-sensitivity is entirely determined by the column's collation. On the default case-insensitive server collation (the dev/Docker SQL) the doc holds; on a case-sensitive-collation deployment the lookup would silently miss rows that differ only in case — and the in-memory EF unit tests (which match exact case) would not catch it. This is a hot path: it fires on every Admin-UI sign-in, so a silent miss denies the user their role grant. The doc overstates a guarantee the code does not enforce.
|
||
|
||
**Recommendation:** Either (a) soften the doc to state the match is collation-dependent and document the required CI collation as a deployment constraint, or (b) pin the `LdapGroup` column to an explicit `_CI_` collation in the model so the guarantee is enforced regardless of server default. Option (b) is a schema/migration change and must be deferred (no migration edits in this review); option (a) is a one-line doc change but only papers over the gap. Left **Open** pending a decision on which guarantee to commit to — no code change applied because the documented behaviour currently matches the deployment reality (CI server collation).
|
||
|
||
**Resolution:** _(open — preferred fix (b) pins the column collation, which needs an EF migration; deferred per the no-migration rule)_
|