From c126fc7a7dcd2009cf2eb8b6cb057ec86ee79102 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Fri, 22 May 2026 08:13:27 -0400 Subject: [PATCH] fix(configuration): resolve Medium code-review findings (Configuration-002, -003, -006, -009) Configuration-002: sp_PublishGeneration is transaction-nesting aware (BEGIN TRANSACTION vs SAVE TRANSACTION on @@TRANCOUNT) so a caller's outer transaction survives a publish failure; sp_ValidateDraft wrapped in TRY/CATCH. Configuration-003: ValidatePathLength uses the cluster's actual Enterprise/Site lengths when available, falling back to the conservative approximation. Configuration-006: ResilientConfigReader treats a command-timeout TaskCanceledException as a fault (not caller cancellation) and falls back. Configuration-009: removed the checked-in plaintext sa connection string; CreateDbContext now requires OTOPCUA_CONFIG_CONNECTION. Co-Authored-By: Claude Opus 4.7 (1M context) --- code-reviews/Configuration/findings.md | 18 ++-- .../DesignTimeDbContextFactory.cs | 24 +++-- .../LocalCache/ResilientConfigReader.cs | 14 ++- .../20260417215224_StoredProcedures.cs | 34 +++++-- .../Validation/DraftSnapshot.cs | 12 +++ .../Validation/DraftValidator.cs | 12 ++- .../Services/DraftValidationService.cs | 6 ++ .../DraftValidatorTests.cs | 86 ++++++++++++++++ .../ResilientConfigReaderTests.cs | 99 +++++++++++++++++++ 9 files changed, 274 insertions(+), 31 deletions(-) diff --git a/code-reviews/Configuration/findings.md b/code-reviews/Configuration/findings.md index ac2ece9..9159ebb 100644 --- a/code-reviews/Configuration/findings.md +++ b/code-reviews/Configuration/findings.md @@ -7,7 +7,7 @@ | Review date | 2026-05-22 | | Commit reviewed | `76d35d1` | | Status | Reviewed | -| Open findings | 9 | +| Open findings | 5 | ## Checklist coverage @@ -48,13 +48,13 @@ | Severity | Medium | | Category | Correctness & logic bugs | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` | -| Status | Open | +| Status | Resolved | **Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure. **Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — made `sp_PublishGeneration` transaction-nesting aware: captures `@@TRANCOUNT` on entry, issues `BEGIN TRANSACTION` when zero or `SAVE TRANSACTION sp_PublishGeneration` when nested, and uses `ROLLBACK TRANSACTION sp_PublishGeneration` (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END CATCH` so validation errors propagate correctly. ### Configuration-003 @@ -63,13 +63,13 @@ | Severity | Medium | | Category | Correctness & logic bugs | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` | -| Status | Open | +| Status | Resolved | **Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked. **Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — added nullable `Enterprise` and `Site` properties to `DraftSnapshot`; `ValidatePathLength` uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; `DraftValidationService` now loads the cluster row and populates both properties; added `PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` unit tests. ### Configuration-004 @@ -108,13 +108,13 @@ | Severity | Medium | | Category | Error handling & resilience | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` | -| Status | Open | +| Status | Resolved | **Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently. **Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — changed the fallback `catch` filter to `ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested` so a command-timeout `TaskCanceledException` (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry `ShouldHandle` predicate to `Handle()` (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: `CommandTimeout_TaskCanceledException_FallsBackToCache`, `PollyTimeout_TimeoutRejectedException_FallsBackToCache`, and `CallerCancellation_Propagates_NotFallback`. ### Configuration-007 @@ -153,13 +153,13 @@ | Severity | Medium | | Category | Security | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` | -| Status | Open | +| Status | Resolved | **Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection. **Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — removed the embedded `sa` password and `DefaultConnectionString` constant entirely; `CreateDbContext` now throws `InvalidOperationException` with a clear setup message when `OTOPCUA_CONFIG_CONNECTION` is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string. ### Configuration-010 diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs index a5707a3..402a0f0 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs @@ -5,19 +5,27 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration; /// /// Used by dotnet ef at design time (migrations, scaffolding). Reads the connection string -/// from the OTOPCUA_CONFIG_CONNECTION environment variable, falling back to the local dev -/// container on localhost:1433. +/// from the OTOPCUA_CONFIG_CONNECTION environment variable. /// +/// +/// Set the variable before running migration commands, e.g.: +/// +/// $env:OTOPCUA_CONFIG_CONNECTION = "Server=10.100.0.35,14330;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True;" +/// dotnet ef migrations add MyMigration --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration +/// +/// No credential is embedded in source. Do not add a plaintext password as a fallback. +/// public sealed class DesignTimeDbContextFactory : IDesignTimeDbContextFactory { - // Host-port 14330 avoids collision with the native MSSQL14 instance on 1433 (Galaxy "ZB" DB). - private const string DefaultConnectionString = - "Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;"; - public OtOpcUaConfigDbContext CreateDbContext(string[] args) { - var connection = Environment.GetEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION") - ?? DefaultConnectionString; + var connection = Environment.GetEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION"); + + if (string.IsNullOrWhiteSpace(connection)) + throw new InvalidOperationException( + "OTOPCUA_CONFIG_CONNECTION is not set. " + + "Export the variable before running 'dotnet ef' commands. Example: " + + "Server=10.100.0.35,14330;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True;"); var options = new DbContextOptionsBuilder() .UseSqlServer(connection, sql => sql.MigrationsAssembly(typeof(OtOpcUaConfigDbContext).Assembly.FullName)) diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs index 7f81378..f8e8cc0 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs @@ -48,7 +48,13 @@ public sealed class ResilientConfigReader UseJitter = true, Delay = TimeSpan.FromMilliseconds(100), MaxDelay = TimeSpan.FromSeconds(1), - ShouldHandle = new PredicateBuilder().Handle(ex => ex is not OperationCanceledException), + // Handle ALL exceptions including OperationCanceledException. A SQL command-level + // timeout surfaces as TaskCanceledException (derives from OperationCanceledException) + // when the caller's token is NOT cancelled, and must be retried just like any other + // transient error. Polly itself checks the cancellation token between retries and + // stops with OperationCanceledException on genuine caller cancellation regardless of + // this predicate. + ShouldHandle = new PredicateBuilder().Handle(), }); } @@ -76,7 +82,11 @@ public sealed class ResilientConfigReader _staleFlag.MarkFresh(); return result; } - catch (Exception ex) when (ex is not OperationCanceledException) + // Catch all exceptions that are NOT genuine caller cancellations. A SQL command-level + // timeout surfaces as TaskCanceledException (derives from OperationCanceledException) + // but the caller's token is NOT cancelled — we must fall back to the sealed cache for + // that case, not propagate. Only rethrow if the caller actually requested cancellation. + catch (Exception ex) when (ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested) { _logger.LogWarning(ex, "Central-DB read failed after retries; falling back to sealed cache for cluster {ClusterId}", clusterId); // GenerationCacheUnavailableException surfaces intentionally — fails the caller's diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs index 8646767..66c5bc9 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs @@ -270,7 +270,22 @@ BEGIN SET NOCOUNT ON; SET XACT_ABORT ON; - BEGIN TRANSACTION; + -- Transaction-nesting awareness: if a caller (e.g. sp_RollbackToGeneration) already + -- holds a transaction, we use SAVE TRANSACTION so our failure path rolls back only to + -- the savepoint instead of issuing a bare ROLLBACK that wipes the caller's transaction + -- (which sets @@TRANCOUNT = 0 and causes error 3902 on the caller's subsequent COMMIT). + DECLARE @OwnsTxn bit = 0; + DECLARE @SaveName nvarchar(32) = N'sp_PublishGeneration'; + + IF @@TRANCOUNT = 0 + BEGIN + BEGIN TRANSACTION; + SET @OwnsTxn = 1; + END + ELSE + BEGIN + SAVE TRANSACTION sp_PublishGeneration; + END DECLARE @Lock nvarchar(255) = N'OtOpcUa_Publish_' + @ClusterId; DECLARE @LockResult int; @@ -278,20 +293,22 @@ BEGIN IF @LockResult < 0 BEGIN RAISERROR('PublishConflict: another publish is in progress for cluster %s', 16, 1, @ClusterId); - ROLLBACK; + IF @OwnsTxn = 1 ROLLBACK; + ELSE ROLLBACK TRANSACTION sp_PublishGeneration; RETURN; END -- sp_ValidateDraft signals every rejection with RAISERROR(..., 16, 1) — a severity-16 error is -- NOT batch-aborting and SET XACT_ABORT ON does not abort the transaction for it, so without a -- TRY/CATCH control would return here and the draft would publish despite failed validation. - -- Catch the validation error, roll back the publish transaction, and re-raise so the caller sees - -- the real validation failure. + -- Catch the validation error, roll back the publish transaction (only to our savepoint when a + -- caller owns the outer transaction), and re-raise so the caller sees the real validation failure. BEGIN TRY EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId; END TRY BEGIN CATCH - IF @@TRANCOUNT > 0 ROLLBACK; + IF @OwnsTxn = 1 ROLLBACK; + ELSE ROLLBACK TRANSACTION sp_PublishGeneration; THROW; END CATCH @@ -324,15 +341,16 @@ BEGIN IF @@ROWCOUNT = 0 BEGIN - RAISERROR('Draft %I64d for cluster %s not found (was it already published?)', 16, 1, @DraftGenerationId, @ClusterId); - ROLLBACK; + RAISERROR('Draft %I64d for cluster %s not in Draft status (was it already published?)', 16, 1, @DraftGenerationId, @ClusterId); + IF @OwnsTxn = 1 ROLLBACK; + ELSE ROLLBACK TRANSACTION sp_PublishGeneration; RETURN; END INSERT dbo.ConfigAuditLog (Principal, EventType, ClusterId, GenerationId) VALUES (SUSER_SNAME(), 'Published', @ClusterId, @DraftGenerationId); - COMMIT; + IF @OwnsTxn = 1 COMMIT; END "; diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftSnapshot.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftSnapshot.cs index 9de011c..5679b5d 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftSnapshot.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftSnapshot.cs @@ -11,6 +11,18 @@ public sealed class DraftSnapshot public required long GenerationId { get; init; } public required string ClusterId { get; init; } + /// + /// Cluster's Enterprise segment (UNS level 1). When set, uses + /// the actual length for path-length checks instead of a conservative 32-char upper bound. + /// + public string? Enterprise { get; init; } + + /// + /// Cluster's Site segment (UNS level 2). When set, uses the + /// actual length for path-length checks instead of a conservative 32-char upper bound. + /// + public string? Site { get; init; } + public IReadOnlyList Namespaces { get; init; } = []; public IReadOnlyList DriverInstances { get; init; } = []; public IReadOnlyList Devices { get; init; } = []; diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs index 194e4d7..3be8186 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs @@ -59,8 +59,13 @@ public static class DraftValidator /// Cluster.Enterprise + Site + area + line + equipment + 4 slashes ≤ 200 chars. private static void ValidatePathLength(DraftSnapshot draft, List errors) { - // The cluster row isn't in the snapshot — we assume caller pre-validated Enterprise+Site - // length and bound them as constants <= 64 chars each. Here we validate the dynamic portion. + // Use actual Enterprise/Site lengths when the snapshot carries them (populated by + // DraftValidationService from the ServerCluster row). Fall back to a conservative + // 32-char upper bound per segment when not supplied — over-penalises short values + // but never under-penalises long ones, which is acceptable for the fallback case. + var enterpriseLen = draft.Enterprise?.Length ?? 32; + var siteLen = draft.Site?.Length ?? 32; + var areaById = draft.UnsAreas.ToDictionary(a => a.UnsAreaId); var lineById = draft.UnsLines.ToDictionary(l => l.UnsLineId); @@ -69,8 +74,7 @@ public static class DraftValidator if (!lineById.TryGetValue(eq.UnsLineId!, out var line)) continue; if (!areaById.TryGetValue(line.UnsAreaId, out var area)) continue; - // rough upper bound: Enterprise+Site at most 32+32; add dynamic segments + 4 slashes - var len = 32 + 32 + area.Name.Length + line.Name.Length + eq.Name.Length + 4; + var len = enterpriseLen + siteLen + area.Name.Length + line.Name.Length + eq.Name.Length + 4; if (len > MaxPathLength) errors.Add(new("PathTooLong", $"Equipment path exceeds {MaxPathLength} chars (approx {len})", diff --git a/src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/DraftValidationService.cs b/src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/DraftValidationService.cs index 1aba5ba..3843f10 100644 --- a/src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/DraftValidationService.cs +++ b/src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/DraftValidationService.cs @@ -18,10 +18,16 @@ public sealed class DraftValidationService(OtOpcUaConfigDbContext db) .FirstOrDefaultAsync(g => g.GenerationId == draftId, ct) ?? throw new InvalidOperationException($"Draft {draftId} not found"); + // Load the cluster row so path-length validation uses actual Enterprise/Site lengths. + var cluster = await db.ServerClusters.AsNoTracking() + .FirstOrDefaultAsync(c => c.ClusterId == draft.ClusterId, ct); + var snapshot = new DraftSnapshot { GenerationId = draft.GenerationId, ClusterId = draft.ClusterId, + Enterprise = cluster?.Enterprise, + Site = cluster?.Site, Namespaces = await db.Namespaces.AsNoTracking().Where(n => n.GenerationId == draftId).ToListAsync(ct), DriverInstances = await db.DriverInstances.AsNoTracking().Where(d => d.GenerationId == draftId).ToListAsync(ct), Devices = await db.Devices.AsNoTracking().Where(d => d.GenerationId == draftId).ToListAsync(ct), diff --git a/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/DraftValidatorTests.cs b/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/DraftValidatorTests.cs index 4e1146d..839c1a1 100644 --- a/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/DraftValidatorTests.cs +++ b/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/DraftValidatorTests.cs @@ -232,4 +232,90 @@ public sealed class DraftValidatorTests Enabled = enabled, CreatedBy = "t", }; + + // ------------------------------------------------------------------------------------ + // ValidatePathLength — Enterprise/Site length precision (Configuration-003) + // ------------------------------------------------------------------------------------ + + [Fact] + public void PathLength_uses_actual_Enterprise_Site_when_provided() + { + // Craft a snapshot where the 32+32 approximation would flag PathTooLong but the + // actual Enterprise/Site lengths would not. Equipment name intentionally exceeds the + // UNS-segment 32-char limit (it would also trigger UnsSegmentInvalid — that is fine; + // both checks are independent and the test is filtering only on PathTooLong). + // + // approx: 32+32 + 32+32+90 +4 = 222 > 200 → would flag PathTooLong + // actual: 2 +1 + 32+32+90 +4 = 161 ≤ 200 → no PathTooLong + var areaId = "area-a"; + var lineId = "line-b"; + var uuid = Guid.NewGuid(); + var eqName = new string('x', 90); // 90 chars — exceeds UNS regex but that's a separate error + + var snapshot = new DraftSnapshot + { + GenerationId = 1, + ClusterId = "c", + Enterprise = "zb", // 2 chars (actual) + Site = "s", // 1 char (actual) + UnsAreas = [new UnsArea { UnsAreaId = areaId, ClusterId = "c", Name = new string('a', 32) }], + UnsLines = [new UnsLine { UnsLineId = lineId, UnsAreaId = areaId, Name = new string('b', 32) }], + Equipment = + [ + new Equipment + { + EquipmentUuid = uuid, + EquipmentId = DraftValidator.DeriveEquipmentId(uuid), + Name = eqName, + DriverInstanceId = "d", + UnsLineId = lineId, + MachineCode = "m", + }, + ], + }; + + var errors = DraftValidator.Validate(snapshot); + + errors.ShouldNotContain(e => e.Code == "PathTooLong", + "actual Enterprise='zb' + Site='s' keeps total path at 161 chars — under the 200-char limit"); + } + + [Fact] + public void PathLength_conservative_fallback_when_Enterprise_Site_absent() + { + // Without Enterprise/Site on the snapshot the validator assumes 32+32. + // A path whose segments sum to 93 chars (area=32 + line=32 + eq=29) fits + // under 200 even with the 32+32 approximation (32+32+93+4 = 161 ≤ 200) and + // must NOT be flagged — the fallback must not over-penalise valid paths that + // would also be valid under real Enterprise/Site values. + var areaId = "area-x"; + var lineId = "line-y"; + var uuid = Guid.NewGuid(); + + var snapshot = new DraftSnapshot + { + GenerationId = 1, + ClusterId = "c", + // Enterprise and Site deliberately omitted — conservative fallback path + UnsAreas = [new UnsArea { UnsAreaId = areaId, ClusterId = "c", Name = new string('a', 32) }], + UnsLines = [new UnsLine { UnsLineId = lineId, UnsAreaId = areaId, Name = new string('b', 32) }], + Equipment = + [ + new Equipment + { + EquipmentUuid = uuid, + EquipmentId = DraftValidator.DeriveEquipmentId(uuid), + Name = new string('c', 29), + DriverInstanceId = "d", + UnsLineId = lineId, + MachineCode = "m", + }, + ], + }; + + var errors = DraftValidator.Validate(snapshot); + + errors.ShouldNotContain(e => e.Code == "PathTooLong", + "conservative 32+32+32+32+29+4 = 161 chars is still under the 200-char limit"); + } } diff --git a/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs b/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs index 7e05565..056c734 100644 --- a/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs +++ b/tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs @@ -1,4 +1,5 @@ using Microsoft.Extensions.Logging.Abstractions; +using Polly.Timeout; using Shouldly; using Xunit; using ZB.MOM.WW.OtOpcUa.Configuration.LocalCache; @@ -119,6 +120,104 @@ public sealed class ResilientConfigReaderTests : IDisposable attempts.ShouldBeLessThanOrEqualTo(1); } + + // ------------------------------------------------------------------------------------ + // Configuration-006 — command-timeout TaskCanceledException and TimeoutRejectedException + // must fall back to the sealed cache, not propagate as caller cancellation. + // ------------------------------------------------------------------------------------ + + [Fact] + public async Task CommandTimeout_TaskCanceledException_FallsBackToCache() + { + // A SQL command-level timeout surfaces as a TaskCanceledException thrown by the + // delegate itself (not triggered by the caller's CancellationToken). It must be + // treated as a transient failure and trigger the cache fallback, not be mistaken + // for genuine caller cancellation and propagated. + var cache = new GenerationSealedCache(_root); + await cache.SealAsync(new GenerationSnapshot + { + ClusterId = "cluster-b", GenerationId = 7, CachedAt = DateTime.UtcNow, + PayloadJson = "{\"from\":\"cache\"}", + }); + var flag = new StaleConfigFlag(); + var reader = new ResilientConfigReader(cache, flag, NullLogger.Instance, + timeout: TimeSpan.FromSeconds(10), retryCount: 0); + + // Simulate a command-level timeout: TaskCanceledException with no linked token. + var result = await reader.ReadAsync( + "cluster-b", + _ => throw new TaskCanceledException("SQL command timeout (no caller token)"), + snap => snap.PayloadJson, + CancellationToken.None); // caller token is NOT cancelled + + result.ShouldBe("{\"from\":\"cache\"}", + "command-timeout TaskCanceledException must fall back to sealed cache"); + flag.IsStale.ShouldBeTrue("cache fallback marks the stale flag"); + } + + [Fact] + public async Task PollyTimeout_TimeoutRejectedException_FallsBackToCache() + { + // When Polly's own timeout strategy fires it throws TimeoutRejectedException. + // That should trigger the cache fallback just like any other transient error. + var cache = new GenerationSealedCache(_root); + await cache.SealAsync(new GenerationSnapshot + { + ClusterId = "cluster-c", GenerationId = 8, CachedAt = DateTime.UtcNow, + PayloadJson = "{\"from\":\"polly-timeout-cache\"}", + }); + var flag = new StaleConfigFlag(); + // Set an extremely short Polly timeout so the async delay triggers it. + var reader = new ResilientConfigReader(cache, flag, NullLogger.Instance, + timeout: TimeSpan.FromMilliseconds(10), retryCount: 0); + + var result = await reader.ReadAsync( + "cluster-c", + async ct => + { + await Task.Delay(TimeSpan.FromSeconds(5), ct); // far exceeds 10 ms timeout + return "never"; + }, + snap => snap.PayloadJson, + CancellationToken.None); + + result.ShouldBe("{\"from\":\"polly-timeout-cache\"}", + "Polly TimeoutRejectedException must fall back to sealed cache"); + flag.IsStale.ShouldBeTrue("cache fallback marks the stale flag"); + } + + [Fact] + public async Task CallerCancellation_Propagates_NotFallback() + { + // Explicit caller cancellation must NOT fall back to the sealed cache — the + // caller said stop, so we must stop. + var cache = new GenerationSealedCache(_root); + await cache.SealAsync(new GenerationSnapshot + { + ClusterId = "cluster-d", GenerationId = 9, CachedAt = DateTime.UtcNow, + PayloadJson = "{\"should\":\"not be returned\"}", + }); + var flag = new StaleConfigFlag(); + var reader = new ResilientConfigReader(cache, flag, NullLogger.Instance, + timeout: TimeSpan.FromSeconds(10), retryCount: 0); + using var cts = new CancellationTokenSource(); + cts.Cancel(); + + await Should.ThrowAsync(async () => + { + await reader.ReadAsync( + "cluster-d", + ct => + { + ct.ThrowIfCancellationRequested(); + return ValueTask.FromResult("ok"); + }, + _ => "cache-should-not-be-used", + cts.Token); + }); + + flag.IsStale.ShouldBeFalse("no cache snapshot served on genuine cancellation"); + } } [Trait("Category", "Unit")]