6 Commits

Author SHA1 Message Date
Joseph Doherty e74e8f7b31 docs(code-reviews): regenerate index — 23 Low findings resolved
Batch 1 cleared Open findings in Core, Core.Abstractions, Core.AlarmHistorian,
Configuration, and Analyzers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:39:16 -04:00
Joseph Doherty 0993fa5a19 fix(analyzers): resolve Low code-review findings (Analyzers-002,003,004,005,007)
- Analyzers-002: drop the three dead AlarmSurfaceInvoker entries from
  the wrapper-method allow-list and from the diagnostic message.
- Analyzers-003: bail out of AnalyzeInvocation when the semantic model
  is null (was previously emitting a false positive).
- Analyzers-004: resolve guarded-interface + wrapper-method symbols
  once via CompilationStartAction and compare with SymbolEqualityComparer
  instead of formatting fully-qualified names on every invocation.
- Analyzers-005: add regression tests for default-interface-method
  reads (ReadAtTimeAsync / ReadEventsAsync on a concrete driver), with
  + without an override, and inside a CapabilityInvoker.ExecuteAsync
  lambda.
- Analyzers-007: rewrite the analyzer remarks to accurately describe
  the symbol-identity guarded-call detection, DIM handling, and the
  wrapper-lambda match heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:37 -04:00
Joseph Doherty 0da4f3b63a fix(core-alarm-historian): resolve Low code-review findings (Core.AlarmHistorian-008,011)
- Core.AlarmHistorian-008: cache queue depth in an Interlocked counter so
  EnqueueAsync no longer runs COUNT(*) on every alarm; consolidate
  DrainOnceAsync onto a single SqliteConnection per tick (purge, batch
  read, dead-letter, and outcome transaction all share it).
- Core.AlarmHistorian-011: confirm the stale Galaxy.Host XML doc
  references were already fixed under earlier commits; flip to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:26 -04:00
Joseph Doherty b92fea15d4 fix(configuration): resolve Low code-review findings (Configuration-004,005,007,010,011)
- Configuration-004: NodePermissions stored as int to match the EF
  HasConversion<int>() in OtOpcUaConfigDbContext.ConfigureNodeAcl.
- Configuration-005: serialise LiteDbConfigCache.PutAsync so concurrent
  Put for the same (ClusterId, GenerationId) cannot duplicate rows.
- Configuration-007: rethrow OperationCanceledException from
  GenerationApplier.ApplyPass when the caller's token is cancelled.
- Configuration-010: scrub secrets and drop the full exception object
  from the ResilientConfigReader fallback warning log.
- Configuration-011: pin the previously-uncovered GenerationApplier
  cancellation and path-length / publish-validation paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:18 -04:00
Joseph Doherty 8be6afbda4 fix(core): resolve Low code-review findings (Core-004,008,009,010,011,012)
- Core-004: add ConfigureAwait(false) to DriverHost.RegisterAsync /
  UnregisterAsync / DisposeAsync.
- Core-008: rewrite the BuildAddressSpaceAsync XML doc to correctly name
  the caller (OpcUaApplicationHost.PopulateAddressSpaces) that owns the
  per-driver isolation.
- Core-009: snapshot DriverResilienceOptions once per non-idempotent write
  in CapabilityInvoker.ExecuteWriteAsync.
- Core-010: switch DriverResilienceOptions.Resolve to TryGetValue with a
  diagnostic error message when a tier table is missing a capability.
- Core-011: add an optional diagnostic callback to PermissionTrieBuilder
  so production callers can surface scope-path mismatches.
- Core-012: correct the stale WedgeDetector ctor summary and add the
  Reconnecting row to DriverHealthReport's state matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:38:09 -04:00
Joseph Doherty ff2e75ab98 fix(core-abstractions): resolve Low code-review findings (Core.Abstractions-004,005,006,007,008)
- Core.Abstractions-004: guard DriverTypeRegistry.Register with a Lock so
  concurrent registrations are atomic.
- Core.Abstractions-005: narrow PollGroupEngine catch blocks to non-fatal
  exceptions, add optional onError callback, tolerate disposed-CTS races.
- Core.Abstractions-006: document the deliberate int-vs-uint asymmetry on
  IHistoryProvider.ReadEventsAsync / IHistorianDataSource.ReadEventsAsync.
- Core.Abstractions-007: pin the gaps with PollGroupEngine + DriverHealth
  contract tests.
- Core.Abstractions-008: correct XML docs on DriverHealth.LastError and
  the optional / required asymmetry on the history-read surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:37:54 -04:00
42 changed files with 2122 additions and 269 deletions
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -48,13 +48,13 @@
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` |
| Status | Open |
| Status | Resolved |
**Description:** `AlarmSurfaceInvoker` is listed in `WrapperTypes`, but `AlarmSurfaceInvoker`'s public methods (`SubscribeAsync`, `UnsubscribeAsync`, `AcknowledgeAsync`) take no lambda arguments at all — callers pass `IReadOnlyList<...>` / `IAlarmSubscriptionHandle`, and the invoker builds the resilience lambdas internally. `IsInsideWrapperLambda` only ever returns `true` when it finds an `AnonymousFunctionExpressionSyntax` argument in the outer call's argument list. Because no `AlarmSurfaceInvoker` call site can have a lambda argument, the `AlarmSurfaceInvoker` entry in `WrapperTypes` is effectively dead — it can never satisfy the suppression condition. Guarded `IAlarmSource` calls written inside `AlarmSurfaceInvoker.cs` are in fact suppressed correctly, but only because they sit inside `CapabilityInvoker.ExecuteAsync` lambdas (the `CapabilityInvoker` entry does the work). The dead entry is misleading and suggests the analyzer recognises an `AlarmSurfaceInvoker` "lambda home" that does not exist.
**Recommendation:** Either remove `AlarmSurfaceInvoker` from `WrapperTypes` (its calls are already covered transitively by the `CapabilityInvoker` match) and update the XML doc, or — if the intent is to allow `IAlarmSource` calls anywhere inside `AlarmSurfaceInvoker` regardless of lambda nesting — add an explicit "call site is lexically within the `AlarmSurfaceInvoker` type declaration" check rather than relying on a lambda-argument scan that never fires.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Removed the three dead `AlarmSurfaceInvoker` entries from `WrapperMethodKeys` (renamed from `WrapperMethods`); calls inside `AlarmSurfaceInvoker` methods remain correctly suppressed via the transitive `CapabilityInvoker.ExecuteAsync` lambda match. Updated XML docs + diagnostic message to drop the misleading `AlarmSurfaceInvoker.*` reference. Pinned the transitive coverage with regression test `GuardedCall_InsideAlarmSurfaceInvokerMethod_WrappedByCapabilityInvoker_PassesCleanly`.
### Analyzers-003
@@ -63,13 +63,13 @@
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` |
| Status | Open |
| Status | Resolved |
**Description:** `IsInsideWrapperLambda` is passed `context.Operation.SemanticModel` and returns `false` when that model is `null`. A `false` return means "not wrapped", so a null semantic model produces a false-positive diagnostic rather than silently skipping the call. For `RegisterOperationAction` the `SemanticModel` is non-null in normal compilation, so this is low-risk in practice, but the failure mode is the wrong direction — a tooling/IDE edge case where the model is unavailable would flag correct code. Separately, the analyzer has no defensive guard against partially-bound / malformed call sites: `method.ContainingType`, `method.ReturnType`, and `iface.GetMembers()` are dereferenced without null checks. `IInvocationOperation.TargetMethod` is non-null by contract and `ContainingType` is non-null for an ordinary method, so a hard crash is unlikely, but an analyzer that throws on malformed in-progress syntax degrades the IDE experience for the whole solution.
**Recommendation:** When `semanticModel is null` in `AnalyzeInvocation`, return early (skip the call) instead of letting `IsInsideWrapperLambda` report it as unwrapped, so unavailable semantics never produce a false positive.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `AnalyzeInvocation` now returns early when `context.Operation.SemanticModel is null` instead of reporting the call as unwrapped; added defensive null guards for `TargetMethod`, `ContainingType`, and `ReturnType` so an analyzer crash on partially-bound IDE syntax cannot leak. The `MethodImplementsInterfaceMember` helper iterates members filtered by `is IMethodSymbol` so non-method members never throw on `.GetMembers()` dereferences.
### Analyzers-004
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` |
| Status | Open |
| Status | Resolved |
**Description:** `ImplementsGuardedInterface` runs on every invocation operation in the compilation (every keystroke in the IDE). For each candidate it allocates via `AllInterfaces.Concat(new[] { method.ContainingType })`, builds a fully-qualified display string per interface and calls `string.Replace("global::", ...)`, then for matching interfaces iterates `iface.GetMembers().OfType<IMethodSymbol>()` calling `FindImplementationForInterfaceMember` per member. The `GuardedInterfaces` / `WrapperTypes` lookups are `string[].Contains` (linear scan) rather than a hash set. None of this is catastrophic — the interface sets are tiny — but the work is repeated for every invocation including the overwhelming majority that target non-guarded methods, and the FQN string formatting plus `Replace` allocation on the hot path is avoidable.
**Recommendation:** Move to `RegisterCompilationStartAction`: resolve the guarded interface and wrapper-type symbols once via `Compilation.GetTypeByMetadataName`, capture them, and compare invocation symbols by `SymbolEqualityComparer` identity. Replace the `string[]` membership checks with a `HashSet`. This also makes the analyzer correctly no-op in compilations that do not reference `Core.Abstractions`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Refactored to `RegisterCompilationStartAction` that resolves all guarded-interface and wrapper-method symbols once via `Compilation.GetTypeByMetadataName`, stores them in `HashSet<INamedTypeSymbol>` / `Dictionary<INamedTypeSymbol, HashSet<string>>` using `SymbolEqualityComparer.Default`, then registers the per-invocation action only when guarded types are present (the analyzer is a no-op when none are referenced). The hot path now uses `SymbolEqualityComparer.Default.Equals` instead of FQN-string formatting + `Replace`. Pinned the cold-compilation no-op with regression test `Compilation_WithoutGuardedInterfaceReferences_EmitsNoDiagnostics`.
### Analyzers-005
@@ -93,13 +93,13 @@
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` |
| Status | Open |
| Status | Resolved |
**Description:** `CapabilityInvoker`'s XML doc (`src/Core/.../Resilience/CapabilityInvoker.cs:15-17`) enumerates the routed capability surface as `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, and all four `IHistoryProvider` reads — matching the analyzer's `GuardedInterfaces` set. However `IHistoryProvider` exposes five async methods, and two of them (`ReadAtTimeAsync`, `ReadEventsAsync`) are C# default-interface-method implementations. When a driver does not override a DIM and a caller invokes it through a concrete driver reference, `FindImplementationForInterfaceMember` returns the interface's own default method symbol; the second equality branch (`method.OriginalDefinition` == `member`) still catches the interface-typed-receiver case, so detection holds — but this DIM interaction is undocumented and untested, and a future driver that overrides one DIM but not the other creates an asymmetric guarded surface that nobody has verified.
**Recommendation:** Add explicit test cases (see Analyzers-006) for `IHistoryProvider` calls via both an interface-typed receiver and a concrete driver that (a) overrides and (b) inherits the default `ReadAtTimeAsync` / `ReadEventsAsync`. If a gap is found, handle DIM members explicitly. Add a short remark to the analyzer XML doc noting the default-interface-method consideration.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Extended the test stub `IHistoryProvider` with both DIM members (`ReadAtTimeAsync` + `ReadEventsAsync`) and added four regression tests pinning the DIM behaviour for (a) concrete driver inheriting the DIM (`Direct_ReadAtTimeAsync_OnConcreteDriverInheritingDIM_TripsDiagnostic`, `Direct_ReadEventsAsync_OnConcreteDriverInheritingDIM_TripsDiagnostic`), (b) concrete driver overriding the DIM (`Direct_ReadAtTimeAsync_OnConcreteDriverOverridingDIM_TripsDiagnostic`), and (c) DIM call correctly wrapped (`Wrapped_ReadAtTimeAsync_DIM_InsideCapabilityInvokerLambda_PassesCleanly`). Confirmed no gap: both DIM branches (interface-typed receiver routing to `method.OriginalDefinition == member`, concrete-receiver-with-override routing to `FindImplementationForInterfaceMember`) match correctly. Added a dedicated DIM paragraph to the analyzer's XML remarks.
### Analyzers-006
@@ -130,10 +130,10 @@
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` |
| Status | Open |
| Status | Resolved |
**Description:** The `<remarks>` block states the analyzer "matches by receiver-interface identity using Roslyn's semantic model, not by method name". This is accurate for the guarded-call detection (`ImplementsGuardedInterface` uses symbols), but the wrapper detection in `IsInsideWrapperLambda` is described in the same block as walking the syntax tree and checking enclosing invocations by containing type — and that detection is in fact looser than the prose implies (see Analyzers-001): it does not verify the lambda is bound to the resilience `callSite` parameter. The XML doc reads as if the wrapper match is precise. The `<remarks>` also notes the rule does not enforce the capability argument matches the method, but omits the more important current limitation — that a lambda in any argument position of a wrapper-typed call suppresses the diagnostic.
**Recommendation:** Tighten the `<remarks>` to state precisely what `IsInsideWrapperLambda` checks today (textual containment within a lambda argument of a `CapabilityInvoker` / `AlarmSurfaceInvoker`-typed invocation), and note the known limitation that it does not bind the lambda to the `callSite` parameter. Keep the doc in sync if Analyzers-001 is fixed.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Rewrote the analyzer's `<remarks>` into five precise paragraphs: (1) guarded-call detection uses symbol identity, (2) DIM handling (covers Analyzers-005), (3) wrapper-lambda detection matches both containing-type symbol AND method name, with the lambda-not-bound-to-callSite-parameter limitation called out explicitly, (4) why `AlarmSurfaceInvoker` is not a wrapper home (covers Analyzers-002 narrative), (5) the existing capability-argument-not-enforced caveat. The doc is now in sync with the post-Analyzers-001/-002/-004 implementation.
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
| Status | Open |
| Status | Resolved |
**Description:** `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint``int` conversion. Only bits 011 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a `uint` value that overflows `int` and the conversion round-trip would corrupt it.
**Recommendation:** Change the enum underlying type to `int` (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — changed `NodePermissions` underlying type from `uint` to `int` so it matches the documented "stored as int" semantics and the `HasConversion<int>()` mapping in `OtOpcUaConfigDbContext.ConfigureNodeAcl`. Added regression test `NodePermissionsTests` pinning the underlying type and round-trip safety through `int` storage.
### Configuration-005
@@ -93,13 +93,13 @@
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
| Status | Open |
| Status | Resolved |
**Description:** `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructor's `EnsureIndex` calls are non-unique, so the storage layer does not prevent the duplicate, and `PruneOldGenerationsAsync`'s `keepLatest` accounting is then off.
**Recommendation:** Declare a unique index on `(ClusterId, GenerationId)` and treat the duplicate-key exception as an idempotent no-op, or guard `PutAsync` with an instance `SemaphoreSlim`/lock. Document the concurrency contract on `ILocalConfigCache`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — guarded `PutAsync` with an instance-level `SemaphoreSlim` so the find-then-insert/update window runs atomically for a given cache instance; documented the concurrency contract on `ILocalConfigCache`. Added regression test `PutAsync_concurrent_for_same_cluster_and_generation_does_not_duplicate` that runs 64 concurrent puts and inspects the LiteDB file directly to confirm exactly one row per `(ClusterId, GenerationId)` survives.
### Configuration-006
@@ -123,13 +123,13 @@
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
| Status | Open |
| Status | Resolved |
**Description:** `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
**Recommendation:** Rethrow `OperationCanceledException` rather than recording it as an entity error; call `ct.ThrowIfCancellationRequested()` between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `catch (OperationCanceledException) when (ct.IsCancellationRequested) { throw; }` ahead of the generic catch in `ApplyPass` so genuine caller cancellation propagates rather than being recorded as an entity error, and added a `ct.ThrowIfCancellationRequested()` at the top of each Added/Modified pass iteration. The "failed Removed pass keeps walking Added/Modified" behaviour was confirmed as the intended contract (cascades must settle) and pinned by `Apply_continues_to_Added_pass_when_a_Removed_callback_throws`. New regression tests: `Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled`, `Apply_stops_between_passes_when_cancellation_requested`.
### Configuration-008
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
| Status | Open |
| Status | Resolved |
**Description:** On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying server/credential context) is logged verbatim. There is no scrubbing of connection-string fragments before logging, against the project's no-secret-logging rule.
**Recommendation:** Log `ex.GetType().Name` and `ex.Message` for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — stopped passing the raw exception object to `LogWarning`; the fallback log now records only `ex.GetType().Name` and a `ScrubSecrets`-redacted `ex.Message` so connection-string fragments (Password, User Id, Pwd, Uid, AccessToken, Authorization, ApiKey/Api-Key) are stripped before reaching any sink. Added regression test `FallbackWarning_does_not_log_full_exception_object_or_password_fragment` that captures emitted log records and asserts no raw exception attached and no credential keyword present in the rendered message.
### Configuration-011
@@ -183,10 +183,10 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
| Status | Open |
| Status | Resolved |
**Description:** The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) `ValidatePathLength`'s constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
**Recommendation:** Add `GenerationApplierTests` cases for a throwing callback (assert error recorded, assert cancellation propagates) and a `DraftValidatorTests` path-length boundary case. Add a `StoredProceduresTests` case that publishes an invalid draft and asserts it stays `Draft`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — all three gaps now covered. (a) `GenerationApplierTests.Apply_continues_to_Added_pass_when_a_Removed_callback_throws` pins ordering; `Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled` and `Apply_stops_between_passes_when_cancellation_requested` (added under Configuration-007) pin cancellation. (b) `DraftValidatorTests.PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` (added under Configuration-003) pin the path-length boundary. (c) `StoredProceduresTests.Publish_aborts_when_ValidateDraft_rejects_the_draft` (added under Configuration-001) pins the publish-bypasses-validation regression against the live SQL fixture.
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -84,13 +84,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` |
| Status | Open |
| Status | Resolved |
**Description:** `Register` performs a check-then-act sequence (`snapshot.ContainsKey` then build `next` then `Interlocked.Exchange`) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a `next` dictionary; the second `Interlocked.Exchange` then wins and silently discards the first registration, defeating the documented "registered only once" guarantee. The class XML doc states registration happens single-threaded at startup, so this is not a live defect — but the use of `Interlocked.Exchange` for the swap implies the type is fully thread-safe for writers, which it is not. The mismatch between the implementation's apparent intent and its actual guarantee is a maintenance hazard.
**Recommendation:** Either guard `Register` with a `lock` so the check-build-swap is atomic, or strengthen the XML `Thread-safety` remark to state explicitly that concurrent `Register` calls are unsupported and only reader/writer concurrency is safe.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — guarded the duplicate check + copy-on-write rebuild + swap with a private `Lock`, making the check-build-swap atomic. Added `Register_ConcurrentDistinctTypes_AllSucceed` and `Register_ConcurrentDuplicateType_ExactlyOneWins` tests that exercise 16/32 racing threads and assert the "registered only once" guarantee holds.
### Core.Abstractions-005
@@ -99,13 +99,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` |
| Status | Open |
| Status | Resolved |
**Description:** Both the initial-poll and steady-state catch blocks use a bare `catch { }` that swallows every exception type, including non-transient programmer errors such as `NullReferenceException` and `ArgumentOutOfRangeException` (see Core.Abstractions-002). The XML remark says "transient poll error — loop continues, driver health surface logs it", but the engine never actually notifies the driver — there is no callback or event for a caught exception, so the driver's health surface has nothing to log. A persistently failing reader produces a silently spinning loop with zero observability from inside this module.
**Recommendation:** Narrow the catch to the exception types a reader is expected to throw (or at least exclude obviously-fatal ones), and add an optional `Action<Exception>` error callback (or raise an event) so the owning driver can record poll failures on its health surface as the doc claims happens.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — narrowed the catch blocks with an `IsFatal` guard so `OutOfMemoryException` / `StackOverflowException` / `AccessViolationException` / `ThreadAbortException` propagate instead of being swallowed; added an optional `Action<Exception>? onError` constructor parameter (backward-compatible — every existing driver call site uses named args and is unaffected) and routed every caught reader / contract-violation exception through a `ReportError` helper that defends against a buggy error sink. Also tolerates `ObjectDisposedException` from `Task.Delay` against an already-disposed CTS (defensive race-safety). Added `Reader_exception_is_reported_to_onError_callback`, `Reader_contract_violation_routes_to_onError_callback`, and `OnError_handler_that_throws_does_not_crash_loop` regression tests.
### Core.Abstractions-006
@@ -114,13 +114,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` |
| Status | Open |
| Status | Resolved |
**Description:** The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. `IHistoryProvider.ReadRawAsync` and `IHistorianDataSource.ReadRawAsync` take `uint maxValuesPerNode`, but `ReadEventsAsync` (on both interfaces) takes `int maxEvents`. The OPC UA `HistoryRead` service request fields are unsigned, and a negative `maxEvents` has no defined meaning. Mixing `int` and `uint` for the same parameter role across sibling methods forces every caller and implementer to reason about the inconsistency and risks accidental sign issues at the boundary.
**Recommendation:** Standardize on `uint` for all max-rows parameters across both `IHistoryProvider` and `IHistorianDataSource` (or document explicitly why `maxEvents` differs).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — took the documented-difference path (signature change would cross ten files outside Core.Abstractions). `int maxEvents` is intentionally signed: callers and downstream historian adapters (`WonderwareHistorianClient`, `HistorianDataSource`) treat `maxEvents <= 0` as a "use the backend's default cap" sentinel that has no `uint` equivalent. Updated XML docs on both `IHistoryProvider.ReadEventsAsync` and `IHistorianDataSource.ReadEventsAsync` to spell out the asymmetry and rationale. Added `HistoryRead_MaxParameter_TypePinned` / `HistoryProvider_MaxParameter_TypePinned` contract tests so an accidental future flip is caught.
### Core.Abstractions-007
@@ -129,13 +129,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** `PollGroupEngine` is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are unchanged across polls (would catch Core.Abstractions-001), and (b) no test exercises a reader that returns a snapshot list shorter than the input references (would catch Core.Abstractions-002). The `Reader_exception_does_not_crash_loop` test only covers a reader that throws before producing any result. `DataValueSnapshot` change-detection semantics for reference-typed values are therefore unverified.
**Recommendation:** Add tests for the unchanged-array case and the short-result-list case once Core.Abstractions-001/002 are addressed, so the intended contract is locked down.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the gap is filled by the regression tests added when Core.Abstractions-001 and -002 were closed: `Array_valued_tag_unchanged_contents_raises_only_once`, `Array_valued_tag_changed_contents_raises_event`, and `Reader_short_result_list_raises_descriptive_exception_and_loop_continues` lock the two previously-untested paths down. The fixes for -004 / -005 / -008 added a further nine regression tests (`Register_ConcurrentDistinctTypes_AllSucceed`, `Register_ConcurrentDuplicateType_ExactlyOneWins`, `Reader_exception_is_reported_to_onError_callback`, `Reader_contract_violation_routes_to_onError_callback`, `OnError_handler_that_throws_does_not_crash_loop`, `LastError_IsIndependent_OfState`, `DriverState_EnumContainsExpectedMembers`, `HistoryRead_MaxParameter_TypePinned`, `HistoryProvider_MaxParameter_TypePinned`, `HistoryProvider_OptionalMethods_HaveDefaultImplementation`). Total test count rose from 57 to 75.
### Core.Abstractions-008
@@ -144,7 +144,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` |
| Status | Open |
| Status | Resolved |
**Description:** Two XML-doc inaccuracies:
@@ -153,4 +153,4 @@ a category produced nothing rather than leaving it blank.
**Recommendation:** Reword `DriverHealth.LastError` to "Most recent error message; may be null when no error has been recorded" without tying nullness to a specific state. Add a one-line note on `IHistoryProvider`/`IHistorianDataSource` explaining why one surface uses default methods and the other does not.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — reworded `DriverHealth.LastError` to "null when no error has been recorded" and called out that the field is independent of `State`. Added an asymmetry `<remarks>` to `IHistoryProvider` (default-impl-throws so legacy drivers compile) and to `IHistorianDataSource.ReadEventsAsync` (required because server-side historians own the full surface) cross-referencing the finding. Added `LastError_IsIndependent_OfState` + `DriverState_EnumContainsExpectedMembers` and `HistoryProvider_OptionalMethods_HaveDefaultImplementation` contract tests so the asymmetry pins.
+5 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -138,13 +138,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` |
| Status | Open |
| Status | Resolved |
**Description:** Each `EnqueueAsync` (one per alarm transition — a hot path on a busy plant) opens a connection, runs `EnforceCapacity` (a `COUNT(*)` over the queue table on every single enqueue), serializes JSON, inserts, and closes the connection. The unconditional `COUNT(*)` on every enqueue is an avoidable scan; the open/close churn defeats connection pooling benefits and adds lock-acquisition overhead per event. `DrainOnceAsync` similarly opens three separate connections per tick (`PurgeAgedDeadLetters`, `ReadBatch`, the transaction block).
**Recommendation:** Reuse a single pooled write connection. Replace the per-enqueue `COUNT(*)` with a periodic capacity check (every Nth enqueue, or piggy-backed on the drain tick), or maintain an in-memory approximate counter. Combine the drain-tick connections into one.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added an `Interlocked`-guarded in-memory `_queuedRowCount` seeded from storage at construction and kept current by every mutation (enqueue increment, drain Ack/PermanentFail/corrupt-dead-letter decrements, capacity-eviction adjustment, RetryDeadLettered re-add). `EnqueueAsync` now short-circuits capacity enforcement against the cached counter via `EnforceCapacityFastPathAsync`, only paying for a real `COUNT(*)` when the cached value reaches the capacity wall or the periodic resync interval (every 10,000 enqueues) elapses; the obsolete sync `EnforceCapacity` was removed. `GetStatus()` reads `QueueDepth` from the same counter so a busy Admin UI no longer hits the DB for it. `DrainOnceAsync` is consolidated onto one shared `SqliteConnection` per tick — purge, read, corrupt-dead-letter, and the outcome-applying transaction now reuse it instead of opening three. Regression tests `EnqueueAsync_does_not_count_all_rows_on_every_call_below_capacity`, `Enqueue_and_drain_keep_queue_depth_consistent_with_storage`, and `Counter_remains_consistent_under_concurrent_enqueue_and_drain` added.
### Core.AlarmHistorian-009
@@ -183,10 +183,10 @@
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` |
| Status | Open |
| Status | Resolved |
**Description:** Several doc-comments reference the retired v1 architecture. The `IAlarmHistorianSink` summary says ingestion "routes through Galaxy.Host's pipe" and `IAlarmHistorianWriter` says "Stream G wires this to the Galaxy.Host IPC client", but `docs/AlarmTracking.md` and `CLAUDE.md` state the legacy `Galaxy.Host` project was retired in PR 7.2 and the write path is now the Wonderware historian sidecar (`WonderwareHistorianClient`). `AlarmHistorianEvent.cs:20` likewise says "the Galaxy.Host handler maps to the historian's enum on the wire." These stale references will mislead a reader about where the writer is actually hosted.
**Recommendation:** Update the doc-comments to refer to the Wonderware historian sidecar / `WonderwareHistorianClient` (`IAlarmHistorianWriter` implementation) instead of `Galaxy.Host`, consistent with `docs/AlarmTracking.md`'s "Historian write-back" section.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the three stale `Galaxy.Host` references were already replaced ahead of this resolution by earlier commits (`bdca772` rewrote the `IAlarmHistorianSink` summary + `IAlarmHistorianWriter` summary to name the Wonderware historian sidecar / `WonderwareHistorianClient`; `f6d487b` rewrote the `AlarmHistorianEvent.EventKind` doc-comment). A fresh grep across the project confirms no remaining `Galaxy.Host` / "Stream G wires this" strings — only the legitimate `Galaxy-native` alarm-source label survives. Status flipped to Resolved during the -008 pass; no new source change was needed.
+13 -13
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` |
| Status | Open |
| Status | Resolved |
**Description:** `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. The server host has no synchronization context so behaviour is currently correct, but the inconsistency is a maintenance hazard and a deviation from the established convention in `Core.Resilience`.
**Recommendation:** Add `.ConfigureAwait(false)` to the three awaited calls in `DriverHost.RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `.ConfigureAwait(false)` to the three awaited driver calls in `RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`; added three `RegisterAsync/UnregisterAsync/DisposeAsync_Does_Not_Capture_SynchronizationContext` regression tests that install a tracking `SynchronizationContext` on a dedicated thread and assert zero captured posts.
### Core-005
@@ -138,13 +138,13 @@
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |
| Status | Resolved |
**Description:** The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception from `discovery.DiscoverAsync` propagates straight out unhandled, and nothing here marks a subtree Faulted. The isolation is presumably done by the server-layer caller, but the comment asserts behaviour this class does not implement.
**Recommendation:** Either implement the documented isolation in `GenericDriverNodeManager`, or correct the XML doc to state that exception isolation is the caller's responsibility and name the type that performs it.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — corrected the `BuildAddressSpaceAsync` XML doc to (a) explicitly state exception isolation is the caller's responsibility, and (b) name the type that performs it (`Server.OpcUa.OpcUaApplicationHost.PopulateAddressSpaces`); added `BuildAddressSpaceAsync_Propagates_Discovery_Exceptions_To_Caller` regression test verifying the documented propagation behaviour.
### Core-009
@@ -153,13 +153,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` |
| Status | Open |
| Status | Resolved |
**Description:** `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path it rebuilds a fresh `DriverResilienceOptions` and a one-entry dictionary on every non-idempotent write, and the redundant accessor calls could observe two different snapshots if an Admin edit lands between them. Phase 6.1 budgets a 1% pipeline overhead; this is unnecessary allocation plus a minor consistency hazard.
**Recommendation:** Capture `var options = _optionsAccessor();` once at the top of the non-idempotent branch and derive both the `with` and the `Resolve` call from that snapshot. Consider caching the no-retry pipeline keyed on `(hostName, non-idempotent)`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ExecuteWriteAsync` now captures `_optionsAccessor()` into a single `snapshot` local at the top of the non-idempotent branch; the `with` expression and the `Resolve(Write)` call both derive from that snapshot so the two values are guaranteed coherent and only one accessor invocation occurs per call. Added `ExecuteWriteAsync_NonIdempotent_Snapshots_Options_Once_Per_Call` (counts invocations) and `ExecuteWriteAsync_NonIdempotent_Uses_Consistent_Options_Snapshot` (alternating-accessor) regression tests.
### Core-010
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` |
| Status | Open |
| Status | Resolved |
**Description:** `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will make `Resolve` throw `KeyNotFoundException` at runtime on the capability hot path rather than failing at build time. The two are coupled by convention only.
**Recommendation:** Either add a `default` arm to `Resolve` returning a conservative policy (and logging), or add a unit-test invariant asserting every `DriverCapability` value is present in each tier's default table.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `Resolve` now uses `TryGetValue` and throws a diagnostic `KeyNotFoundException` whose message names the missing capability + tier and points to `GetTierDefaults` when a capability is missing from both the override map and the tier table; the existing `TierDefaults_Cover_EveryCapability` test invariant prevents this in shipped code, and added `Resolve_Returns_NonNull_Policy_For_Every_Capability` (per-tier exhaustive) + `Resolve_Throws_Diagnostic_When_Capability_Missing_From_Tier_Defaults` regression tests.
### Core-011
@@ -183,13 +183,13 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` |
| Status | Open |
| Status | Resolved |
**Description:** `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for deterministic tests, not for production"). The fallback silently produces a structurally incorrect trie when `scopePaths` is null or a row's `ScopeId` is missing — a UnsLine-scoped grant ends up as a direct child of the root, so `WalkEquipment` / `WalkSystemPlatform` never reach it and the grant is effectively dropped, with no diagnostic. There is no test asserting the production multi-level descent versus the fallback.
**Recommendation:** Add unit tests covering `Build` with `scopePaths` producing the correct multi-level trie and the missing-`ScopeId` fallback. Have `Descend` surface a diagnostic (or throw outside test configuration) when a sub-cluster row cannot be located in `scopePaths`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added optional `Action<PermissionTrieBuildDiagnostic>? diagnostic` parameter to `PermissionTrieBuilder.Build`; `Descend` now invokes the callback with a `MissingScopePath` diagnostic when a sub-cluster row's `ScopeId` is absent from a supplied (non-null) `scopePaths` lookup so production callers can log + surface orphan grants instead of silently dropping them. New `PermissionTrieBuilderTests` covers (a) production multi-level descent with sibling-line non-leakage, (b) the deterministic-test fallback, (c) the diagnostic firing on a missing scope-path entry, (d) no diagnostic when all rows resolve, and (e) no diagnostic when `scopePaths` is null (explicit test mode).
### Core-012
@@ -198,10 +198,10 @@
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` |
| Status | Open |
| Status | Resolved |
**Description:** Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is documented as stateless; the comment describes nothing the constructor does. (2) `DriverHealthReport` — the `<remarks>` state matrix lists Unknown, Initializing, Healthy, Degraded, Faulted but `Aggregate` (lines 42-44) also folds `DriverState.Reconnecting` into the Degraded verdict. `Reconnecting` is a real `DriverState` member absent from the documented matrix.
**Recommendation:** Replace the `WedgeDetector` constructor `<summary>` with an accurate description (e.g. "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s"). Add `Reconnecting` to the `DriverHealthReport` `<remarks>` state matrix and state it maps to Degraded.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced the `WedgeDetector(.ctor)` `<summary>` with an accurate "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s" description plus a `<param>` block; added the `Reconnecting` row to the `DriverHealthReport` `<remarks>` state matrix and updated the verdict-rule prose. Added `WedgeDetectorTests.Doc_Constructor_Summary_Describes_Threshold_Clamp` and `DriverHealthReportTests.Doc_State_Matrix_Includes_Reconnecting` regression tests that parse the generated `.xml` doc to assert the strings, plus `Any_Reconnecting_WithoutFaultedOrNotReady_IsDegraded` confirming the documented Reconnecting → Degraded behaviour.
+28 -28
View File
@@ -11,14 +11,14 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 13 |
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 7 |
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 11 |
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 11 |
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 12 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 13 |
@@ -51,11 +51,6 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| Admin-010 | Low | OtOpcUa conventions | `Components/App.razor:9,16` | `App.razor` loads Bootstrap CSS and JS from the `cdn.jsdelivr.net` CDN. `admin-ui.md` section "Tech Stack" specifies "Bootstrap 5 vendored under `wwwroot/lib/bootstrap/`" precisely so the Admin app has no third-party runtime dependency. A… |
| Admin-011 | Low | Concurrency & thread safety | `Hubs/FleetStatusPoller.cs:24-26,98-103` | `FleetStatusPoller` keeps three plain `Dictionary<>` fields (`_last`, `_lastRole`, `_lastResilience`) mutated from `PollOnceAsync`. The poller `ExecuteAsync` loop is single-threaded so the steady-state poll path is safe, but `ResetCache()`… |
| Admin-012 | Low | Design-document adherence | `Services/EquipmentCsvImporter.cs:18-19,33-37,229,232` | `EquipmentCsvImporter` declares `EquipmentId` as a required CSV column and parses it into a `required` field. `admin-ui.md` section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No `EquipmentId` column… |
| Analyzers-002 | Low | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` | `AlarmSurfaceInvoker` is listed in `WrapperTypes`, but `AlarmSurfaceInvoker`'s public methods (`SubscribeAsync`, `UnsubscribeAsync`, `AcknowledgeAsync`) take no lambda arguments at all — callers pass `IReadOnlyList<...>` / `IAlarmSubscript… |
| Analyzers-003 | Low | Error handling & resilience | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` | `IsInsideWrapperLambda` is passed `context.Operation.SemanticModel` and returns `false` when that model is `null`. A `false` return means "not wrapped", so a null semantic model produces a false-positive diagnostic rather than silently ski… |
| Analyzers-004 | Low | Performance & resource management | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` | `ImplementsGuardedInterface` runs on every invocation operation in the compilation (every keystroke in the IDE). For each candidate it allocates via `AllInterfaces.Concat(new[] { method.ContainingType })`, builds a fully-qualified display… |
| Analyzers-005 | Low | Design-document adherence | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` | `CapabilityInvoker`'s XML doc (`src/Core/.../Resilience/CapabilityInvoker.cs:15-17`) enumerates the routed capability surface as `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, and all… |
| Analyzers-007 | Low | Documentation & comments | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` | The `<remarks>` block states the analyzer "matches by receiver-interface identity using Roslyn's semantic model, not by method name". This is accurate for the guarded-call detection (`ImplementsGuardedInterface` uses symbols), but the wrap… |
| Client.CLI-002 | Low | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` | The summary computes `neverWentBad` as every target whose node-id key is absent from the `everBad` dictionary. A node that received no update at all is also absent from `everBad`, so it is counted in `neverWentBad` and printed under the he… |
| Client.CLI-003 | Low | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` | Numeric command options accept any value with no range validation. `--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently d… |
| Client.CLI-004 | Low | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` | `SubscribeCommand` is the only command in the module whose constructor and all `[CommandOption]` properties have no XML doc comments. Every other command (`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`, `… |
@@ -75,24 +70,6 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| Client.UI-009 | Low | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` | `HistoryViewModel.AggregateTypes` exposes eight entries: `null` (Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`. `docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average, Minimum, Maxi… |
| Client.UI-010 | Low | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` | `DateTimeRangePicker` declares `MinDateTimeProperty` / `MaxDateTimeProperty` styled properties with public CLR accessors, but neither is read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and `OnEndLostFocus` never clamp… |
| Client.UI-011 | Low | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` | The certificate-store-path `TextBox` watermark reads `(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208 folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now `{LocalAppData}/OtOpcUaClient/`… |
| Configuration-004 | Low | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` | `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint``int` conversion. Only bits 011 are used… |
| Configuration-005 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` | `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructo… |
| Configuration-007 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` | `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes ins… |
| Configuration-010 | Low | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` | On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying serve… |
| Configuration-011 | Low | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` | The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test… |
| Core-004 | Low | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` | `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. T… |
| Core-008 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` | The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception fro… |
| Core-009 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` | `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path i… |
| Core-010 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` | `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will m… |
| Core-011 | Low | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` | `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for determ… |
| Core-012 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` | Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is doc… |
| Core.Abstractions-004 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` | `Register` performs a check-then-act sequence (`snapshot.ContainsKey` then build `next` then `Interlocked.Exchange`) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a `next` diction… |
| Core.Abstractions-005 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` | Both the initial-poll and steady-state catch blocks use a bare `catch { }` that swallows every exception type, including non-transient programmer errors such as `NullReferenceException` and `ArgumentOutOfRangeException` (see Core.Abstracti… |
| Core.Abstractions-006 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` | The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. `IHistoryProvider.ReadRawAsync` and `IHistorianDataSource.ReadRawAsync` take `uint maxValuesPerNode`, but `ReadEventsAsync` (on both interfac… |
| Core.Abstractions-007 | Low | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` | `PollGroupEngine` is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are… |
| Core.Abstractions-008 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` | Two XML-doc inaccuracies: 1. `DriverHealth.LastError` is documented as "Most recent error message; null when state is Healthy." The `DriverState` enum also defines `Degraded`, `Reconnecting`, and `Faulted` states, all of which carry an err… |
| Core.AlarmHistorian-008 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` | Each `EnqueueAsync` (one per alarm transition — a hot path on a busy plant) opens a connection, runs `EnforceCapacity` (a `COUNT(*)` over the queue table on every single enqueue), serializes JSON, inserts, and closes the connection. The un… |
| Core.AlarmHistorian-011 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` | Several doc-comments reference the retired v1 architecture. The `IAlarmHistorianSink` summary says ingestion "routes through Galaxy.Host's pipe" and `IAlarmHistorianWriter` says "Stream G wires this to the Galaxy.Host IPC client", but `doc… |
| Core.ScriptedAlarms-003 | Low | Documentation & comments | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` | `docs/ScriptedAlarms.md` (Composition step 3) and the `OnUpstreamChange` comment ("Fire-and-forget so driver-side dispatch isn't blocked", line 225-226) describe the `OnEvent` emission path as non-blocking / fire-and-forget. In the code, `… |
| Core.ScriptedAlarms-006 | Low | Concurrency & thread safety | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` | `OnUpstreamChange` and `RunShelvingCheck` both launch fire-and-forget tasks (`_ = ReevaluateAsync(...)`, `_ = ShelvingCheckAsync(...)`) with `CancellationToken.None`. There is no tracking of these in-flight tasks, so `Dispose` cannot await… |
| Core.ScriptedAlarms-008 | Low | Performance & resource management | `Part9StateMachine.cs:261-268` | `AppendComment` copies the entire existing comment list into a new `List` on every audit-producing transition (ack, confirm, shelve, unshelve, enable, disable, add-comment, auto-unshelve). The `Comments` list is append-only and unbounded —… |
@@ -390,3 +367,26 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-010 | Medium | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
| Server-011 | Medium | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
| Server-013 | Medium | Resolved | Design-document adherence | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
| Analyzers-002 | Low | Resolved | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` |
| Analyzers-003 | Low | Resolved | Error handling & resilience | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` |
| Analyzers-004 | Low | Resolved | Performance & resource management | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` |
| Analyzers-005 | Low | Resolved | Design-document adherence | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` |
| Analyzers-007 | Low | Resolved | Documentation & comments | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` |
| Configuration-004 | Low | Resolved | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
| Configuration-005 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
| Configuration-007 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
| Configuration-010 | Low | Resolved | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
| Configuration-011 | Low | Resolved | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
| Core-004 | Low | Resolved | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` |
| Core-008 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Core-009 | Low | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` |
| Core-010 | Low | Resolved | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` |
| Core-011 | Low | Resolved | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` |
| Core-012 | Low | Resolved | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` |
| Core.Abstractions-004 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` |
| Core.Abstractions-005 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` |
| Core.Abstractions-006 | Low | Resolved | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` |
| Core.Abstractions-007 | Low | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` |
| Core.Abstractions-008 | Low | Resolved | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` |
| Core.AlarmHistorian-008 | Low | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` |
| Core.AlarmHistorian-011 | Low | Resolved | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` |
@@ -19,6 +19,10 @@ public sealed class GenerationApplier(ApplyCallbacks callbacks) : IGenerationApp
foreach (var kind in new[] { ChangeKind.Added, ChangeKind.Modified })
{
// Honour cancellation between passes — a caller can abort the apply between Removed
// and Added phases even if individual callbacks don't observe the token themselves
// (Configuration-007).
ct.ThrowIfCancellationRequested();
await ApplyPass(diff.Namespaces, kind, callbacks.OnNamespace, errors, ct);
await ApplyPass(diff.Drivers, kind, callbacks.OnDriver, errors, ct);
await ApplyPass(diff.Devices, kind, callbacks.OnDevice, errors, ct);
@@ -42,6 +46,12 @@ public sealed class GenerationApplier(ApplyCallbacks callbacks) : IGenerationApp
foreach (var change in changes.Where(c => c.Kind == kind))
{
try { await callback(change, ct); }
// Configuration-007: cancellation must propagate, not be silently recorded as an
// entity error. Distinguish caller cancellation (token signalled) from any
// OperationCanceledException raised independently of the caller's token, which we
// still want to surface as an entity error so a single misbehaving callback does
// not crash the entire apply.
catch (OperationCanceledException) when (ct.IsCancellationRequested) { throw; }
catch (Exception ex) { errors.Add($"{typeof(T).Name} {change.Kind} '{change.LogicalId}': {ex.Message}"); }
}
}
@@ -5,7 +5,7 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration.Enums;
/// Stored as <c>int</c> bitmask in <see cref="Entities.NodeAcl.PermissionFlags"/>.
/// </summary>
[Flags]
public enum NodePermissions : uint
public enum NodePermissions : int
{
None = 0,
@@ -4,6 +4,13 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
/// Per-node local cache of the most-recently-applied generation(s). Used to bootstrap the
/// address space when the central DB is unreachable (decision #79 — degraded-but-running).
/// </summary>
/// <remarks>
/// <para><b>Concurrency contract:</b> implementations must serialize writes — specifically,
/// <see cref="PutAsync"/> for the same <c>(ClusterId, GenerationId)</c> from concurrent
/// callers must not produce duplicate rows. Reads may run concurrently with reads and writes.
/// The <see cref="LiteDbConfigCache"/> implementation enforces this via an instance-level
/// <see cref="SemaphoreSlim"/> around the find-then-insert/update window.</para>
/// </remarks>
public interface ILocalConfigCache
{
Task<GenerationSnapshot?> GetMostRecentAsync(string clusterId, CancellationToken ct = default);
@@ -13,6 +13,12 @@ public sealed class LiteDbConfigCache : ILocalConfigCache, IDisposable
private const string CollectionName = "generations";
private readonly LiteDatabase _db;
private readonly ILiteCollection<GenerationSnapshot> _col;
// PutAsync is a find-then-insert/update; without serialization, two concurrent puts for the
// same (ClusterId, GenerationId) can both observe `existing is null` and both Insert,
// producing duplicate rows (Configuration-005). Serialize writes through this semaphore so
// the read-modify-write block is atomic for a given instance. LiteDB itself only locks the
// page-level write, not the find-then-insert window.
private readonly SemaphoreSlim _writeGate = new(initialCount: 1, maxCount: 1);
public LiteDbConfigCache(string dbPath)
{
@@ -47,23 +53,32 @@ public sealed class LiteDbConfigCache : ILocalConfigCache, IDisposable
return Task.FromResult<GenerationSnapshot?>(snapshot);
}
public Task PutAsync(GenerationSnapshot snapshot, CancellationToken ct = default)
public async Task PutAsync(GenerationSnapshot snapshot, CancellationToken ct = default)
{
ct.ThrowIfCancellationRequested();
// upsert by (ClusterId, GenerationId) — replace in place if already cached
var existing = _col
.Find(s => s.ClusterId == snapshot.ClusterId && s.GenerationId == snapshot.GenerationId)
.FirstOrDefault();
if (existing is null)
_col.Insert(snapshot);
else
// Serialize the find-then-insert/update so concurrent callers do not observe a stale
// `existing is null` and both Insert (Configuration-005). LiteDB's per-call lock is
// not enough — the read and the write are independent calls.
await _writeGate.WaitAsync(ct).ConfigureAwait(false);
try
{
snapshot.Id = existing.Id;
_col.Update(snapshot);
}
// upsert by (ClusterId, GenerationId) — replace in place if already cached
var existing = _col
.Find(s => s.ClusterId == snapshot.ClusterId && s.GenerationId == snapshot.GenerationId)
.FirstOrDefault();
return Task.CompletedTask;
if (existing is null)
_col.Insert(snapshot);
else
{
snapshot.Id = existing.Id;
_col.Update(snapshot);
}
}
finally
{
_writeGate.Release();
}
}
public Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10, CancellationToken ct = default)
@@ -82,7 +97,11 @@ public sealed class LiteDbConfigCache : ILocalConfigCache, IDisposable
return Task.CompletedTask;
}
public void Dispose() => _db.Dispose();
public void Dispose()
{
_writeGate.Dispose();
_db.Dispose();
}
}
public sealed class LocalConfigCacheCorruptException(string message, Exception inner)
@@ -1,3 +1,4 @@
using System.Text.RegularExpressions;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.Retry;
@@ -61,6 +62,23 @@ public sealed class ResilientConfigReader
_pipeline = builder.Build();
}
/// <summary>
/// Configuration-010: redact connection-string fragments (Password, User Id, Pwd, etc.)
/// that a caller's exception message could carry. Conservative regex pass — anything
/// matching <c>Key=Value</c> with a known credential key gets its value replaced.
/// </summary>
private static readonly Regex SecretsRegex = new(
@"(?ix)\b(Password|Pwd|User\s*Id|Uid|AccessToken|Authorization|Api[-_]?Key)\s*=\s*[^;,)\s]*",
RegexOptions.Compiled);
internal static string ScrubSecrets(string? message)
{
if (string.IsNullOrEmpty(message)) return message ?? string.Empty;
// Replace the entire matched fragment (key + value) with a redaction marker so the
// key name itself doesn't leak — log scrapers grep for "Password=" too.
return SecretsRegex.Replace(message, "[redacted credential]");
}
/// <summary>
/// Execute <paramref name="centralFetch"/> through the resilience pipeline. On full failure
/// (post-retry), reads the sealed cache for <paramref name="clusterId"/> and passes the
@@ -88,7 +106,15 @@ public sealed class ResilientConfigReader
// that case, not propagate. Only rethrow if the caller actually requested cancellation.
catch (Exception ex) when (ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested)
{
_logger.LogWarning(ex, "Central-DB read failed after retries; falling back to sealed cache for cluster {ClusterId}", clusterId);
// Configuration-010: do NOT pass the raw exception object — it carries the stack
// and inner-exception chain, and SqlException/wrapping delegates can surface
// connection-string fragments (Password=…, User Id=…) embedded in messages.
// Log only the exception type and a scrubbed message so secrets stay out of logs.
_logger.LogWarning(
"Central-DB read failed after retries ({ExceptionType}: {SanitizedMessage}); falling back to sealed cache for cluster {ClusterId}",
ex.GetType().Name,
ScrubSecrets(ex.Message),
clusterId);
// GenerationCacheUnavailableException surfaces intentionally — fails the caller's
// operation. StaleConfigFlag stays unchanged; the flag only flips when we actually
// served a cache snapshot.
@@ -6,7 +6,15 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
/// </summary>
/// <param name="State">Current driver-instance state.</param>
/// <param name="LastSuccessfulRead">Timestamp of the most recent successful equipment read; null if never.</param>
/// <param name="LastError">Most recent error message; null when state is Healthy.</param>
/// <param name="LastError">
/// Most recent error message; null when no error has been recorded. The type makes no
/// guarantee about correlation with <paramref name="State"/> — a driver in
/// <see cref="DriverState.Healthy"/> may legitimately retain the last error from a recovered
/// failure (useful for diagnostics), and <see cref="DriverState.Degraded"/> /
/// <see cref="DriverState.Reconnecting"/> / <see cref="DriverState.Faulted"/> states may all
/// carry a non-null message. Callers must not key behaviour on the LastError-null ↔ Healthy
/// pairing (Core.Abstractions-008).
/// </param>
public sealed record DriverHealth(
DriverState State,
DateTime? LastSuccessfulRead,
@@ -10,33 +10,46 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
/// and #111 (driver type → namespace kind mapping enforced by sp_ValidateDraft).
/// The registry is the source of truth for both checks.
///
/// Thread-safety: registration happens at startup (single thread); lookups happen on every
/// config-apply (multi-threaded). The internal dictionary is replaced atomically via
/// <see cref="System.Threading.Interlocked"/> on register; readers see a stable snapshot.
/// Thread-safety: registration is typically single-threaded at startup; lookups happen on
/// every config-apply (multi-threaded). The check-then-act inside <see cref="Register"/> is
/// guarded by a private lock so concurrent registrations are atomic — the "registered only
/// once per process" guarantee holds even if two callers race. Readers operate against the
/// volatile snapshot reference produced by the last successful <see cref="Register"/> and
/// never block.
/// </remarks>
public sealed class DriverTypeRegistry
{
private readonly Lock _writeLock = new();
private IReadOnlyDictionary<string, DriverTypeMetadata> _types =
new Dictionary<string, DriverTypeMetadata>(StringComparer.OrdinalIgnoreCase);
/// <summary>Register a driver type. Throws if the type name is already registered.</summary>
/// <remarks>
/// The check-then-act (duplicate check → copy-on-write rebuild → swap) is performed under
/// <see cref="_writeLock"/> so concurrent <see cref="Register"/> calls cannot silently
/// discard each other's registrations — see Core.Abstractions-004.
/// </remarks>
public void Register(DriverTypeMetadata metadata)
{
ArgumentNullException.ThrowIfNull(metadata);
var snapshot = _types;
if (snapshot.ContainsKey(metadata.TypeName))
lock (_writeLock)
{
throw new InvalidOperationException(
$"Driver type '{metadata.TypeName}' is already registered. " +
$"Each driver type may be registered only once per process.");
}
var snapshot = _types;
if (snapshot.ContainsKey(metadata.TypeName))
{
throw new InvalidOperationException(
$"Driver type '{metadata.TypeName}' is already registered. " +
$"Each driver type may be registered only once per process.");
}
var next = new Dictionary<string, DriverTypeMetadata>(snapshot, StringComparer.OrdinalIgnoreCase)
{
[metadata.TypeName] = metadata,
};
Interlocked.Exchange(ref _types, next);
var next = new Dictionary<string, DriverTypeMetadata>(snapshot, StringComparer.OrdinalIgnoreCase)
{
[metadata.TypeName] = metadata,
};
_types = next;
}
}
/// <summary>Look up a driver type by name. Throws if unknown.</summary>
@@ -59,6 +59,21 @@ public interface IHistorianDataSource : IDisposable
/// Distinct from any live event stream; sources here come from the historian's
/// event log. <paramref name="sourceName"/> is null to return all sources.
/// </summary>
/// <remarks>
/// Note on parameter types — <paramref name="maxEvents"/> is <see cref="int"/> (not
/// <see cref="uint"/>) so callers can pass <c>0</c> or a negative value as a "use the
/// backend's default cap" sentinel; see <c>WonderwareHistorianClient</c> /
/// <c>HistorianDataSource</c> and Core.Abstractions-006 for the rationale. The sibling
/// <see cref="ReadRawAsync"/> / <see cref="ReadProcessedAsync"/> use
/// <c>uint maxValuesPerNode</c> because their OPC UA HistoryRead surface has no
/// equivalent "use default" sentinel.
///
/// This surface declares <see cref="ReadAtTimeAsync"/> and <see cref="ReadEventsAsync"/>
/// as required members — a server-side historian owns the full read surface, unlike
/// <see cref="IHistoryProvider"/> where the same two methods are optional default-impl
/// methods so legacy drivers can stay raw-only. The asymmetry is intentional
/// (Core.Abstractions-008).
/// </remarks>
Task<HistoricalEventsResult> ReadEventsAsync(
string? sourceName,
DateTime startUtc,
@@ -6,6 +6,14 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
/// Galaxy (Wonderware Historian via the optional plugin), OPC UA Client (forward
/// to upstream server).
/// </summary>
/// <remarks>
/// <see cref="ReadAtTimeAsync"/> and <see cref="ReadEventsAsync"/> are C# default interface
/// methods that throw <see cref="NotSupportedException"/> — drivers opt in by overriding so
/// a raw-only driver compiles without forcing it to provide at-time / event surfaces it
/// has no backend for. The sibling server-side surface, <see cref="IHistorianDataSource"/>,
/// declares both methods as required because a registered historian owns the full read
/// surface; the asymmetry is intentional (Core.Abstractions-008).
/// </remarks>
public interface IHistoryProvider
{
/// <summary>
@@ -60,12 +68,24 @@ public interface IHistoryProvider
/// </param>
/// <param name="startUtc">Inclusive lower bound on <c>EventTimeUtc</c>.</param>
/// <param name="endUtc">Exclusive upper bound on <c>EventTimeUtc</c>.</param>
/// <param name="maxEvents">Upper cap on returned events — the driver's backend enforces this.</param>
/// <param name="maxEvents">
/// Upper cap on returned events — the driver's backend enforces this. The type is
/// <see cref="int"/> rather than <see cref="uint"/> (which the sibling raw / processed
/// reads use for <c>maxValuesPerNode</c>) because callers and downstream historian
/// adapters historically treat <c>maxEvents &lt;= 0</c> as a sentinel meaning
/// "use the backend's default cap" (see <c>WonderwareHistorianClient</c> /
/// <c>HistorianDataSource</c>). The asymmetry is intentional — Core.Abstractions-006.
/// </param>
/// <param name="cancellationToken">Request cancellation.</param>
/// <remarks>
/// Default implementation throws. Only drivers with an event historian (Galaxy via the
/// Wonderware Alarm &amp; Events log) override. Modbus / the OPC UA Client driver stay
/// with the default and let callers see <c>BadHistoryOperationUnsupported</c>.
///
/// Note the type asymmetry with <see cref="ReadRawAsync"/> /
/// <see cref="ReadProcessedAsync"/> (both use <c>uint maxValuesPerNode</c>): event
/// readers accept a signed <c>int maxEvents</c> so callers can pass 0 / negative as a
/// "use default cap" sentinel without an extra parameter or overload.
/// </remarks>
Task<HistoricalEventsResult> ReadEventsAsync(
string? sourceName,
@@ -19,14 +19,21 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
/// from the previously-seen snapshot.</para>
///
/// <para>Exceptions thrown by the reader on the initial poll or any subsequent poll are
/// swallowed — the loop continues on the next tick. The driver's own health surface is
/// where transient poll failures should be reported; the engine intentionally does not
/// double-book that responsibility.</para>
/// caught — the loop continues on the next tick. When an <c>onError</c> callback is supplied
/// to the constructor the caught exception is routed to it so the driver's health surface
/// can record the failure. Without an <c>onError</c> callback the exception is silently
/// swallowed (preserves the original behaviour for drivers that have not opted in yet).</para>
///
/// <para>Programmer errors and obviously-fatal exceptions (<see cref="OutOfMemoryException"/>,
/// <see cref="ThreadAbortException"/>, <see cref="StackOverflowException"/>,
/// <see cref="AccessViolationException"/>) are NOT caught — they propagate and tear the poll
/// loop down rather than spin a silently-broken subscription.</para>
/// </remarks>
public sealed class PollGroupEngine : IAsyncDisposable
{
private readonly Func<IReadOnlyList<string>, CancellationToken, Task<IReadOnlyList<DataValueSnapshot>>> _reader;
private readonly Action<ISubscriptionHandle, string, DataValueSnapshot> _onChange;
private readonly Action<Exception>? _onError;
private readonly TimeSpan _minInterval;
private readonly ConcurrentDictionary<long, SubscriptionState> _subscriptions = new();
private long _nextId;
@@ -40,15 +47,21 @@ public sealed class PollGroupEngine : IAsyncDisposable
/// <see cref="ISubscribable.OnDataChange"/> event.</param>
/// <param name="minInterval">Interval floor; anything below is clamped. Defaults to 100 ms
/// per <see cref="DefaultMinInterval"/>.</param>
/// <param name="onError">Optional error sink — invoked once per caught reader exception (or
/// internal contract-violation throw) so the owning driver can route the failure to its
/// health surface (Core.Abstractions-005). Defensive: an <c>onError</c> handler that
/// itself throws is silently absorbed so a buggy forwarder cannot crash the poll loop.</param>
public PollGroupEngine(
Func<IReadOnlyList<string>, CancellationToken, Task<IReadOnlyList<DataValueSnapshot>>> reader,
Action<ISubscriptionHandle, string, DataValueSnapshot> onChange,
TimeSpan? minInterval = null)
TimeSpan? minInterval = null,
Action<Exception>? onError = null)
{
ArgumentNullException.ThrowIfNull(reader);
ArgumentNullException.ThrowIfNull(onChange);
_reader = reader;
_onChange = onChange;
_onError = onError;
_minInterval = minInterval ?? DefaultMinInterval;
}
@@ -102,19 +115,54 @@ public sealed class PollGroupEngine : IAsyncDisposable
// whether it has changed, satisfying OPC UA Part 4 initial-value semantics.
try { await PollOnceAsync(state, forceRaise: true, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
catch { /* first-read error tolerated — loop continues */ }
catch (Exception ex) when (!IsFatal(ex))
{
// first-read error tolerated — loop continues; forward to driver health surface.
ReportError(ex);
}
while (!ct.IsCancellationRequested)
{
try { await Task.Delay(state.Interval, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
// Defensive: the CTS may be disposed by Unsubscribe/DisposeAsync between the
// cancellation check above and the Task.Delay touching the token. Treat that race
// as a normal cancellation rather than a fatal exception.
catch (ObjectDisposedException) { return; }
try { await PollOnceAsync(state, forceRaise: false, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
catch { /* transient poll error — loop continues, driver health surface logs it */ }
catch (Exception ex) when (!IsFatal(ex))
{
// transient poll error — loop continues, driver health surface logs it
// via the supplied onError callback (Core.Abstractions-005).
ReportError(ex);
}
}
}
/// <summary>
/// Programmer-error / process-fatal exception classification: anything that cannot be
/// safely "swallowed and retry on the next tick" must escape the poll loop instead.
/// </summary>
private static bool IsFatal(Exception ex)
=> ex is OutOfMemoryException
or StackOverflowException
or AccessViolationException
or ThreadAbortException;
/// <summary>
/// Forward a caught exception to the optional <c>onError</c> callback. Defensive
/// against an <c>onError</c> implementation that itself throws — that would crash the
/// poll loop and re-introduce the silent-stall failure mode this method exists to prevent.
/// </summary>
private void ReportError(Exception ex)
{
if (_onError is null) return;
try { _onError(ex); }
catch { /* never let a buggy error sink stop the poll loop */ }
}
private async Task PollOnceAsync(SubscriptionState state, bool forceRaise, CancellationToken ct)
{
var snapshots = await _reader(state.TagReferences, ct).ConfigureAwait(false);
@@ -87,6 +87,25 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
// having to scrape the WARN log.
private long _evictedCount;
// Core.AlarmHistorian-008: keep an approximate in-memory count of non-dead-lettered
// rows so EnqueueAsync does not need to run a SELECT COUNT(*) on every call. The
// counter is seeded from storage at construction, kept current by every mutation
// (Enqueue, Drain, RetryDeadLettered, PurgeAgedDeadLetters, EnforceCapacity), and
// periodically re-synced from storage as a safety net against drift.
// Mutations cross threads (EnqueueAsync is called from the emitting thread, drain
// runs on the timer / drain thread) so it is updated via Interlocked.
private long _queuedRowCount;
// Probe counter — incremented every time we actually issue a real COUNT(*) for
// capacity enforcement. Public for test instrumentation only.
private long _capacityProbeCount;
// After every Nth enqueue we resync the in-memory counter from storage to defend
// against silent drift (e.g. an external process editing the DB).
private const long ResyncEnqueueInterval = 10_000;
private long _enqueuesSinceResync;
/// <summary>Test-only: number of times the perf-optimised path fell through to a real <c>COUNT(*)</c>.</summary>
public long DebugCapacityProbeCount => Interlocked.Read(ref _capacityProbeCount);
public SqliteStoreAndForwardSink(
string databasePath,
IAlarmHistorianWriter writer,
@@ -115,6 +134,9 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
}.ToString();
InitializeSchema();
// Core.AlarmHistorian-008: seed the in-memory counter from storage so the
// perf-optimised EnqueueAsync path starts in sync with what's on disk.
_queuedRowCount = ProbeQueuedRowCount();
}
/// <summary>
@@ -223,7 +245,11 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
await ApplyPragmasAsync(conn, cancellationToken).ConfigureAwait(false);
await EnforceCapacityAsync(conn, cancellationToken).ConfigureAwait(false);
// Core.AlarmHistorian-008: use the in-memory counter to short-circuit the
// capacity check on every enqueue. The bare hot path is now one INSERT — no
// SELECT COUNT(*). We fall back to a real probe only when the cached counter
// says we're at or above capacity, or periodically to defend against drift.
await EnforceCapacityFastPathAsync(conn, cancellationToken).ConfigureAwait(false);
using var cmd = conn.CreateCommand();
cmd.CommandText = """
@@ -234,6 +260,57 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
cmd.Parameters.AddWithValue("$enqueued", _clock().ToString("O"));
cmd.Parameters.AddWithValue("$payload", JsonSerializer.Serialize(evt));
await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
Interlocked.Increment(ref _queuedRowCount);
}
/// <summary>
/// Capacity enforcement on the hot enqueue path: consults the in-memory counter
/// first and only probes storage with a real <c>COUNT(*)</c> when (a) the
/// cached value indicates the capacity wall is in reach, or (b) the periodic
/// resync interval has elapsed. The actual eviction (when over capacity) goes
/// through <see cref="EnforceCapacityAsync"/> which still runs a precise
/// COUNT to compute the exact number of rows to evict.
/// </summary>
private async Task EnforceCapacityFastPathAsync(SqliteConnection conn, CancellationToken ct)
{
var enqueuesSinceResync = Interlocked.Increment(ref _enqueuesSinceResync);
var cached = Interlocked.Read(ref _queuedRowCount);
// Periodic resync — bounded amount of drift even under exotic conditions.
if (enqueuesSinceResync >= ResyncEnqueueInterval)
{
await ResyncQueuedRowCountAsync(conn, ct).ConfigureAwait(false);
cached = Interlocked.Read(ref _queuedRowCount);
Interlocked.Exchange(ref _enqueuesSinceResync, 0);
}
// Below capacity per the cached counter — skip the COUNT(*) entirely.
if (cached < _capacity) return;
// Cached counter says we're at or above the capacity wall — fall back to the
// precise path which probes COUNT(*) and evicts whatever's needed.
await EnforceCapacityAsync(conn, ct).ConfigureAwait(false);
}
/// <summary>Synchronously query <c>COUNT(*)</c> of non-dead-lettered rows. Used at startup.</summary>
private long ProbeQueuedRowCount()
{
Interlocked.Increment(ref _capacityProbeCount);
using var conn = OpenConnection();
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
return (long)(cmd.ExecuteScalar() ?? 0L);
}
/// <summary>Re-sync the in-memory counter from storage (async path).</summary>
private async Task ResyncQueuedRowCountAsync(SqliteConnection conn, CancellationToken ct)
{
Interlocked.Increment(ref _capacityProbeCount);
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
var live = (long)(await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false) ?? 0L);
Interlocked.Exchange(ref _queuedRowCount, live);
}
/// <summary>
@@ -242,6 +319,12 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
/// on RetryPlease. Safe to call from multiple threads; the semaphore enforces
/// serial execution.
/// </summary>
/// <remarks>
/// Core.AlarmHistorian-008: every per-tick SQLite operation runs through a
/// single shared connection (purge, read, corrupt-row dead-letter, and the
/// outcome-applying transaction). Pre-fix the drain opened three independent
/// connections per tick, each paying the open + PRAGMA cost.
/// </remarks>
public async Task DrainOnceAsync(CancellationToken ct)
{
if (_disposed) return;
@@ -254,8 +337,12 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
_lastDrainUtc = _clock();
}
PurgeAgedDeadLetters();
var batch = ReadBatch();
// One connection per drain tick — used by purge, read, corrupt-dead-letter,
// and the outcome-applying transaction.
using var conn = OpenConnection();
PurgeAgedDeadLetters(conn);
var batch = ReadBatch(conn);
if (batch.Count == 0)
{
lock (_statusLock) { _drainState = HistorianDrainState.Idle; }
@@ -271,11 +358,13 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
if (corruptRowIds.Count > 0)
{
using var corruptConn = OpenConnection();
using var corruptTx = corruptConn.BeginTransaction();
using var corruptTx = conn.BeginTransaction();
foreach (var rowId in corruptRowIds)
DeadLetterRow(corruptConn, corruptTx, rowId, $"corrupt payload at {_clock():O}");
DeadLetterRow(conn, corruptTx, rowId, $"corrupt payload at {_clock():O}");
corruptTx.Commit();
// Each corrupt row leaves the non-dead-lettered queue — bookkeeping for
// the in-memory counter (Core.AlarmHistorian-008).
Interlocked.Add(ref _queuedRowCount, -corruptRowIds.Count);
_logger.Warning(
"Dead-lettered {Count} historian queue row(s) with un-deserializable payload",
corruptRowIds.Count);
@@ -330,26 +419,34 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
return;
}
using var conn = OpenConnection();
using var tx = conn.BeginTransaction();
for (var i = 0; i < outcomes.Count; i++)
int rowsLeavingQueue = 0;
using (var tx = conn.BeginTransaction())
{
var outcome = outcomes[i];
var rowId = liveRows[i].RowId;
switch (outcome)
for (var i = 0; i < outcomes.Count; i++)
{
case HistorianWriteOutcome.Ack:
DeleteRow(conn, tx, rowId);
break;
case HistorianWriteOutcome.PermanentFail:
DeadLetterRow(conn, tx, rowId, $"permanent fail at {_clock():O}");
break;
case HistorianWriteOutcome.RetryPlease:
BumpAttempt(conn, tx, rowId, "retry-please");
break;
var outcome = outcomes[i];
var rowId = liveRows[i].RowId;
switch (outcome)
{
case HistorianWriteOutcome.Ack:
DeleteRow(conn, tx, rowId);
rowsLeavingQueue++;
break;
case HistorianWriteOutcome.PermanentFail:
DeadLetterRow(conn, tx, rowId, $"permanent fail at {_clock():O}");
rowsLeavingQueue++;
break;
case HistorianWriteOutcome.RetryPlease:
BumpAttempt(conn, tx, rowId, "retry-please");
break;
}
}
tx.Commit();
}
tx.Commit();
// Ack-deleted + PermanentFail-dead-lettered rows both leave the
// non-dead-lettered queue — keep the counter aligned (Core.AlarmHistorian-008).
if (rowsLeavingQueue > 0)
Interlocked.Add(ref _queuedRowCount, -rowsLeavingQueue);
var acks = outcomes.Count(o => o == HistorianWriteOutcome.Ack);
lock (_statusLock)
@@ -375,15 +472,15 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
public HistorianSinkStatus GetStatus()
{
using var conn = OpenConnection();
// Core.AlarmHistorian-008: read the non-dead-lettered count from the in-memory
// counter so a busy Admin UI / health probe does not hammer the DB. Dead-letter
// depth is rare-path only (it lives in the queue until retention) so a real
// COUNT(*) on a single combined connection is fine.
var queued = Interlocked.Read(ref _queuedRowCount);
if (queued < 0) queued = 0;
long queued;
long deadlettered;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
queued = (long)(cmd.ExecuteScalar() ?? 0L);
}
using (var conn = OpenConnection())
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 1";
@@ -421,7 +518,11 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
using var conn = OpenConnection();
using var cmd = conn.CreateCommand();
cmd.CommandText = "UPDATE Queue SET DeadLettered = 0, AttemptCount = 0, LastError = NULL WHERE DeadLettered = 1";
return cmd.ExecuteNonQuery();
var revived = cmd.ExecuteNonQuery();
// Dead-lettered rows rejoin the non-dead-lettered queue — keep the in-memory
// counter aligned (Core.AlarmHistorian-008).
if (revived > 0) Interlocked.Add(ref _queuedRowCount, revived);
return revived;
}
/// <summary>
@@ -432,10 +533,9 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
/// </summary>
private readonly record struct QueueRow(long RowId, AlarmHistorianEvent? Event);
private List<QueueRow> ReadBatch()
private List<QueueRow> ReadBatch(SqliteConnection conn)
{
var rows = new List<QueueRow>();
using var conn = OpenConnection();
using var cmd = conn.CreateCommand();
cmd.CommandText = """
SELECT RowId, PayloadJson FROM Queue
@@ -501,50 +601,21 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
cmd.ExecuteNonQuery();
}
private void EnforceCapacity(SqliteConnection conn)
{
// Count non-dead-lettered rows only — dead-lettered rows retain for
// post-mortem per the configured retention window.
long count;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
count = (long)(cmd.ExecuteScalar() ?? 0L);
}
if (count < _capacity) return;
var toEvict = count - _capacity + 1;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = """
DELETE FROM Queue
WHERE RowId IN (
SELECT RowId FROM Queue
WHERE DeadLettered = 0
ORDER BY RowId ASC
LIMIT $n
)
""";
cmd.Parameters.AddWithValue("$n", toEvict);
cmd.ExecuteNonQuery();
}
// Core.AlarmHistorian-009: increment the lifetime eviction counter so the
// Admin UI / health check can report overflow without requiring log scraping.
lock (_statusLock) { _evictedCount += toEvict; }
_logger.Warning(
"Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
_capacity, toEvict, _evictedCount);
}
// Async variant used by EnqueueAsync (Core.AlarmHistorian-003).
// Core.AlarmHistorian-008: the precise path — runs COUNT(*) to compute the exact
// number of rows to evict. Reached only from the fast-path fallback when the
// in-memory counter says we are at or above capacity.
private async Task EnforceCapacityAsync(SqliteConnection conn, CancellationToken ct)
{
Interlocked.Increment(ref _capacityProbeCount);
long count;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
count = (long)(await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false) ?? 0L);
}
// Resync the in-memory counter while we have a fresh number.
Interlocked.Exchange(ref _queuedRowCount, count);
if (count < _capacity) return;
var toEvict = count - _capacity + 1;
@@ -562,16 +633,16 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
cmd.Parameters.AddWithValue("$n", toEvict);
await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
}
Interlocked.Add(ref _queuedRowCount, -toEvict);
lock (_statusLock) { _evictedCount += toEvict; }
_logger.Warning(
"Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
_capacity, toEvict, _evictedCount);
}
private void PurgeAgedDeadLetters()
private void PurgeAgedDeadLetters(SqliteConnection conn)
{
var cutoff = (_clock() - _deadLetterRetention).ToString("O");
using var conn = OpenConnection();
using var cmd = conn.CreateCommand();
cmd.CommandText = """
DELETE FROM Queue
@@ -26,11 +26,27 @@ public static class PermissionTrieBuilder
/// Build a trie for one cluster/generation from the supplied rows. The caller is
/// responsible for pre-filtering rows to the target generation + cluster.
/// </summary>
/// <param name="clusterId">Cluster the trie is being built for; rows for other clusters are skipped.</param>
/// <param name="generationId">Config-generation the rows belong to; stamped on the returned trie.</param>
/// <param name="rows">ACL rows for this cluster + generation.</param>
/// <param name="scopePaths">
/// Optional <c>ScopeId</c> → multi-level trie-path lookup. When supplied, sub-cluster rows
/// descend to their structurally-correct trie node. When null, sub-cluster rows fall back
/// to a direct child of the trie root keyed on <c>ScopeId</c> — deterministic-test mode.
/// </param>
/// <param name="diagnostic">
/// Optional callback invoked when a sub-cluster row's <c>ScopeId</c> cannot be located
/// in <paramref name="scopePaths"/>. Production callers should wire a logger here so
/// orphaned grants surface — silently dropping them under the wrong trie level was the
/// Core-011 production hazard. The callback fires only when <paramref name="scopePaths"/>
/// is non-null (a null lookup is the explicit deterministic-test fallback mode).
/// </param>
public static PermissionTrie Build(
string clusterId,
long generationId,
IReadOnlyList<NodeAcl> rows,
IReadOnlyDictionary<string, NodeAclPath>? scopePaths = null)
IReadOnlyDictionary<string, NodeAclPath>? scopePaths = null,
Action<PermissionTrieBuildDiagnostic>? diagnostic = null)
{
ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
ArgumentNullException.ThrowIfNull(rows);
@@ -45,7 +61,7 @@ public static class PermissionTrieBuilder
var node = row.ScopeKind switch
{
NodeAclScopeKind.Cluster => trie.Root,
_ => Descend(trie.Root, row, scopePaths),
_ => Descend(trie.Root, row, scopePaths, diagnostic),
};
if (node is not null)
@@ -55,16 +71,30 @@ public static class PermissionTrieBuilder
return trie;
}
private static PermissionTrieNode? Descend(PermissionTrieNode root, NodeAcl row, IReadOnlyDictionary<string, NodeAclPath>? scopePaths)
private static PermissionTrieNode? Descend(
PermissionTrieNode root,
NodeAcl row,
IReadOnlyDictionary<string, NodeAclPath>? scopePaths,
Action<PermissionTrieBuildDiagnostic>? diagnostic)
{
if (string.IsNullOrEmpty(row.ScopeId)) return null;
// For sub-cluster scopes the caller supplies a path lookup so we know the containing
// namespace / UnsArea / UnsLine ids. Without a path lookup we fall back to putting the
// row directly under the root using its ScopeId — works for deterministic tests, not
// for production where the hierarchy must be honored.
// for production where the hierarchy must be honored. If a scopePaths lookup IS
// provided but is missing the row's ScopeId, surface a diagnostic so the caller can
// log the orphan instead of silently dropping the grant under an unreachable node.
if (scopePaths is null || !scopePaths.TryGetValue(row.ScopeId, out var path))
{
if (scopePaths is not null)
{
diagnostic?.Invoke(new PermissionTrieBuildDiagnostic(
NodeAclId: row.NodeAclId,
ScopeKind: row.ScopeKind,
ScopeId: row.ScopeId,
Reason: PermissionTrieBuildDiagnosticReason.MissingScopePath));
}
return EnsureChild(root, row.ScopeId);
}
@@ -95,3 +125,30 @@ public static class PermissionTrieBuilder
/// applicable; or (for SystemPlatform kind) NamespaceId / FolderSegment / .../TagId.
/// </param>
public sealed record NodeAclPath(IReadOnlyList<string> Segments);
/// <summary>
/// Diagnostic emitted by <see cref="PermissionTrieBuilder.Build"/> when a row could not be
/// placed at its structurally-correct trie node. Production callers should log these so
/// orphaned grants surface instead of being silently dropped under an unreachable node
/// (Core-011).
/// </summary>
/// <param name="NodeAclId">The offending row's logical id.</param>
/// <param name="ScopeKind">The row's <see cref="NodeAclScopeKind"/>.</param>
/// <param name="ScopeId">The row's <c>ScopeId</c> that could not be located.</param>
/// <param name="Reason">Why the diagnostic fired.</param>
public sealed record PermissionTrieBuildDiagnostic(
string NodeAclId,
NodeAclScopeKind ScopeKind,
string ScopeId,
PermissionTrieBuildDiagnosticReason Reason);
/// <summary>Reasons <see cref="PermissionTrieBuildDiagnostic"/> can be emitted.</summary>
public enum PermissionTrieBuildDiagnosticReason
{
/// <summary>
/// The row's <c>ScopeId</c> was not present in the supplied <c>scopePaths</c> lookup.
/// The grant is placed as a direct child of the trie root keyed on <c>ScopeId</c> — a
/// position the production trie walker cannot reach for multi-level scopes.
/// </summary>
MissingScopePath,
}
@@ -52,7 +52,7 @@ public sealed class DriverHost : IAsyncDisposable
_drivers[id] = driver;
}
try { await driver.InitializeAsync(driverConfigJson, ct); }
try { await driver.InitializeAsync(driverConfigJson, ct).ConfigureAwait(false); }
catch
{
// Keep the driver registered — operator will see Faulted state and can reinitialize.
@@ -69,7 +69,7 @@ public sealed class DriverHost : IAsyncDisposable
_drivers.Remove(driverInstanceId);
}
try { await driver.ShutdownAsync(ct); }
try { await driver.ShutdownAsync(ct).ConfigureAwait(false); }
catch { /* shutdown is best-effort; logs elsewhere */ }
}
@@ -84,7 +84,7 @@ public sealed class DriverHost : IAsyncDisposable
foreach (var driver in snapshot)
{
try { await driver.ShutdownAsync(CancellationToken.None); } catch { /* ignore */ }
try { await driver.ShutdownAsync(CancellationToken.None).ConfigureAwait(false); } catch { /* ignore */ }
(driver as IDisposable)?.Dispose();
}
}
@@ -15,11 +15,13 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Observability;
/// → /readyz 503 (not yet ready).</item>
/// <item><see cref="DriverState.Healthy"/> → /readyz 200.</item>
/// <item><see cref="DriverState.Degraded"/> → /readyz 200 with flagged driver IDs.</item>
/// <item><see cref="DriverState.Reconnecting"/> → /readyz 200 with flagged driver IDs
/// (driver alive but not serving live data; same verdict as Degraded).</item>
/// <item><see cref="DriverState.Faulted"/> → /readyz 503.</item>
/// </list>
/// The overall verdict is computed across the fleet: any Faulted → Faulted; any
/// Unknown/Initializing → NotReady; any Degraded → Degraded; else Healthy. An empty fleet
/// is Healthy (nothing to degrade).
/// Unknown/Initializing → NotReady; any Degraded or Reconnecting → Degraded; else
/// Healthy. An empty fleet is Healthy (nothing to degrade).
/// </remarks>
public static class DriverHealthReport
{
@@ -39,8 +39,11 @@ public class GenericDriverNodeManager(IDriver driver) : IDisposable
/// If called a second time (e.g. Galaxy redeploy via <c>IRediscoverable.OnRediscoveryNeeded</c>)
/// the previous alarm subscription is torn down and the sink registry is cleared before
/// re-walking, preventing double delivery of alarm transitions.
/// Exception isolation (marking the driver's subtree Faulted) is the caller's responsibility —
/// exceptions from <see cref="ITagDiscovery.DiscoverAsync"/> propagate to the caller.
/// Exception isolation (per decision #12 — marking the driver's subtree Faulted while other
/// drivers stay available) is the caller's responsibility; exceptions from
/// <see cref="ITagDiscovery.DiscoverAsync"/> propagate unhandled to the caller. The Server
/// project's <c>OpcUaApplicationHost.PopulateAddressSpaces</c> wraps this call in a per-driver
/// try/catch that logs + leaves the driver's subtree empty until a Reinitialize succeeds.
/// </summary>
public async Task BuildAddressSpaceAsync(IAddressSpaceBuilder builder, CancellationToken ct)
{
@@ -118,11 +118,15 @@ public sealed class CapabilityInvoker
if (!isIdempotent)
{
var noRetryOptions = _optionsAccessor() with
// Snapshot the options exactly once per call — invoking _optionsAccessor twice can
// (a) observe two different snapshots if an Admin edit lands between them and
// (b) wastes an allocation on the per-write hot path (Phase 6.1 1% pipeline budget).
var snapshot = _optionsAccessor();
var noRetryOptions = snapshot with
{
CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
{
[DriverCapability.Write] = _optionsAccessor().Resolve(DriverCapability.Write) with { RetryCount = 0 },
[DriverCapability.Write] = snapshot.Resolve(DriverCapability.Write) with { RetryCount = 0 },
},
};
var pipeline = _builder.GetOrCreate(_driverInstanceId, $"{hostName}::non-idempotent", DriverCapability.Write, noRetryOptions);
@@ -42,13 +42,27 @@ public sealed record DriverResilienceOptions
/// Look up the effective policy for a capability, falling back to tier defaults when no
/// override is configured. Never returns null.
/// </summary>
/// <exception cref="KeyNotFoundException">
/// Thrown when neither the override map nor the tier defaults carry an entry for the
/// requested capability. The <c>TierDefaults_Cover_EveryCapability</c> invariant test
/// in <c>DriverResilienceOptionsTests</c> guarantees every defined enum value is present
/// in each tier's table, so this only fires when a caller passes an out-of-range value
/// or someone adds a <see cref="DriverCapability"/> member without updating
/// <see cref="GetTierDefaults"/>. The message names the missing capability and tier.
/// </exception>
public CapabilityPolicy Resolve(DriverCapability capability)
{
if (CapabilityPolicies.TryGetValue(capability, out var policy))
return policy;
var defaults = GetTierDefaults(Tier);
return defaults[capability];
if (defaults.TryGetValue(capability, out var fallback))
return fallback;
throw new KeyNotFoundException(
$"No policy defined for capability '{capability}' under tier '{Tier}'. " +
$"This indicates a {nameof(DriverCapability)} enum value missing from {nameof(GetTierDefaults)} — " +
"add the capability to every tier's default table.");
}
/// <summary>
@@ -23,7 +23,15 @@ public sealed class WedgeDetector
/// <summary>Wedge-detection threshold; pass &lt; 60 s and the detector clamps to 60 s.</summary>
public TimeSpan Threshold { get; }
/// <summary>Whether the driver reported itself <see cref="DriverState.Healthy"/> at construction.</summary>
/// <summary>
/// Construct with the wedge-detection threshold; values below 60 s clamp to 60 s so
/// the detector never fires below the documented floor.
/// </summary>
/// <param name="threshold">
/// Time without a successful unit of work after which a Healthy driver with pending
/// work is considered Faulted. Clamped to a minimum of 60 s per the plan-default of
/// 5 × PublishingInterval.
/// </param>
public WedgeDetector(TimeSpan threshold)
{
Threshold = threshold < TimeSpan.FromSeconds(60) ? TimeSpan.FromSeconds(60) : threshold;
@@ -10,23 +10,56 @@ namespace ZB.MOM.WW.OtOpcUa.Analyzers;
/// <summary>
/// Diagnostic analyzer that flags direct invocations of Phase 6.1-wrapped driver-capability
/// methods when the call is NOT already running inside a <c>CapabilityInvoker.ExecuteAsync</c>,
/// <c>CapabilityInvoker.ExecuteWriteAsync</c>, or <c>AlarmSurfaceInvoker.*Async</c> lambda.
/// The wrapping is what gives us per-host breaker isolation, retry semantics, bulkhead-depth
/// accounting, and alarm-ack idempotence guards — raw calls bypass all of that.
/// methods when the call is NOT already running inside a <c>CapabilityInvoker.ExecuteAsync</c>
/// or <c>CapabilityInvoker.ExecuteWriteAsync</c> lambda. The wrapping is what gives us
/// per-host breaker isolation, retry semantics, bulkhead-depth accounting, and alarm-ack
/// idempotence guards — raw calls bypass all of that.
/// </summary>
/// <remarks>
/// The analyzer matches by receiver-interface identity using Roslyn's semantic model, not by
/// method name, so a driver with an unusually-named method implementing <c>IReadable.ReadAsync</c>
/// still trips the rule. Lambda-context detection walks up the syntax tree from the call site
/// and checks whether any enclosing <c>InvocationExpressionSyntax</c> targets one of the
/// specific wrapper methods listed in <c>WrapperMethods</c> (type + method name pair).
/// Matching by method name as well as containing type ensures that a future overload on
/// <c>CapabilityInvoker</c> that takes a predicate/selector lambda does not silently widen the
/// suppression scope. The rule is intentionally narrow: it does NOT try to enforce that the
/// capability argument matches the method (e.g. ReadAsync wrapped in
/// <c>ExecuteAsync(DriverCapability.Write, ...)</c> still passes) — that'd require flow
/// analysis beyond single-expression scope.
/// <para>
/// Guarded-call detection uses Roslyn's semantic model: the analyzer compares the invoked
/// method's containing-type symbol (and every implemented interface symbol) against the
/// <c>GuardedInterfaceTypes</c> set resolved once per compilation, then verifies the
/// method either is the interface member or is the concrete implementation discovered via
/// <see cref="ITypeSymbol.FindImplementationForInterfaceMember"/>. Matching by symbol
/// identity means a driver with an unusually-named method implementing
/// <c>IReadable.ReadAsync</c> still trips the rule.
/// </para>
/// <para>
/// <b>Default-interface-method handling:</b> <c>IHistoryProvider.ReadAtTimeAsync</c> and
/// <c>IHistoryProvider.ReadEventsAsync</c> ship as DIM bodies. When a driver inherits the
/// DIM (no override), <c>FindImplementationForInterfaceMember</c> returns the interface's
/// own method symbol, which still equals <paramref name="method"/> for an interface-typed
/// receiver. When a driver overrides the DIM, the override symbol equals
/// <paramref name="method"/> directly. Both paths are covered.
/// </para>
/// <para>
/// <b>Wrapper-lambda detection:</b> the analyzer walks up the syntax tree from the call
/// site and looks for an enclosing <c>InvocationExpressionSyntax</c> bound to
/// <c>CapabilityInvoker.ExecuteAsync</c> / <c>CapabilityInvoker.ExecuteWriteAsync</c>
/// whose argument list contains a lambda whose span contains the call. The match keys on
/// both the containing-type symbol AND the method name — a future overload on
/// <c>CapabilityInvoker</c> that took a non-<c>callSite</c> lambda (predicate, selector,
/// etc.) would still suppress the diagnostic; matching the method name is the strongest
/// containment check the analyzer can make without doing parameter-position binding on
/// every invocation in the IDE hot path. The known limitation is that any lambda
/// argument to <c>ExecuteAsync</c> / <c>ExecuteWriteAsync</c> suppresses the diagnostic
/// even if it is not bound to the <c>callSite</c> parameter — today both overload pairs
/// have exactly one lambda parameter (<c>callSite</c>), so the limitation is theoretical.
/// </para>
/// <para>
/// <b>Why <c>AlarmSurfaceInvoker</c> is NOT a wrapper home:</b> its public methods take
/// <c>IReadOnlyList&lt;...&gt;</c> / <c>IAlarmSubscriptionHandle</c> — no lambda
/// arguments — so no call to it can ever satisfy the lambda-containment check. Calls
/// inside <c>AlarmSurfaceInvoker</c>'s own implementation are covered transitively
/// because the surface routes through the inner <c>CapabilityInvoker.ExecuteAsync</c>
/// lambda.
/// </para>
/// <para>
/// The rule does NOT enforce that the capability argument matches the method (e.g.
/// <c>ReadAsync</c> wrapped in <c>ExecuteAsync(DriverCapability.Write, ...)</c> still
/// passes) — that would require flow analysis beyond single-expression scope.
/// </para>
/// </remarks>
[DiagnosticAnalyzer(Microsoft.CodeAnalysis.LanguageNames.CSharp)]
public sealed class UnwrappedCapabilityCallAnalyzer : DiagnosticAnalyzer
@@ -34,7 +67,7 @@ public sealed class UnwrappedCapabilityCallAnalyzer : DiagnosticAnalyzer
public const string DiagnosticId = "OTOPCUA0001";
/// <summary>Interfaces whose methods must be called through the capability invoker.</summary>
private static readonly string[] GuardedInterfaces =
private static readonly string[] GuardedInterfaceMetadataNames =
[
"ZB.MOM.WW.OtOpcUa.Core.Abstractions.IReadable",
"ZB.MOM.WW.OtOpcUa.Core.Abstractions.IWritable",
@@ -46,24 +79,24 @@ public sealed class UnwrappedCapabilityCallAnalyzer : DiagnosticAnalyzer
];
/// <summary>
/// Wrapper types paired with the method names that take a resilience <c>callSite</c> lambda.
/// Only a lambda bound to one of these specific methods is treated as a valid wrapper home;
/// other lambdas on the same type (e.g. future predicate/selector overloads) do not suppress
/// the diagnostic.
/// Wrapper-method (containing-type metadata name, method name) pairs. Only a lambda bound
/// to one of these specific methods is treated as a valid wrapper home; other lambdas on
/// the same type (e.g. future predicate/selector overloads) do not suppress the diagnostic
/// unless they take a lambda in an argument position. <c>AlarmSurfaceInvoker</c> is not in
/// this list because none of its public methods accept lambda arguments — calls inside its
/// own implementation are covered transitively by the <c>CapabilityInvoker.ExecuteAsync</c>
/// match.
/// </summary>
private static readonly (string TypeFqn, string MethodName)[] WrapperMethods =
private static readonly (string TypeMetadataName, string MethodName)[] WrapperMethodKeys =
[
("ZB.MOM.WW.OtOpcUa.Core.Resilience.CapabilityInvoker", "ExecuteAsync"),
("ZB.MOM.WW.OtOpcUa.Core.Resilience.CapabilityInvoker", "ExecuteWriteAsync"),
("ZB.MOM.WW.OtOpcUa.Core.Resilience.AlarmSurfaceInvoker", "SubscribeAsync"),
("ZB.MOM.WW.OtOpcUa.Core.Resilience.AlarmSurfaceInvoker", "UnsubscribeAsync"),
("ZB.MOM.WW.OtOpcUa.Core.Resilience.AlarmSurfaceInvoker", "AcknowledgeAsync"),
];
private static readonly DiagnosticDescriptor Rule = new(
id: DiagnosticId,
title: "Driver capability call must be wrapped in CapabilityInvoker",
messageFormat: "Call to '{0}' is not wrapped in CapabilityInvoker.ExecuteAsync / ExecuteWriteAsync / AlarmSurfaceInvoker.*. Without the wrapping, Phase 6.1 resilience (retry, breaker, bulkhead, tracker telemetry) is bypassed for this call.",
messageFormat: "Call to '{0}' is not wrapped in CapabilityInvoker.ExecuteAsync / ExecuteWriteAsync. Without the wrapping, Phase 6.1 resilience (retry, breaker, bulkhead, tracker telemetry) is bypassed for this call.",
category: "OtOpcUa.Resilience",
defaultSeverity: DiagnosticSeverity.Warning,
isEnabledByDefault: true,
@@ -75,20 +108,63 @@ public sealed class UnwrappedCapabilityCallAnalyzer : DiagnosticAnalyzer
{
context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
context.EnableConcurrentExecution();
context.RegisterOperationAction(AnalyzeInvocation, OperationKind.Invocation);
context.RegisterCompilationStartAction(OnCompilationStart);
}
private static void AnalyzeInvocation(OperationAnalysisContext context)
private static void OnCompilationStart(CompilationStartAnalysisContext context)
{
var invocation = (Microsoft.CodeAnalysis.Operations.IInvocationOperation)context.Operation;
// Resolve the guarded interfaces and wrapper types ONCE per compilation. If none of the
// guarded interfaces are referenced (e.g. an unrelated project building against this
// analyzer package), there's nothing to flag — register no further callbacks so the
// analyzer is a true no-op on cold compilations.
var guardedSet = new HashSet<INamedTypeSymbol>(SymbolEqualityComparer.Default);
foreach (var metadataName in GuardedInterfaceMetadataNames)
{
var sym = context.Compilation.GetTypeByMetadataName(metadataName);
if (sym is not null) guardedSet.Add(sym);
}
if (guardedSet.Count == 0) return;
// Wrapper methods: build a lookup keyed by (containing-type-symbol, method-name). Matching
// both means a future predicate/selector overload on CapabilityInvoker doesn't silently
// widen the suppression scope.
var wrapperMethodsByType = new Dictionary<INamedTypeSymbol, HashSet<string>>(SymbolEqualityComparer.Default);
foreach (var (typeMetadataName, methodName) in WrapperMethodKeys)
{
var typeSym = context.Compilation.GetTypeByMetadataName(typeMetadataName);
if (typeSym is null) continue;
if (!wrapperMethodsByType.TryGetValue(typeSym, out var names))
{
names = new HashSet<string>();
wrapperMethodsByType[typeSym] = names;
}
names.Add(methodName);
}
var state = new AnalyzerState(guardedSet, wrapperMethodsByType);
context.RegisterOperationAction(ctx => AnalyzeInvocation(ctx, state), OperationKind.Invocation);
}
private static void AnalyzeInvocation(OperationAnalysisContext context, AnalyzerState state)
{
var invocation = (IInvocationOperation)context.Operation;
// Defensive: a null SemanticModel means the IDE asked for analysis on a half-bound tree.
// We can't tell whether the call is inside a wrapper lambda without it, so SKIP rather
// than report — emitting a diagnostic in that state would be a false positive on code
// that is in fact correctly wrapped.
if (context.Operation.SemanticModel is null) return;
var method = invocation.TargetMethod;
if (method?.ContainingType is null) return;
if (method.ReturnType is null) return;
// Narrow the rule to async wire calls. Synchronous accessors like
// IHostConnectivityProbe.GetHostStatuses() are pure in-memory snapshots + would never
// benefit from the Polly pipeline; flagging them just creates false-positives.
if (!IsAsyncReturningType(method.ReturnType)) return;
if (!ImplementsGuardedInterface(method)) return;
if (IsInsideWrapperLambda(invocation.Syntax, context.Operation.SemanticModel, context.CancellationToken)) return;
if (!ImplementsGuardedInterface(method, state.GuardedInterfaces)) return;
if (IsInsideWrapperLambda(invocation.Syntax, context.Operation.SemanticModel, state.WrapperMethodsByType, context.CancellationToken)) return;
var diag = Diagnostic.Create(Rule, invocation.Syntax.GetLocation(), $"{method.ContainingType.Name}.{method.Name}");
context.ReportDiagnostic(diag);
@@ -103,59 +179,67 @@ public sealed class UnwrappedCapabilityCallAnalyzer : DiagnosticAnalyzer
or "global::System.Threading.Tasks.ValueTask<TResult>";
}
private static bool ImplementsGuardedInterface(IMethodSymbol method)
private static bool ImplementsGuardedInterface(IMethodSymbol method, HashSet<INamedTypeSymbol> guarded)
{
foreach (var iface in method.ContainingType.AllInterfaces.Concat(new[] { method.ContainingType }))
// The method may be defined directly on a guarded interface (interface-typed receiver) or
// be a concrete implementation. Walk every interface the containing type implements (plus
// the containing type itself, to catch the interface-typed-receiver case) and look for a
// member whose implementation in the containing type equals `method`, OR a member whose
// original definition equals `method.OriginalDefinition` (covers the DIM inherit case
// where FindImplementationForInterfaceMember returns the interface method itself).
var containingType = method.ContainingType;
foreach (var iface in containingType.AllInterfaces)
{
var ifaceFqn = iface.OriginalDefinition.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat)
.Replace("global::", string.Empty);
if (!GuardedInterfaces.Contains(ifaceFqn)) continue;
foreach (var member in iface.GetMembers().OfType<IMethodSymbol>())
{
var impl = method.ContainingType.FindImplementationForInterfaceMember(member);
if (SymbolEqualityComparer.Default.Equals(impl, method) ||
SymbolEqualityComparer.Default.Equals(method.OriginalDefinition, member))
return true;
}
if (!guarded.Contains(iface.OriginalDefinition)) continue;
if (MethodImplementsInterfaceMember(method, containingType, iface)) return true;
}
// Interface-typed receiver: containingType *is* the guarded interface.
if (guarded.Contains(containingType.OriginalDefinition))
{
if (MethodImplementsInterfaceMember(method, containingType, containingType)) return true;
}
return false;
}
private static bool IsInsideWrapperLambda(SyntaxNode startNode, SemanticModel? semanticModel, System.Threading.CancellationToken ct)
private static bool MethodImplementsInterfaceMember(IMethodSymbol method, INamedTypeSymbol containingType, INamedTypeSymbol iface)
{
if (semanticModel is null) return false;
foreach (var member in iface.GetMembers())
{
if (member is not IMethodSymbol memberMethod) continue;
var impl = containingType.FindImplementationForInterfaceMember(memberMethod);
if (SymbolEqualityComparer.Default.Equals(impl, method)) return true;
if (SymbolEqualityComparer.Default.Equals(method.OriginalDefinition, memberMethod)) return true;
}
return false;
}
private static bool IsInsideWrapperLambda(
SyntaxNode startNode,
SemanticModel semanticModel,
Dictionary<INamedTypeSymbol, HashSet<string>> wrapperMethodsByType,
System.Threading.CancellationToken ct)
{
for (var node = startNode.Parent; node is not null; node = node.Parent)
{
// We only care about an enclosing invocation — the call we're auditing must literally
// live inside a lambda (ParenthesizedLambda / SimpleLambda / AnonymousMethod) that is
// an argument of a specific CapabilityInvoker.Execute* / AlarmSurfaceInvoker.*Async
// method. Matching both the containing type AND the method name closes the gap where a
// future overload on the same type that takes a predicate/selector lambda would
// otherwise incorrectly suppress the diagnostic.
// The call must literally live inside a lambda (ParenthesizedLambda / SimpleLambda /
// AnonymousMethod) that is an argument of one of the wrapper methods. Match both the
// containing-type symbol AND the method name — a future overload that took a
// predicate/selector lambda on the same type would still suppress the diagnostic IF it
// had the same method name, but a brand-new method (e.g. a hypothetical
// CapabilityInvoker.WithFilterAsync) would not. Today both wrapper method pairs have
// exactly one lambda parameter (the callSite), so the limitation is theoretical.
if (node is not InvocationExpressionSyntax outer) continue;
var sym = semanticModel.GetSymbolInfo(outer, ct).Symbol as IMethodSymbol;
if (sym is null) continue;
if (sym?.ContainingType is null) continue;
var outerTypeFqn = sym.ContainingType.OriginalDefinition.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat)
.Replace("global::", string.Empty);
var methodName = sym.Name;
var isWrapperMethod = false;
foreach (var (typeFqn, name) in WrapperMethods)
{
if (typeFqn == outerTypeFqn && name == methodName)
{
isWrapperMethod = true;
break;
}
}
if (!isWrapperMethod) continue;
var containingDef = sym.ContainingType.OriginalDefinition;
if (!wrapperMethodsByType.TryGetValue(containingDef, out var methodNames)) continue;
if (!methodNames.Contains(sym.Name)) continue;
// The call is wrapped IFF our startNode is transitively inside one of the outer
// call's argument lambdas. Walk the outer invocation's argument list + check whether
// any lambda body contains the startNode's position.
// The call is wrapped IFF startNode is transitively inside one of the outer call's
// argument lambdas. Walk the outer invocation's argument list + check whether any
// lambda body contains the startNode's position.
foreach (var arg in outer.ArgumentList.Arguments)
{
if (arg.Expression is not AnonymousFunctionExpressionSyntax lambda) continue;
@@ -164,4 +248,18 @@ public sealed class UnwrappedCapabilityCallAnalyzer : DiagnosticAnalyzer
}
return false;
}
private sealed class AnalyzerState
{
public HashSet<INamedTypeSymbol> GuardedInterfaces { get; }
public Dictionary<INamedTypeSymbol, HashSet<string>> WrapperMethodsByType { get; }
public AnalyzerState(
HashSet<INamedTypeSymbol> guardedInterfaces,
Dictionary<INamedTypeSymbol, HashSet<string>> wrapperMethodsByType)
{
GuardedInterfaces = guardedInterfaces;
WrapperMethodsByType = wrapperMethodsByType;
}
}
}
@@ -128,4 +128,94 @@ public sealed class GenerationApplierTests
result.Succeeded.ShouldBeFalse();
result.Errors.ShouldContain(e => e.Contains("tag-bad") && e.Contains("simulated"));
}
// ------------------------------------------------------------------------------------
// Configuration-011 — pin the documented ordering behaviour: a thrown Removed callback
// records an entity error but the applier still runs the Added/Modified passes (the
// current contract — see GenerationApplier comment about cascades settling).
// ------------------------------------------------------------------------------------
[Fact]
public async Task Apply_continues_to_Added_pass_when_a_Removed_callback_throws()
{
var callLog = new List<string>();
var applier = new GenerationApplier(new ApplyCallbacks
{
OnTag = (c, _) =>
{
callLog.Add($"tag:{c.Kind}:{c.LogicalId}");
if (c.Kind == ChangeKind.Removed)
throw new InvalidOperationException("removed-failed");
return Task.CompletedTask;
},
});
var from = SnapshotWith(tags: [Tag("tag-old", "X")]);
var to = SnapshotWith(tags: [Tag("tag-new", "Y")]);
var result = await applier.ApplyAsync(from, to, CancellationToken.None);
result.Succeeded.ShouldBeFalse();
result.Errors.ShouldContain(e => e.Contains("tag-old") && e.Contains("removed-failed"));
// The Added pass still runs even though Removed failed.
callLog.ShouldContain("tag:Removed:tag-old");
callLog.ShouldContain("tag:Added:tag-new");
}
// ------------------------------------------------------------------------------------
// Configuration-007 — ApplyPass must propagate OperationCanceledException rather than
// recording it as an entity error. Cancellation between passes must also halt the apply.
// ------------------------------------------------------------------------------------
[Fact]
public async Task Apply_propagates_OperationCanceledException_from_callback_when_token_cancelled()
{
// A callback that observes a cancelled token and throws OperationCanceledException
// must abort the entire apply, not be silently swallowed and recorded as an error.
using var cts = new CancellationTokenSource();
var applier = new GenerationApplier(new ApplyCallbacks
{
OnTag = (c, ct) =>
{
cts.Cancel();
ct.ThrowIfCancellationRequested();
return Task.CompletedTask;
},
});
var to = SnapshotWith(tags: [Tag("tag-1", "A")]);
await Should.ThrowAsync<OperationCanceledException>(async () =>
await applier.ApplyAsync(from: null, to, cts.Token));
}
[Fact]
public async Task Apply_stops_between_passes_when_cancellation_requested()
{
// After a Removed pass completes, the applier should observe cancellation before
// running the Added/Modified passes — not silently keep walking.
var callLog = new List<string>();
using var cts = new CancellationTokenSource();
var applier = new GenerationApplier(new ApplyCallbacks
{
OnTag = (c, _) =>
{
callLog.Add($"tag:{c.Kind}:{c.LogicalId}");
// Cancel after the Removed pass finishes — before the Added pass runs.
if (c.Kind == ChangeKind.Removed) cts.Cancel();
return Task.CompletedTask;
},
});
// `from` has tag-1, `to` has tag-2 — produces one Removed + one Added.
var from = SnapshotWith(tags: [Tag("tag-1", "A")]);
var to = SnapshotWith(tags: [Tag("tag-2", "B")]);
await Should.ThrowAsync<OperationCanceledException>(async () =>
await applier.ApplyAsync(from, to, cts.Token));
callLog.ShouldContain("tag:Removed:tag-1");
callLog.ShouldNotContain("tag:Added:tag-2",
"Added pass must not run after cancellation observed between passes");
}
}
@@ -90,6 +90,38 @@ public sealed class LiteDbConfigCacheTests : IDisposable
(await cache.GetMostRecentAsync("c-1"))!.PayloadJson.ShouldBe("{\"v\":2}");
}
// ------------------------------------------------------------------------------------
// Configuration-005 — concurrent PutAsync for the same (ClusterId, GenerationId) must
// not produce duplicate rows. The original find-then-insert was non-atomic so two racing
// callers could both observe `existing is null` and both Insert.
// ------------------------------------------------------------------------------------
[Fact]
public async Task PutAsync_concurrent_for_same_cluster_and_generation_does_not_duplicate()
{
using var cache = new LiteDbConfigCache(_dbPath);
// Pre-seed gen=99 so prune keepLatest:1 has a sentinel that survives independent of
// any potential duplicate (gen=42) row count.
await cache.PutAsync(Snapshot("c-1", 99));
// Many parallel writes for the same key. Without serialization, racing find-then-insert
// would Insert multiple rows for the same (ClusterId, GenerationId=42).
var tasks = Enumerable.Range(0, 64).Select(_ => Task.Run(async () =>
{
var s = Snapshot("c-1", 42);
await cache.PutAsync(s);
})).ToArray();
await Task.WhenAll(tasks);
// Count rows for gen=42 directly by inspecting the LiteDB file via a fresh handle.
cache.Dispose();
using var verify = new LiteDB.LiteDatabase(_dbPath);
var col = verify.GetCollection<GenerationSnapshot>("generations");
var gen42Count = col.Find(s => s.ClusterId == "c-1" && s.GenerationId == 42).Count();
gen42Count.ShouldBe(1,
$"PutAsync must upsert atomically — found {gen42Count} rows for (c-1, gen=42) after 64 concurrent puts");
}
[Fact]
public void Corrupt_file_surfaces_as_LocalConfigCacheCorruptException()
{
@@ -0,0 +1,49 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
namespace ZB.MOM.WW.OtOpcUa.Configuration.Tests;
/// <summary>
/// Pins the underlying type of <see cref="NodePermissions"/> to <c>int</c> so the SQL
/// storage type (<c>HasConversion&lt;int&gt;()</c> in <c>OtOpcUaConfigDbContext</c>) and the
/// XML doc ("Stored as int") cannot drift back into the latent <c>uint→int</c> overflow
/// trap caught by Configuration-004.
/// </summary>
[Trait("Category", "Unit")]
public sealed class NodePermissionsTests
{
[Fact]
public void Underlying_type_is_int_so_it_matches_HasConversion_in_DbContext()
{
// Configuration-004: NodePermissions was declared : uint while NodeAcl.PermissionFlags
// is persisted via HasConversion<int>(). A bit-31 grant would overflow int and corrupt
// the round-trip. Lock the underlying type to int.
typeof(NodePermissions).GetEnumUnderlyingType().ShouldBe(typeof(int));
}
[Fact]
public void All_defined_bits_round_trip_through_int_cast_without_loss()
{
// Belt-and-braces: every declared bit must survive a (int) round-trip — fails today
// if anyone re-introduces a bit-31 flag while the underlying type is uint.
foreach (NodePermissions value in Enum.GetValues<NodePermissions>())
{
var asInt = (int)value;
var roundTripped = (NodePermissions)asInt;
roundTripped.ShouldBe(value);
}
}
[Fact]
public void Bitwise_combinations_round_trip_through_int_storage()
{
// The PermissionFlags column stores int; combinations of bits must survive the conversion.
var combo = NodePermissions.Engineer | NodePermissions.MethodCall;
var stored = (int)combo;
var rebuilt = (NodePermissions)stored;
rebuilt.ShouldBe(combo);
rebuilt.HasFlag(NodePermissions.WriteTune).ShouldBeTrue();
rebuilt.HasFlag(NodePermissions.MethodCall).ShouldBeTrue();
}
}
@@ -1,3 +1,4 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Polly.Timeout;
using Shouldly;
@@ -186,6 +187,52 @@ public sealed class ResilientConfigReaderTests : IDisposable
flag.IsStale.ShouldBeTrue("cache fallback marks the stale flag");
}
// ------------------------------------------------------------------------------------
// Configuration-010 — fallback warning log must scrub connection-string fragments and
// must not include the full exception object (which carries the stack and any inner-
// exception chain). Project rule: no credential or connection-string fragment in logs.
// ------------------------------------------------------------------------------------
[Fact]
public async Task FallbackWarning_does_not_log_full_exception_object_or_password_fragment()
{
var cache = new GenerationSealedCache(_root);
await cache.SealAsync(new GenerationSnapshot
{
ClusterId = "cluster-e", GenerationId = 1, CachedAt = DateTime.UtcNow,
PayloadJson = "{\"ok\":true}",
});
var flag = new StaleConfigFlag();
var capturing = new CapturingLogger<ResilientConfigReader>();
var reader = new ResilientConfigReader(cache, flag, capturing,
timeout: TimeSpan.FromSeconds(10), retryCount: 0);
// Simulated SqlException-style message carrying a connection-string fragment, the
// kind of thing a poorly-wrapped delegate could surface.
const string secretBearingMessage =
"Login failed for user 'sa'. (Server=sql.example.com,1433;User Id=sa;Password=SuperSecret123!)";
await reader.ReadAsync(
"cluster-e",
_ => throw new InvalidOperationException(secretBearingMessage),
snap => snap.PayloadJson,
CancellationToken.None);
var warning = capturing.Records.ShouldHaveSingleItem();
warning.LogLevel.ShouldBe(LogLevel.Warning);
// The exception object passed as the first arg to LogWarning(ex, ...) drives the
// formatter's stack-trace dump; capturing it lets us assert the scrubbing surface.
warning.Exception.ShouldBeNull(
"the warning must not attach the raw exception — it can carry connection-string fragments");
// The rendered message must not echo password / user-id strings even if the caller
// embedded them in the exception message.
warning.RenderedMessage.ShouldNotContain("Password=", Case.Insensitive);
warning.RenderedMessage.ShouldNotContain("SuperSecret123!");
warning.RenderedMessage.ShouldNotContain("User Id=", Case.Insensitive);
}
[Fact]
public async Task CallerCancellation_Propagates_NotFallback()
{
@@ -220,6 +267,26 @@ public sealed class ResilientConfigReaderTests : IDisposable
}
}
internal sealed record LogRecord(LogLevel LogLevel, string RenderedMessage, Exception? Exception);
internal sealed class CapturingLogger<T> : ILogger<T>
{
public List<LogRecord> Records { get; } = new();
public IDisposable BeginScope<TState>(TState state) where TState : notnull => NullScope.Instance;
public bool IsEnabled(LogLevel logLevel) => true;
public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception? exception, Func<TState, Exception?, string> formatter)
{
Records.Add(new LogRecord(logLevel, formatter(state, exception), exception));
}
private sealed class NullScope : IDisposable
{
public static readonly NullScope Instance = new();
public void Dispose() { }
}
}
[Trait("Category", "Unit")]
public sealed class StaleConfigFlagTests
{
@@ -0,0 +1,52 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests;
/// <summary>
/// Covers <see cref="DriverHealth"/> shape invariants — added to lock down the documented
/// contract after Core.Abstractions-008 reworded the <c>LastError</c> remark.
/// </summary>
public sealed class DriverHealthTests
{
/// <summary>
/// Core.Abstractions-008: <c>DriverHealth.LastError</c> is not constrained by the
/// <c>State</c> enum — a Healthy driver may legitimately retain the last error from a
/// recovered failure (for diagnostics), and Degraded / Reconnecting / Faulted states may
/// all carry a non-null message. The old XML doc "null when state is Healthy" was wrong;
/// this test makes the type's actual contract explicit so future doc churn cannot drift.
/// </summary>
[Theory]
[InlineData(DriverState.Unknown)]
[InlineData(DriverState.Initializing)]
[InlineData(DriverState.Healthy)]
[InlineData(DriverState.Degraded)]
[InlineData(DriverState.Reconnecting)]
[InlineData(DriverState.Faulted)]
public void LastError_IsIndependent_OfState(DriverState state)
{
var healthWithError = new DriverHealth(state, LastSuccessfulRead: DateTime.UtcNow, LastError: "earlier failure");
var healthWithoutError = new DriverHealth(state, LastSuccessfulRead: DateTime.UtcNow, LastError: null);
// Both shapes are constructible regardless of state — the type makes no enforcement.
healthWithError.LastError.ShouldBe("earlier failure");
healthWithoutError.LastError.ShouldBeNull();
healthWithError.State.ShouldBe(state);
healthWithoutError.State.ShouldBe(state);
}
[Fact]
public void DriverState_EnumContainsExpectedMembers()
{
// Pins the enum so finding-008's "more than Healthy can carry an error" claim
// does not bit-rot.
var names = Enum.GetNames<DriverState>();
names.ShouldContain(nameof(DriverState.Unknown));
names.ShouldContain(nameof(DriverState.Initializing));
names.ShouldContain(nameof(DriverState.Healthy));
names.ShouldContain(nameof(DriverState.Degraded));
names.ShouldContain(nameof(DriverState.Reconnecting));
names.ShouldContain(nameof(DriverState.Faulted));
}
}
@@ -1,5 +1,6 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests;
@@ -120,4 +121,77 @@ public sealed class DriverTypeRegistryTests
var registry = new DriverTypeRegistry();
Should.Throw<ArgumentException>(() => registry.Get(typeName!));
}
/// <summary>
/// Core.Abstractions-004: concurrent <see cref="DriverTypeRegistry.Register"/> calls for
/// two different driver types must both succeed and both end up visible to readers. A
/// non-atomic check-then-act would let the second swap silently discard the first
/// registration; this test asserts the "registered only once per process" guarantee
/// holds under concurrent writers too.
/// </summary>
[Fact]
public void Register_ConcurrentDistinctTypes_AllSucceed()
{
var registry = new DriverTypeRegistry();
const int parallelism = 32;
var ready = new ManualResetEventSlim(false);
var threads = Enumerable.Range(0, parallelism)
.Select(i => new Thread(() =>
{
ready.Wait();
registry.Register(SampleMetadata($"Driver-{i}"));
}))
.ToArray();
foreach (var t in threads) t.Start();
ready.Set(); // release all threads simultaneously
foreach (var t in threads) t.Join();
var all = registry.All();
all.Count.ShouldBe(parallelism);
for (var i = 0; i < parallelism; i++)
registry.TryGet($"Driver-{i}").ShouldNotBeNull(
$"Driver-{i} was lost to a race in concurrent Register");
}
/// <summary>
/// Core.Abstractions-004: concurrent <see cref="DriverTypeRegistry.Register"/> calls for
/// the SAME driver type must result in exactly one successful registration and exactly
/// (parallelism - 1) <see cref="InvalidOperationException"/> throws — not a silent
/// last-writer-wins replacement.
/// </summary>
[Fact]
public void Register_ConcurrentDuplicateType_ExactlyOneWins()
{
var registry = new DriverTypeRegistry();
const int parallelism = 16;
var ready = new ManualResetEventSlim(false);
var successes = 0;
var failures = 0;
var threads = Enumerable.Range(0, parallelism)
.Select(_ => new Thread(() =>
{
ready.Wait();
try
{
registry.Register(SampleMetadata("Contended"));
Interlocked.Increment(ref successes);
}
catch (InvalidOperationException)
{
Interlocked.Increment(ref failures);
}
}))
.ToArray();
foreach (var t in threads) t.Start();
ready.Set();
foreach (var t in threads) t.Join();
successes.ShouldBe(1);
failures.ShouldBe(parallelism - 1);
registry.All().Count.ShouldBe(1);
}
}
@@ -118,4 +118,61 @@ public sealed class IHistorianDataSourceContractTests
healthy.ShouldNotBe(unhealthy);
}
/// <summary>
/// Core.Abstractions-006: the <c>maxValuesPerNode</c> (raw / processed) and <c>maxEvents</c>
/// parameter types are intentionally asymmetric — raw/processed reads use <c>uint</c>
/// because OPC UA HistoryRead's NumValuesPerNode is unsigned, while event reads use
/// <c>int</c> to allow zero / negative as a "use backend default cap" sentinel
/// (see <c>WonderwareHistorianClient</c> / <c>HistorianDataSource</c> usage). This test
/// pins both shapes so accidental changes are caught.
/// </summary>
[Theory]
[InlineData("ReadRawAsync", "maxValuesPerNode", typeof(uint))]
[InlineData("ReadEventsAsync", "maxEvents", typeof(int))]
public void HistoryRead_MaxParameter_TypePinned(string methodName, string parameterName, Type expectedType)
{
var method = typeof(IHistorianDataSource).GetMethod(methodName);
method.ShouldNotBeNull();
var parameter = method!.GetParameters().FirstOrDefault(p => p.Name == parameterName);
parameter.ShouldNotBeNull($"Method {methodName} should expose a '{parameterName}' parameter.");
parameter!.ParameterType.ShouldBe(expectedType);
}
[Theory]
[InlineData("ReadRawAsync", "maxValuesPerNode", typeof(uint))]
[InlineData("ReadEventsAsync", "maxEvents", typeof(int))]
public void HistoryProvider_MaxParameter_TypePinned(string methodName, string parameterName, Type expectedType)
{
var method = typeof(IHistoryProvider).GetMethod(methodName);
method.ShouldNotBeNull();
var parameter = method!.GetParameters().FirstOrDefault(p => p.Name == parameterName);
parameter.ShouldNotBeNull($"Method {methodName} should expose a '{parameterName}' parameter.");
parameter!.ParameterType.ShouldBe(expectedType);
}
/// <summary>
/// Core.Abstractions-008: <see cref="IHistoryProvider.ReadAtTimeAsync"/> and
/// <see cref="IHistoryProvider.ReadEventsAsync"/> are C# default interface methods
/// (drivers opt in), whereas <see cref="IHistorianDataSource.ReadAtTimeAsync"/> and
/// <see cref="IHistorianDataSource.ReadEventsAsync"/> are required (the server-side
/// historian must implement them). This test pins the asymmetry so an implementer cannot
/// accidentally collapse the two surfaces and so the documented rationale stays load-bearing.
/// </summary>
[Theory]
[InlineData("ReadAtTimeAsync")]
[InlineData("ReadEventsAsync")]
public void HistoryProvider_OptionalMethods_HaveDefaultImplementation(string methodName)
{
var providerMethod = typeof(IHistoryProvider).GetMethod(methodName);
providerMethod.ShouldNotBeNull();
// Default interface methods carry a method body — IsAbstract is false.
providerMethod!.IsAbstract.ShouldBeFalse(
$"IHistoryProvider.{methodName} must remain a default-impl-throws method so legacy drivers continue to compile.");
var dataSourceMethod = typeof(IHistorianDataSource).GetMethod(methodName);
dataSourceMethod.ShouldNotBeNull();
dataSourceMethod!.IsAbstract.ShouldBeTrue(
$"IHistorianDataSource.{methodName} must remain required — server-side historians own the full surface.");
}
}
@@ -342,6 +342,108 @@ public sealed class PollGroupEngineTests
shortReadCount.ShouldBeGreaterThanOrEqualTo(2);
}
/// <summary>
/// Core.Abstractions-005: the engine documents that "transient poll errors are logged on
/// the driver health surface", but until an error callback exists the driver has no way
/// to observe a caught reader exception. Subscribing without supplying an error callback
/// must continue to swallow exceptions (backward compatible). When an error callback IS
/// supplied, every exception caught during a poll cycle must be routed to it.
/// </summary>
[Fact]
public async Task Reader_exception_is_reported_to_onError_callback()
{
var observed = new ConcurrentQueue<Exception>();
var readCount = 0;
Task<IReadOnlyList<DataValueSnapshot>> Reader(IReadOnlyList<string> refs, CancellationToken ct)
{
if (Interlocked.Increment(ref readCount) <= 3)
throw new InvalidOperationException($"boom-{readCount}");
var now = DateTime.UtcNow;
return Task.FromResult<IReadOnlyList<DataValueSnapshot>>(
refs.Select(_ => new DataValueSnapshot(1, 0u, now, now)).ToList());
}
await using var engine = new PollGroupEngine(
Reader,
(_, _, _) => { },
minInterval: TimeSpan.FromMilliseconds(50),
onError: ex => observed.Enqueue(ex));
var handle = engine.Subscribe(["X"], TimeSpan.FromMilliseconds(50));
await WaitForAsync(() => observed.Count >= 3, TimeSpan.FromSeconds(3));
engine.Unsubscribe(handle);
observed.Count.ShouldBeGreaterThanOrEqualTo(3);
observed.All(e => e is InvalidOperationException).ShouldBeTrue();
observed.All(e => e.Message.StartsWith("boom-")).ShouldBeTrue();
}
/// <summary>
/// Core.Abstractions-005: a contract-violating reader (Core.Abstractions-002 path) that
/// throws the descriptive <see cref="InvalidOperationException"/> from inside the engine
/// must also be routed to the error callback so the driver health surface can observe
/// repeated contract violations.
/// </summary>
[Fact]
public async Task Reader_contract_violation_routes_to_onError_callback()
{
var observed = new ConcurrentQueue<Exception>();
Task<IReadOnlyList<DataValueSnapshot>> Reader(IReadOnlyList<string> refs, CancellationToken ct)
{
// Always return zero snapshots — short-result-list contract violation.
return Task.FromResult<IReadOnlyList<DataValueSnapshot>>(new List<DataValueSnapshot>());
}
await using var engine = new PollGroupEngine(
Reader,
(_, _, _) => { },
minInterval: TimeSpan.FromMilliseconds(50),
onError: ex => observed.Enqueue(ex));
var handle = engine.Subscribe(["X"], TimeSpan.FromMilliseconds(50));
await WaitForAsync(() => observed.Count >= 2, TimeSpan.FromSeconds(2));
engine.Unsubscribe(handle);
observed.Count.ShouldBeGreaterThanOrEqualTo(2);
observed.All(e => e is InvalidOperationException
&& e.Message.Contains("Reader contract violation"))
.ShouldBeTrue();
}
/// <summary>
/// Core.Abstractions-005: the engine must defend itself against an <c>onError</c> handler
/// that itself throws — otherwise a buggy health-surface forwarder would crash the poll
/// loop and silently stall the subscription, defeating the whole point of the callback.
/// </summary>
[Fact]
public async Task OnError_handler_that_throws_does_not_crash_loop()
{
var readCount = 0;
var events = new ConcurrentQueue<string>();
Task<IReadOnlyList<DataValueSnapshot>> Reader(IReadOnlyList<string> refs, CancellationToken ct)
{
if (Interlocked.Increment(ref readCount) <= 2)
throw new InvalidOperationException("boom");
var now = DateTime.UtcNow;
return Task.FromResult<IReadOnlyList<DataValueSnapshot>>(
refs.Select(_ => new DataValueSnapshot(1, 0u, now, now)).ToList());
}
await using var engine = new PollGroupEngine(
Reader,
(_, r, _) => events.Enqueue(r),
minInterval: TimeSpan.FromMilliseconds(50),
onError: _ => throw new ApplicationException("error-handler-bug"));
var handle = engine.Subscribe(["X"], TimeSpan.FromMilliseconds(50));
// Wait long enough for the reader to recover and for the engine to deliver a change.
await WaitForAsync(() => events.Count >= 1, TimeSpan.FromSeconds(3));
engine.Unsubscribe(handle);
events.Count.ShouldBeGreaterThanOrEqualTo(1);
}
private sealed record DummyHandle : ISubscriptionHandle
{
public string DiagnosticId => "dummy";
@@ -609,6 +609,130 @@ public sealed class SqliteStoreAndForwardSinkTests : IDisposable
}
}
/// <summary>
/// Regression for Core.AlarmHistorian-008: <c>EnqueueAsync</c> must NOT run a
/// <c>SELECT COUNT(*)</c> on every enqueue when we are far below capacity. The
/// optimisation tracks the queue depth in memory and only consults the database
/// when the cached value indicates the capacity wall is in reach. This regression
/// pins the perf characteristic: after many enqueues below capacity, the
/// capacity-probe count must stay bounded — not grow proportionally to the
/// enqueue count as the un-optimised path did.
/// </summary>
[Fact]
public async Task EnqueueAsync_does_not_count_all_rows_on_every_call_below_capacity()
{
var writer = new FakeWriter();
using var sink = new SqliteStoreAndForwardSink(
_dbPath, writer, _log, batchSize: 100, capacity: 10_000);
for (var i = 0; i < 200; i++)
await sink.EnqueueAsync(Event($"E{i}"), CancellationToken.None);
// Pre-fix: probe count == enqueue count (200). Post-fix: ≤ a handful (initial
// load + occasional periodic re-syncs). 25 is generous headroom.
sink.DebugCapacityProbeCount.ShouldBeLessThan(25,
"EnqueueAsync must not run a per-call SELECT COUNT(*) below capacity");
}
/// <summary>
/// Regression for Core.AlarmHistorian-008: across every queue mutation (enqueue,
/// Ack drain, PermanentFail drain, capacity eviction, RetryDeadLettered) the
/// queue depth reported by <see cref="SqliteStoreAndForwardSink.GetStatus"/> must
/// stay aligned with a fresh <c>COUNT(*)</c> against the database. Catches drift
/// bugs in the in-memory counter introduced by the perf optimisation.
/// </summary>
[Fact]
public async Task Enqueue_and_drain_keep_queue_depth_consistent_with_storage()
{
var writer = new FakeWriter();
using var sink = new SqliteStoreAndForwardSink(
_dbPath, writer, _log, batchSize: 5, capacity: 8);
// Burst-enqueue below capacity — the in-memory counter must stay aligned with the
// SELECT COUNT(*) that GetStatus runs.
for (var i = 0; i < 6; i++)
await sink.EnqueueAsync(Event($"burst-{i}"), CancellationToken.None);
AssertQueueDepthMatchesStorage(sink);
sink.GetStatus().QueueDepth.ShouldBe(6);
// Push past capacity — capacity must still be enforced even when EnqueueAsync no
// longer runs COUNT(*) on every call.
for (var i = 0; i < 5; i++)
await sink.EnqueueAsync(Event($"overflow-{i}"), CancellationToken.None);
sink.GetStatus().QueueDepth.ShouldBe(8, "capacity must still be honoured by the perf-optimised path");
sink.GetStatus().EvictedCount.ShouldBe(3, "eviction counter must reflect every evicted row");
AssertQueueDepthMatchesStorage(sink);
// Drain a partial batch (Ack) — the in-memory counter must follow the deletes
// applied inside the single consolidated drain transaction.
await sink.DrainOnceAsync(CancellationToken.None);
AssertQueueDepthMatchesStorage(sink);
// Add a dead-lettered row and verify the counter does NOT include it (QueueDepth
// is non-dead-lettered only).
writer.NextOutcomePerEvent.Enqueue(HistorianWriteOutcome.PermanentFail);
await sink.EnqueueAsync(Event("to-dead-letter"), CancellationToken.None);
await sink.DrainOnceAsync(CancellationToken.None);
AssertQueueDepthMatchesStorage(sink);
sink.GetStatus().DeadLetterDepth.ShouldBeGreaterThanOrEqualTo(1);
// RetryDeadLettered moves DLQ rows back into the live queue — the counter must
// pick that up.
sink.RetryDeadLettered();
AssertQueueDepthMatchesStorage(sink);
}
/// <summary>
/// Stress regression for Core.AlarmHistorian-008: interleave many enqueues and
/// drains across threads and confirm the in-memory counter stays consistent
/// with storage. Catches drift bugs in the optimised path that would only show
/// up under contention.
/// </summary>
[Fact]
public async Task Counter_remains_consistent_under_concurrent_enqueue_and_drain()
{
var writer = new FakeWriter();
using var sink = new SqliteStoreAndForwardSink(_dbPath, writer, _log);
var enqueuers = Enumerable.Range(0, 3).Select(t => Task.Run(async () =>
{
for (var i = 0; i < 60; i++)
await sink.EnqueueAsync(Event($"T{t}-{i}"), CancellationToken.None);
}));
var drainers = Enumerable.Range(0, 2).Select(_ => Task.Run(async () =>
{
for (var i = 0; i < 30; i++)
{
await sink.DrainOnceAsync(CancellationToken.None);
await Task.Delay(2);
}
}));
await Task.WhenAll(enqueuers.Concat(drainers));
// Drain anything left over.
for (var i = 0; i < 10; i++)
await sink.DrainOnceAsync(CancellationToken.None);
AssertQueueDepthMatchesStorage(sink);
sink.GetStatus().QueueDepth.ShouldBe(0, "every event drained at the end of the run");
}
/// <summary>
/// Helper that confirms the queue depth surfaced by GetStatus matches a fresh
/// COUNT(*) read directly from storage — proves the in-memory counter has not
/// drifted from the persisted truth.
/// </summary>
private void AssertQueueDepthMatchesStorage(SqliteStoreAndForwardSink sink)
{
using var conn = new SqliteConnection($"Data Source={_dbPath}");
conn.Open();
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
var live = (long)(cmd.ExecuteScalar() ?? 0L);
sink.GetStatus().QueueDepth.ShouldBe(live, "GetStatus must agree with a fresh COUNT(*)");
}
/// <summary>Insert a queue row whose PayloadJson cannot deserialize into an AlarmHistorianEvent.</summary>
private void InsertCorruptRow(string alarmId)
{
@@ -0,0 +1,155 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Authorization;
/// <summary>
/// Core-011 regression coverage for <see cref="PermissionTrieBuilder.Build"/>'s
/// <c>Descend</c> helper:
/// <list type="bullet">
/// <item>With a <c>scopePaths</c> lookup the row must land at the correct multi-level
/// trie node — a deep <see cref="NodeAclScopeKind.UnsLine"/> grant must be visible
/// ONLY when the requested scope walks the same namespace/area/line chain.</item>
/// <item>Without a <c>scopePaths</c> entry the row falls back to a direct child of
/// the namespace root keyed on the row's <c>ScopeId</c>. The builder must surface
/// this fallback (warning callback) so callers know a grant was placed where the
/// walker can't reach it for production hierarchies — silently dropping the grant
/// is the Core-011 production hazard.</item>
/// </list>
/// </summary>
[Trait("Category", "Unit")]
public sealed class PermissionTrieBuilderTests
{
private static NodeAcl Row(string group, NodeAclScopeKind scope, string? scopeId, NodePermissions flags, string clusterId = "c1") =>
new()
{
NodeAclRowId = Guid.NewGuid(),
NodeAclId = $"acl-{Guid.NewGuid():N}",
GenerationId = 1,
ClusterId = clusterId,
LdapGroup = group,
ScopeKind = scope,
ScopeId = scopeId,
PermissionFlags = flags,
};
private static NodeScope EquipmentTag(string cluster, string ns, string area, string line, string equip, string tag) =>
new()
{
ClusterId = cluster,
NamespaceId = ns,
UnsAreaId = area,
UnsLineId = line,
EquipmentId = equip,
TagId = tag,
Kind = NodeHierarchyKind.Equipment,
};
[Fact]
public void Build_With_ScopePaths_Places_UnsLine_Row_At_Correct_Multi_Level_Node()
{
// Scope path mirrors the production hierarchy: namespace → area → line.
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
{
["line-42"] = new(new[] { "ns", "area-1", "line-42" }),
};
var rows = new[] { Row("cn=ops", NodeAclScopeKind.UnsLine, "line-42", NodePermissions.Read) };
var trie = PermissionTrieBuilder.Build("c1", 1, rows, paths);
// Walk through the same chain — the grant must be reachable.
var matchOnLine = trie.CollectMatches(
EquipmentTag("c1", "ns", "area-1", "line-42", "eq-A", "tag-A"),
["cn=ops"]);
matchOnLine.Count.ShouldBe(1, "row must land at the correct multi-level trie node");
// A different line under the same area must not pick up the grant.
var matchOnOtherLine = trie.CollectMatches(
EquipmentTag("c1", "ns", "area-1", "line-99", "eq-A", "tag-A"),
["cn=ops"]);
matchOnOtherLine.ShouldBeEmpty(
"grant anchored at line-42 must not leak to sibling line-99 under the same area");
}
[Fact]
public void Build_Without_ScopePaths_Falls_Back_To_Root_Child_For_Tests()
{
// Fallback path — deterministic tests pass without a scope-path lookup. The row
// is placed as a direct child of the trie root keyed by ScopeId.
var rows = new[] { Row("cn=ops", NodeAclScopeKind.UnsLine, "line-42", NodePermissions.Read) };
var trie = PermissionTrieBuilder.Build("c1", 1, rows);
// Root has one child — "line-42".
trie.Root.Children.ShouldContainKey("line-42");
var node = trie.Root.Children["line-42"];
node.Grants.Count.ShouldBe(1);
}
/// <summary>
/// Core-011 regression: when a sub-cluster row's ScopeId is not in the supplied
/// <c>scopePaths</c>, the fallback diagnostic callback must fire so the caller can
/// surface a warning. Silently dropping the grant under the wrong trie level is the
/// production hazard the finding flagged.
/// </summary>
[Fact]
public void Build_Missing_ScopePath_Entry_Invokes_Diagnostic_Callback()
{
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
{
["line-known"] = new(new[] { "ns", "area-1", "line-known" }),
};
// Row references a line that is NOT in the path lookup.
var rows = new[]
{
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-orphan", NodePermissions.Read),
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-known", NodePermissions.Read),
};
var diagnostics = new List<PermissionTrieBuildDiagnostic>();
var trie = PermissionTrieBuilder.Build("c1", 1, rows, paths, diagnostics.Add);
diagnostics.Count.ShouldBe(1, "exactly one row had no matching scope-path entry");
diagnostics[0].ScopeId.ShouldBe("line-orphan");
diagnostics[0].ScopeKind.ShouldBe(NodeAclScopeKind.UnsLine);
diagnostics[0].Reason.ShouldBe(PermissionTrieBuildDiagnosticReason.MissingScopePath);
}
[Fact]
public void Build_No_Diagnostic_When_All_Sub_Cluster_Rows_Have_ScopePaths()
{
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
{
["line-A"] = new(new[] { "ns", "area-1", "line-A" }),
["line-B"] = new(new[] { "ns", "area-1", "line-B" }),
};
var rows = new[]
{
Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read), // cluster-level — no descent
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-A", NodePermissions.Read),
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-B", NodePermissions.Read),
};
var diagnostics = new List<PermissionTrieBuildDiagnostic>();
PermissionTrieBuilder.Build("c1", 1, rows, paths, diagnostics.Add);
diagnostics.ShouldBeEmpty("no rows are missing a scope-path entry");
}
[Fact]
public void Build_Diagnostic_Callback_Optional_When_ScopePaths_Null()
{
// No diagnostics callback should fire when scopePaths itself is null — that's the
// "deterministic-test fallback" mode, not a production drop.
var rows = new[] { Row("cn=ops", NodeAclScopeKind.UnsLine, "line-42", NodePermissions.Read) };
var diagnostics = new List<PermissionTrieBuildDiagnostic>();
PermissionTrieBuilder.Build("c1", 1, rows, scopePaths: null, diagnostic: diagnostics.Add);
diagnostics.ShouldBeEmpty(
"scopePaths=null is the explicit test-fallback mode and must not emit per-row warnings");
}
}
@@ -77,4 +77,162 @@ public sealed class DriverHostTests
host.RegisteredDriverIds.ShouldNotContain("d-1");
driver.ShutDown.ShouldBeTrue();
}
/// <summary>
/// Core-004 regression — DriverHost is a library type whose async calls must use
/// ConfigureAwait(false) to match the convention used by CapabilityInvoker /
/// AlarmSurfaceInvoker. Asserts the awaited driver call does not post its
/// continuation back to a captured SynchronizationContext.
/// The driver awaits an unsettled TaskCompletionSource so it does not introduce its
/// own capture — only DriverHost's await of the returned Task can drive a post.
/// </summary>
[Fact]
public async Task RegisterAsync_Does_Not_Capture_SynchronizationContext()
{
var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
var driver = new TcsDriver("d-cfg-1", tcs);
var ctx = new TrackingSynchronizationContext();
// Run the DriverHost call on a dedicated thread that has our tracking SyncContext installed.
var workerCtx = await RunOnContextAsync(ctx, async () =>
{
var host = new DriverHost();
var registerTask = host.RegisterAsync(driver, "{}", CancellationToken.None);
// Complete the driver's InitializeAsync from a background thread so DriverHost's
// await must resume via the captured context if ConfigureAwait(false) was missing.
_ = Task.Run(() => tcs.SetResult());
await registerTask.ConfigureAwait(false);
await host.DisposeAsync().ConfigureAwait(false);
});
workerCtx.PostCount.ShouldBe(0,
"RegisterAsync's awaited driver call must use ConfigureAwait(false) so the continuation does not post back to the captured context");
}
[Fact]
public async Task UnregisterAsync_Does_Not_Capture_SynchronizationContext()
{
var initTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
initTcs.SetResult();
var shutdownTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
var driver = new TcsDriver("d-cfg-2", initTcs, shutdownTcs);
var ctx = new TrackingSynchronizationContext();
var workerCtx = await RunOnContextAsync(ctx, async () =>
{
var host = new DriverHost();
await host.RegisterAsync(driver, "{}", CancellationToken.None).ConfigureAwait(false);
// After RegisterAsync we re-enter the context. Reset the post counter so we only
// observe UnregisterAsync's behaviour from here on.
((TrackingSynchronizationContext)SynchronizationContext.Current!).Reset();
var task = host.UnregisterAsync("d-cfg-2", CancellationToken.None);
_ = Task.Run(() => shutdownTcs.SetResult());
await task.ConfigureAwait(false);
});
workerCtx.PostCount.ShouldBe(0,
"UnregisterAsync's awaited shutdown call must use ConfigureAwait(false)");
}
[Fact]
public async Task DisposeAsync_Does_Not_Capture_SynchronizationContext()
{
var initTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
initTcs.SetResult();
var shutdownTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
var driver = new TcsDriver("d-cfg-3", initTcs, shutdownTcs);
var ctx = new TrackingSynchronizationContext();
var workerCtx = await RunOnContextAsync(ctx, async () =>
{
var host = new DriverHost();
await host.RegisterAsync(driver, "{}", CancellationToken.None).ConfigureAwait(false);
((TrackingSynchronizationContext)SynchronizationContext.Current!).Reset();
var task = host.DisposeAsync();
_ = Task.Run(() => shutdownTcs.SetResult());
await task.ConfigureAwait(false);
});
workerCtx.PostCount.ShouldBe(0,
"DisposeAsync's awaited shutdown call must use ConfigureAwait(false)");
}
/// <summary>
/// Run <paramref name="body"/> on a dedicated thread with <paramref name="ctx"/>
/// installed as the current SynchronizationContext, and return <paramref name="ctx"/>
/// after the body completes. The dedicated thread guarantees that resuming via the
/// captured context observably routes through our Post hook (the ThreadPool would
/// otherwise clear the context on the resuming worker).
/// </summary>
private static Task<TrackingSynchronizationContext> RunOnContextAsync(TrackingSynchronizationContext ctx, Func<Task> body)
{
var done = new TaskCompletionSource<TrackingSynchronizationContext>(TaskCreationOptions.RunContinuationsAsynchronously);
var t = new Thread(() =>
{
SynchronizationContext.SetSynchronizationContext(ctx);
try
{
// Pump posted continuations until the body completes.
var task = body();
while (!task.IsCompleted)
{
if (ctx.TryDequeue(out var work)) work();
else Thread.Sleep(1);
}
// Drain any tail continuations.
while (ctx.TryDequeue(out var work)) work();
task.GetAwaiter().GetResult();
done.SetResult(ctx);
}
catch (Exception ex) { done.SetException(ex); }
}) { IsBackground = true };
t.Start();
return done.Task;
}
/// <summary>Driver whose Initialize / Shutdown completions are caller-controlled via TCS.</summary>
private sealed class TcsDriver(string id, TaskCompletionSource initTcs, TaskCompletionSource? shutdownTcs = null) : IDriver
{
public string DriverInstanceId { get; } = id;
public string DriverType => "Tcs";
public Task InitializeAsync(string _, CancellationToken ct) => initTcs.Task;
public Task ReinitializeAsync(string _, CancellationToken ct) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken ct) => (shutdownTcs ?? CompletedTcs).Task;
public DriverHealth GetHealth() => new(DriverState.Healthy, null, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
private static readonly TaskCompletionSource CompletedTcs = MakeCompleted();
private static TaskCompletionSource MakeCompleted()
{
var t = new TaskCompletionSource();
t.SetResult();
return t;
}
}
/// <summary>SynchronizationContext that queues posts to a thread-safe work list and counts them.</summary>
private sealed class TrackingSynchronizationContext : SynchronizationContext
{
private readonly System.Collections.Concurrent.ConcurrentQueue<Action> _queue = new();
public int PostCount;
public int SendCount;
public override void Post(SendOrPostCallback d, object? state)
{
Interlocked.Increment(ref PostCount);
_queue.Enqueue(() => d(state));
}
public override void Send(SendOrPostCallback d, object? state)
{
Interlocked.Increment(ref SendCount);
d(state);
}
public bool TryDequeue(out Action work) => _queue.TryDequeue(out work!);
public void Reset() { Interlocked.Exchange(ref PostCount, 0); Interlocked.Exchange(ref SendCount, 0); }
}
}
@@ -143,6 +143,41 @@ public sealed class GenericDriverNodeManagerTests
nm.BuildAddressSpaceAsync(new RecordingBuilder(), CancellationToken.None));
}
/// <summary>
/// Core-008 regression: the XML doc states exception isolation is the caller's
/// responsibility — exceptions from <see cref="ITagDiscovery.DiscoverAsync"/> must propagate
/// out of <c>BuildAddressSpaceAsync</c> unhandled so the Server layer's per-driver try/catch
/// (<c>OpcUaApplicationHost.PopulateAddressSpaces</c>) can mark the subtree Faulted.
/// </summary>
[Fact]
public async Task BuildAddressSpaceAsync_Propagates_Discovery_Exceptions_To_Caller()
{
var driver = new ThrowingDiscoveryDriver();
using var nm = new GenericDriverNodeManager(driver);
var ex = await Should.ThrowAsync<InvalidOperationException>(() =>
nm.BuildAddressSpaceAsync(new RecordingBuilder(), CancellationToken.None));
ex.Message.ShouldBe("discovery boom",
"exceptions from DiscoverAsync must propagate unhandled — exception isolation is the caller's responsibility (e.g. OpcUaApplicationHost)");
}
/// <summary>Driver whose DiscoverAsync throws — exercises the exception-isolation boundary.</summary>
private sealed class ThrowingDiscoveryDriver : IDriver, ITagDiscovery
{
public string DriverInstanceId => "throwing";
public string DriverType => "Throwing";
public Task InitializeAsync(string _, CancellationToken __) => Task.CompletedTask;
public Task ReinitializeAsync(string _, CancellationToken __) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken _) => Task.CompletedTask;
public DriverHealth GetHealth() => new(DriverState.Healthy, null, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken _) => Task.CompletedTask;
public Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken ct)
=> throw new InvalidOperationException("discovery boom");
}
// --- test doubles ---
private sealed class FakeDriver : IDriver, ITagDiscovery, IAlarmSource
@@ -67,4 +67,53 @@ public sealed class DriverHealthReportTests
{
DriverHealthReport.HttpStatus(verdict).ShouldBe(expected);
}
/// <summary>
/// Core-012 regression: <see cref="DriverState.Reconnecting"/> must aggregate to
/// <see cref="ReadinessVerdict.Degraded"/> — the doc remarks state matrix lists this
/// mapping (after the Core-012 doc fix that added the Reconnecting row).
/// </summary>
[Fact]
public void Any_Reconnecting_WithoutFaultedOrNotReady_IsDegraded()
{
var verdict = DriverHealthReport.Aggregate([
new DriverHealthSnapshot("a", DriverState.Healthy),
new DriverHealthSnapshot("b", DriverState.Reconnecting),
]);
verdict.ShouldBe(ReadinessVerdict.Degraded,
"Reconnecting = driver alive but not serving live data → /readyz stays 200 while operators see the affected driver in the body");
}
/// <summary>
/// Core-012 regression: assert the XML <c>&lt;remarks&gt;</c> on
/// <see cref="DriverHealthReport"/> names <see cref="DriverState.Reconnecting"/> in its
/// state matrix. Catches a future doc-drift if someone re-aliases Reconnecting without
/// updating the matrix.
/// </summary>
[Fact]
public void Doc_State_Matrix_Includes_Reconnecting()
{
var xmlPath = Path.Combine(
AppContext.BaseDirectory,
"ZB.MOM.WW.OtOpcUa.Core.xml");
File.Exists(xmlPath).ShouldBeTrue($"expected XML doc file at {xmlPath}");
var content = File.ReadAllText(xmlPath);
var driverHealthReportRemarks = ExtractRemarksFor(content, "T:ZB.MOM.WW.OtOpcUa.Core.Observability.DriverHealthReport");
driverHealthReportRemarks.ShouldContain("Reconnecting");
}
private static string ExtractRemarksFor(string xml, string member)
{
var memberStart = xml.IndexOf($"<member name=\"{member}\"", StringComparison.Ordinal);
if (memberStart < 0) return string.Empty;
var memberEnd = xml.IndexOf("</member>", memberStart, StringComparison.Ordinal);
if (memberEnd < 0) return string.Empty;
var slice = xml.Substring(memberStart, memberEnd - memberStart);
var remarksStart = slice.IndexOf("<remarks>", StringComparison.Ordinal);
if (remarksStart < 0) return string.Empty;
var remarksEnd = slice.IndexOf("</remarks>", remarksStart, StringComparison.Ordinal);
return remarksEnd < 0 ? string.Empty : slice.Substring(remarksStart, remarksEnd - remarksStart);
}
}
@@ -148,4 +148,66 @@ public sealed class CapabilityInvokerTests
builder.CachedPipelineCount.ShouldBe(2);
}
/// <summary>
/// Core-009 regression: ExecuteWriteAsync's non-idempotent branch must snapshot
/// <c>_optionsAccessor</c> exactly once per call. Calling it multiple times allocates
/// redundant options objects on the per-write hot path and creates a consistency hazard
/// where an Admin edit mid-call could observe two different snapshots.
/// </summary>
[Fact]
public async Task ExecuteWriteAsync_NonIdempotent_Snapshots_Options_Once_Per_Call()
{
var options = new DriverResilienceOptions
{
Tier = DriverTier.A,
CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
{
[DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
},
};
var accessorCalls = 0;
var invoker = new CapabilityInvoker(
new DriverResiliencePipelineBuilder(),
"drv-test",
() => { Interlocked.Increment(ref accessorCalls); return options; });
await invoker.ExecuteWriteAsync(
"host-1",
isIdempotent: false,
_ => ValueTask.FromResult(0),
CancellationToken.None);
accessorCalls.ShouldBe(1,
"ExecuteWriteAsync's non-idempotent branch must capture the options snapshot exactly once per call");
}
/// <summary>
/// Core-009 regression — companion consistency assertion: the non-idempotent branch must
/// not observe two different option snapshots even if the accessor's returned value changes
/// between calls (simulating an Admin edit landing mid-flight). With a single snapshot the
/// two derived values (<c>with</c> base + <c>Resolve(Write)</c>) come from the same options
/// instance.
/// </summary>
[Fact]
public async Task ExecuteWriteAsync_NonIdempotent_Uses_Consistent_Options_Snapshot()
{
var a = new DriverResilienceOptions { Tier = DriverTier.A };
var b = new DriverResilienceOptions { Tier = DriverTier.B };
var alternating = new[] { a, b, a, b }.AsEnumerable().GetEnumerator();
var invoker = new CapabilityInvoker(
new DriverResiliencePipelineBuilder(),
"drv-test",
() => { alternating.MoveNext(); return alternating.Current; });
// If options is read twice, the with-expression and Resolve(Write) come from
// different tier tables (A then B) — the resulting one-entry dictionary is
// inconsistent with the snapshot used for the rest of the options. Single-snapshot
// semantics guarantee the call sees a coherent view.
await Should.NotThrowAsync(async () => await invoker.ExecuteWriteAsync(
"host-1",
isIdempotent: false,
_ => ValueTask.FromResult(0),
CancellationToken.None));
}
}
@@ -99,4 +99,49 @@ public sealed class DriverResilienceOptionsTests
options.Resolve(DriverCapability.Write).ShouldBe(
DriverResilienceOptions.GetTierDefaults(DriverTier.A)[DriverCapability.Write]);
}
/// <summary>
/// Core-010 regression: every <see cref="DriverCapability"/> value must successfully resolve
/// under every tier with a default <see cref="DriverResilienceOptions"/>. A future
/// enum-only addition that forgets to update <c>GetTierDefaults</c> would otherwise blow up
/// on the hot path with <see cref="KeyNotFoundException"/>.
/// </summary>
[Theory]
[InlineData(DriverTier.A)]
[InlineData(DriverTier.B)]
[InlineData(DriverTier.C)]
public void Resolve_Returns_NonNull_Policy_For_Every_Capability(DriverTier tier)
{
var options = new DriverResilienceOptions { Tier = tier };
foreach (var capability in Enum.GetValues<DriverCapability>())
{
var policy = options.Resolve(capability);
policy.ShouldNotBeNull(
$"every DriverCapability must resolve to a non-null policy for tier {tier} — {capability} did not");
}
}
/// <summary>
/// Core-010 regression: when a capability is somehow missing from BOTH the override
/// map and the tier defaults (defensive — should be impossible thanks to the
/// <c>TierDefaults_Cover_EveryCapability</c> invariant, but is the failure mode the
/// finding flagged), <c>Resolve</c> must throw a diagnostic <see cref="KeyNotFoundException"/>
/// that names the missing capability and tier — not a bare lookup failure.
/// </summary>
[Fact]
public void Resolve_Throws_Diagnostic_When_Capability_Missing_From_Tier_Defaults()
{
// Use a CapabilityPolicies dict that purposely omits one capability and use reflection
// to confirm the message names the capability when the tier defaults also omit it.
// We can't easily mutate GetTierDefaults so we exercise the documented behavior on a
// synthetic non-tier-known capability (we cast an out-of-range enum value).
var options = new DriverResilienceOptions { Tier = DriverTier.A };
var bogus = (DriverCapability)int.MaxValue;
var ex = Should.Throw<KeyNotFoundException>(() => options.Resolve(bogus));
ex.Message.ShouldContain(bogus.ToString());
ex.Message.ShouldContain(DriverTier.A.ToString());
ex.Message.ShouldContain(nameof(DriverResilienceOptions.GetTierDefaults));
}
}
@@ -109,4 +109,40 @@ public sealed class WedgeDetectorTests
new DemandSignal(0, 0, 1, Now).HasPendingWork.ShouldBeTrue();
new DemandSignal(0, 0, 0, Now).HasPendingWork.ShouldBeFalse();
}
/// <summary>
/// Core-012 regression: the XML <c>&lt;summary&gt;</c> on the <see cref="WedgeDetector"/>
/// constructor must accurately describe what the constructor does (take + clamp the
/// threshold). The previous text — "Whether the driver reported itself Healthy at
/// construction" — referenced behaviour the constructor doesn't perform.
/// </summary>
[Fact]
public void Doc_Constructor_Summary_Describes_Threshold_Clamp()
{
var xmlPath = Path.Combine(
AppContext.BaseDirectory,
"ZB.MOM.WW.OtOpcUa.Core.xml");
File.Exists(xmlPath).ShouldBeTrue($"expected XML doc file at {xmlPath}");
var content = File.ReadAllText(xmlPath);
var ctorSummary = ExtractSummaryFor(content,
"M:ZB.MOM.WW.OtOpcUa.Core.Stability.WedgeDetector.#ctor(System.TimeSpan)");
ctorSummary.ShouldNotBeNullOrWhiteSpace();
ctorSummary.ShouldNotContain("reported itself");
ctorSummary.ShouldContain("threshold");
}
private static string ExtractSummaryFor(string xml, string member)
{
var memberStart = xml.IndexOf($"<member name=\"{member}\"", StringComparison.Ordinal);
if (memberStart < 0) return string.Empty;
var memberEnd = xml.IndexOf("</member>", memberStart, StringComparison.Ordinal);
if (memberEnd < 0) return string.Empty;
var slice = xml.Substring(memberStart, memberEnd - memberStart);
var sumStart = slice.IndexOf("<summary>", StringComparison.Ordinal);
if (sumStart < 0) return string.Empty;
var sumEnd = slice.IndexOf("</summary>", sumStart, StringComparison.Ordinal);
return sumEnd < 0 ? string.Empty : slice.Substring(sumStart, sumEnd - sumStart);
}
}
@@ -51,11 +51,14 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions {
IReadOnlyList<HostConnectivityStatus> GetHostStatuses();
}
public class HistoryReadResult { }
public class HistoricalEventsResult { }
public interface IHistoryProvider {
Task<HistoryReadResult> ReadRawAsync(string fullRef, DateTime start, DateTime end, uint max, CancellationToken ct);
Task<HistoryReadResult> ReadProcessedAsync(string fullRef, DateTime start, DateTime end, TimeSpan interval, CancellationToken ct);
Task<HistoryReadResult> ReadAtTimeAsync(string fullRef, IReadOnlyList<DateTime> timestamps, CancellationToken ct)
=> throw new NotSupportedException();
Task<HistoricalEventsResult> ReadEventsAsync(string? sourceName, DateTime start, DateTime end, int maxEvents, CancellationToken ct)
=> throw new NotSupportedException();
}
public enum DriverCapability { Read, Write, Discover, AlarmSubscribe, AlarmAcknowledge }
}
@@ -586,6 +589,246 @@ namespace ZB.MOM.WW.OtOpcUa.Server {
diags.ShouldBeEmpty();
}
// =======================================================================
// Analyzers-002 — AlarmSurfaceInvoker has no lambda parameters; calls inside
// its own method bodies are covered transitively by the CapabilityInvoker
// match because the actual wrapping lambda lives on _invoker.ExecuteAsync.
// =======================================================================
[Fact]
public async Task GuardedCall_InsideAlarmSurfaceInvokerMethod_WrappedByCapabilityInvoker_PassesCleanly()
{
// Mirrors the real AlarmSurfaceInvoker pattern: it owns an inner CapabilityInvoker
// and routes IAlarmSource calls through CapabilityInvoker.ExecuteAsync. The transitive
// CapabilityInvoker wrapper match is what suppresses the diagnostic — the analyzer
// does NOT need an AlarmSurfaceInvoker-typed lambda escape hatch (it has no lambda
// params anyway).
const string userSrc = """
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;
namespace ZB.MOM.WW.OtOpcUa.Core.Resilience.Test {
public sealed class FakeAlarmSurface {
private readonly CapabilityInvoker _invoker;
private readonly IAlarmSource _source;
public FakeAlarmSurface(CapabilityInvoker i, IAlarmSource s) { _invoker = i; _source = s; }
public async Task<IAlarmSubscriptionHandle> SubscribeAsync(IReadOnlyList<string> ids, CancellationToken ct) {
return await _invoker.ExecuteAsync(DriverCapability.AlarmSubscribe, "h1",
async cct => await _source.SubscribeAlarmsAsync(ids, cct), ct);
}
}
}
""";
var diags = await Compile(userSrc);
diags.ShouldBeEmpty();
}
// =======================================================================
// Analyzers-003 — false positives must not appear on valid code. We can't
// easily force a null SemanticModel through RegisterOperationAction, but we
// can pin the no-false-positive contract for unrelated invocations.
// =======================================================================
[Fact]
public async Task NonGuardedAsyncCall_DoesNotTrip()
{
// Unrelated async call on a non-guarded type — must not be flagged regardless of
// analyzer internals (semantic-model availability, etc.).
const string userSrc = """
using System.Threading;
using System.Threading.Tasks;
namespace ZB.MOM.WW.OtOpcUa.Server {
public interface IUnrelated { Task DoStuffAsync(CancellationToken ct); }
public sealed class UnrelatedCaller {
public async Task DoIt(IUnrelated x) {
await x.DoStuffAsync(CancellationToken.None);
}
}
}
""";
var diags = await Compile(userSrc);
diags.ShouldBeEmpty();
}
// =======================================================================
// Analyzers-004 — when the Core.Abstractions guarded interfaces are not
// referenced at all, the analyzer must be a no-op for the compilation.
// =======================================================================
[Fact]
public async Task Compilation_WithoutGuardedInterfaceReferences_EmitsNoDiagnostics()
{
// A standalone compilation that does not pull in the StubSources at all — the
// RegisterCompilationStartAction symbol-resolution fast-path must skip cleanly
// when none of the guarded types exist.
const string userSrc = """
using System.Threading;
using System.Threading.Tasks;
namespace SomeOther {
public interface IReadable { Task ReadAsync(CancellationToken ct); }
public sealed class Caller {
public async Task DoIt(IReadable x) {
await x.ReadAsync(CancellationToken.None);
}
}
}
""";
var diags = await CompileWithoutStubs(userSrc);
diags.ShouldBeEmpty();
}
// =======================================================================
// Analyzers-005 — IHistoryProvider default-interface-method asymmetry
// =======================================================================
[Fact]
public async Task Direct_ReadAtTimeAsync_OnConcreteDriverInheritingDIM_TripsDiagnostic()
{
// ConcreteHistoryDriver does NOT override ReadAtTimeAsync (it inherits the DIM).
// The call site has a concrete-receiver type — the analyzer must still flag the call
// because the bound method (ReadAtTimeAsync) is the interface's own default impl.
const string userSrc = """
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Server {
public sealed class PartialHistoryDriver : IHistoryProvider {
public Task<HistoryReadResult> ReadRawAsync(string r, DateTime s, DateTime e, uint m, CancellationToken ct) => throw null!;
public Task<HistoryReadResult> ReadProcessedAsync(string r, DateTime s, DateTime e, TimeSpan i, CancellationToken ct) => throw null!;
// ReadAtTimeAsync + ReadEventsAsync NOT overridden — driver inherits the DIM throws.
}
public sealed class BadPartialHistoryCaller {
public async Task DoIt(PartialHistoryDriver driver) {
// Cast through the interface so the binder resolves to IHistoryProvider.ReadAtTimeAsync
// (the DIM). FindImplementationForInterfaceMember returns the interface method itself
// when nothing overrides it.
var iface = (IHistoryProvider)driver;
_ = await iface.ReadAtTimeAsync("tag", new List<DateTime>(), CancellationToken.None);
}
}
}
""";
var diags = await Compile(userSrc);
diags.Length.ShouldBe(1);
diags[0].GetMessage().ShouldContain("ReadAtTimeAsync");
}
[Fact]
public async Task Direct_ReadEventsAsync_OnConcreteDriverInheritingDIM_TripsDiagnostic()
{
// Same as above but for ReadEventsAsync — confirms both DIMs are caught when the
// driver doesn't override them.
const string userSrc = """
using System;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Server {
public sealed class PartialHistoryDriver2 : IHistoryProvider {
public Task<HistoryReadResult> ReadRawAsync(string r, DateTime s, DateTime e, uint m, CancellationToken ct) => throw null!;
public Task<HistoryReadResult> ReadProcessedAsync(string r, DateTime s, DateTime e, TimeSpan i, CancellationToken ct) => throw null!;
}
public sealed class BadPartialHistoryEventsCaller {
public async Task DoIt(PartialHistoryDriver2 driver) {
var iface = (IHistoryProvider)driver;
_ = await iface.ReadEventsAsync(null, DateTime.MinValue, DateTime.MaxValue, 100, CancellationToken.None);
}
}
}
""";
var diags = await Compile(userSrc);
diags.Length.ShouldBe(1);
diags[0].GetMessage().ShouldContain("ReadEventsAsync");
}
[Fact]
public async Task Direct_ReadAtTimeAsync_OnConcreteDriverOverridingDIM_TripsDiagnostic()
{
// Driver explicitly overrides ReadAtTimeAsync — the concrete-receiver call goes through
// the override; FindImplementationForInterfaceMember returns the override method, which
// SymbolEqualityComparer matches against method.
const string userSrc = """
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Server {
public sealed class FullHistoryDriver : IHistoryProvider {
public Task<HistoryReadResult> ReadRawAsync(string r, DateTime s, DateTime e, uint m, CancellationToken ct) => throw null!;
public Task<HistoryReadResult> ReadProcessedAsync(string r, DateTime s, DateTime e, TimeSpan i, CancellationToken ct) => throw null!;
public Task<HistoryReadResult> ReadAtTimeAsync(string r, IReadOnlyList<DateTime> ts, CancellationToken ct) => throw null!;
}
public sealed class BadFullHistoryCaller {
public async Task DoIt(FullHistoryDriver driver) {
_ = await driver.ReadAtTimeAsync("tag", new List<DateTime>(), CancellationToken.None);
}
}
}
""";
var diags = await Compile(userSrc);
diags.Length.ShouldBe(1);
diags[0].GetMessage().ShouldContain("ReadAtTimeAsync");
}
[Fact]
public async Task Wrapped_ReadAtTimeAsync_DIM_InsideCapabilityInvokerLambda_PassesCleanly()
{
// DIM wrapped properly — must not trip.
const string userSrc = """
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;
namespace ZB.MOM.WW.OtOpcUa.Server {
public sealed class GoodHistoryAtTime {
public async Task DoIt(IHistoryProvider provider, CapabilityInvoker invoker) {
_ = await invoker.ExecuteAsync(DriverCapability.Read, "h1",
async ct => await provider.ReadAtTimeAsync("tag", new List<DateTime>(), ct),
CancellationToken.None);
}
}
}
""";
var diags = await Compile(userSrc);
diags.ShouldBeEmpty();
}
private static async Task<ImmutableArray<Diagnostic>> CompileWithoutStubs(string userSource)
{
var syntaxTrees = new[] { CSharpSyntaxTree.ParseText(userSource) };
var references = AppDomain.CurrentDomain.GetAssemblies()
.Where(a => !a.IsDynamic && !string.IsNullOrEmpty(a.Location))
.Select(a => MetadataReference.CreateFromFile(a.Location))
.Cast<MetadataReference>()
.ToList();
var compilation = CSharpCompilation.Create(
assemblyName: "AnalyzerTestAssembly_NoStubs",
syntaxTrees: syntaxTrees,
references: references,
options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
var withAnalyzers = compilation.WithAnalyzers(
ImmutableArray.Create<DiagnosticAnalyzer>(new UnwrappedCapabilityCallAnalyzer()));
var allDiags = await withAnalyzers.GetAnalyzerDiagnosticsAsync();
return allDiags.Where(d => d.Id == UnwrappedCapabilityCallAnalyzer.DiagnosticId).ToImmutableArray();
}
private static async Task<ImmutableArray<Diagnostic>> Compile(string userSource)
{
var syntaxTrees = new[]