docs(code-reviews): re-review batch 2 at 39d737e — ConfigurationDatabase, DataConnectionLayer, DeploymentManager, ExternalSystemGateway, HealthMonitoring

17 new findings: ConfigurationDatabase-012..014, DataConnectionLayer-014..017, DeploymentManager-015..017, ExternalSystemGateway-015..017, HealthMonitoring-013..016.
Joseph Doherty
2026-05-17 00:45:10 -04:00
parent e49846603e
commit 89636e2bbf
6 changed files with 895 additions and 64 deletions


@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.ConfigurationDatabase` |
| Design doc | `docs/requirements/Component-ConfigurationDatabase.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 → 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` → `39d737e` |
| Open findings | 0 → 3 |
## Summary
@@ -37,6 +37,28 @@ repositories (`TemplateEngineRepository`, `DeploymentManagerRepository`,
`ExternalSystemRepository`, `InboundApiRepository`, `NotificationRepository`,
`SiteRepository`, `InstanceLocator`) have little or no direct coverage.
#### Re-review 2026-05-17 (commit `39d737e`)
Re-reviewed the module at commit `39d737e`. All eleven findings from the initial
review (`9c60592`) remain `Resolved` — the secret-column encryption
(`EncryptedStringConverter` + `ApplySecretColumnEncryption`), the fail-fast no-arg
DI overload, the `AsSplitQuery` conversions, the audit cycle-safe serializer, and the
added repository test coverage all verified present and consistent with their
resolutions. Three new findings were recorded. The most material is that the
encryption work done for CD-004 left one bearer credential out of scope:
`ApiKey.KeyValue` — the inbound-API authentication secret — is still persisted in
plaintext (`ConfigurationDatabase-012`); it cannot use the same non-deterministic
Data Protection converter because authentication looks the key up *by value*, so it
needs a hash-based scheme instead. The second is a resilience gap in the encryption
plumbing itself: `ApplySecretColumnEncryption` silently substitutes a throwaway
`EphemeralDataProtectionProvider` whenever no provider is supplied, so any context
constructed via the single-argument constructor on a write path would encrypt
secrets with a key discarded at process exit, yielding permanently undecryptable
ciphertext with no error (`ConfigurationDatabase-013`). The third is a minor
inconsistency — a redundant cast on one of the three `HasConversion` calls
(`ConfigurationDatabase-014`). The module is otherwise healthy and the prior fixes
hold up well.
## Checklist coverage
| # | Category | Examined | Notes (before) | Notes (after) |
@@ -44,11 +66,11 @@ repositories (`TemplateEngineRepository`, `DeploymentManagerRepository`,
| 1 | Correctness & logic bugs | ✓ | `GetTemplateWithChildrenAsync` discards loaded children (CD-001); `GetApprovedKeysForMethodAsync` CSV parsing is brittle (CD-008). | Unchanged. |
| 2 | Akka.NET conventions | ✓ | No actors in this module; data-access layer only. No issues found. | Unchanged. |
| 3 | Concurrency & thread safety | ✓ | DbContext correctly scoped; optimistic concurrency on `DeploymentRecord` correct. Repositories hold no shared mutable state. No issues found. | Unchanged. |
| 4 | Error handling & resilience | ✓ | `WaitForDatabaseReadyAsync` is sound. No-arg DI overload fails late and silently (CD-003); audit JSON serialization failure handling (CD-007). | `WaitForDatabaseReadyAsync` is sound. No-arg DI overload fails late and silently (CD-003, resolved); audit JSON serialization failure handling (CD-007, resolved). Re-review: ephemeral Data Protection fallback can silently produce undecryptable ciphertext (CD-013). |
| 5 | Security | ✓ | Hardcoded `sa` credential literal (CD-002); SMTP/DB-connection/auth secrets stored unencrypted (CD-004). | Hardcoded `sa` credential literal (CD-002, resolved); SMTP/DB-connection/auth secrets unencrypted (CD-004, resolved). Re-review: `ApiKey.KeyValue` bearer credential still stored in plaintext (CD-012). |
| 6 | Performance & resource management | ✓ | `GetAllTemplatesAsync` / `GetTemplateTreeAsync` eager-load multiple collections without `AsSplitQuery` (CD-009). No N+1 in audited paths. | `GetAllTemplatesAsync` / `GetTemplateTreeAsync` eager-load multiple collections without `AsSplitQuery` (CD-009, resolved). No N+1 in audited paths. Re-review: no new issues. |
| 7 | Design-document adherence | ✓ | Audit `Id` type mismatch vs design doc (CD-005); seed data uses `HasData` consistent with design. | Audit `Id` type mismatch vs design doc (CD-005, resolved); seed data uses `HasData` consistent with design. Re-review: no new issues. |
| 8 | Code organization & conventions | ✓ | Mostly clean. `Grpc*` address columns unbounded (CD-006); inconsistent null-guard on injected context (CD-011). | Mostly clean. `Grpc*` address columns unbounded (CD-006, resolved); inconsistent null-guard on injected context (CD-011, resolved). Re-review: redundant/inconsistent cast on one `HasConversion` call (CD-014). |
| 9 | Testing coverage | ✓ | Several repositories and `InstanceLocator` lack direct tests (CD-010). | Unchanged. |
| 10 | Documentation & comments | ✓ | `DeploymentManagerRepository` "WP-24 stub" XML comment is stale; noted in module context but not raised as a standalone finding. No issues found beyond items above. | Unchanged. |
@@ -570,3 +592,128 @@ every data-access type behaves uniformly and a hand-constructed instance fails w
informative exception at construction rather than a later `NullReferenceException`.
Regression: `Constructor_NullContext_Throws` tests were added for all four affected types
(`InboundApiRepositoryTests.cs`, `RepositoryCoverageTests.cs`).
### ConfigurationDatabase-012 — Inbound-API `ApiKey.KeyValue` bearer credential stored in plaintext
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Configurations/InboundApiConfiguration.cs:17-19` |
**Description**
`ApiKey.KeyValue` is the bearer credential presented in the `X-API-Key` header to
authenticate Inbound API requests (HighLevelReqs §7.27.3). It is mapped as an
ordinary `nvarchar(500)` column with a unique index and persisted verbatim. Anyone
with read access to the configuration database — or to any `AuditLogEntry.AfterStateJson`
into which an `ApiKey` entity is serialized — obtains live API credentials in cleartext.
`ConfigurationDatabase-004` introduced encryption-at-rest for the other secret-bearing
columns (SMTP credentials, external-system auth config, database connection strings)
but explicitly scoped `ApiKey.KeyValue` out. The omission is genuine: the
`EncryptedStringConverter` built for CD-004 is backed by ASP.NET Data Protection, which
is **non-deterministic** — the same plaintext encrypts to different ciphertext each
time — so it cannot be applied here, because `GetApprovedKeysForMethodAsync` and the
authentication path resolve a key by its value (`GetApiKeyByValueAsync` does
`FirstOrDefaultAsync(k => k.KeyValue == keyValue)`). A non-deterministic converter would
break that equality lookup. The result is that the one credential most exposed to
external callers is the one credential left unprotected.
**Recommendation**
Store a salted cryptographic hash of the key value instead of the plaintext (or
ciphertext): hash on create, and authenticate by hashing the presented key and
comparing. This keeps the equality lookup working (the hash is deterministic) while
ensuring the database never holds a usable credential. The plaintext key would then
only ever be shown once, at creation time, to the Admin who created it. This requires
a coordinated change with the Inbound API / Security components and the `ApiKey`
entity in Commons; record the chosen scheme in
`docs/requirements/Component-ConfigurationDatabase.md` and the Inbound API design doc.
At minimum, ensure `ApiKey` entities are never passed to `IAuditService` without the
key value redacted.
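
The hash-on-create, hash-on-authenticate flow can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the module's C# code. It assumes a keyed hash (HMAC-SHA-256 with a server-side secret) as one way to get a hash that is salted yet deterministic, since a random per-row salt would break the lookup by value. `PEPPER`, `hash_api_key`, and `ApiKeyStore` are hypothetical names introduced only for this example.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional

# Hypothetical server-side secret ("pepper"). In a real deployment it would come
# from protected configuration, never from source code.
PEPPER = b"example-pepper-not-for-production"


def hash_api_key(key_value: str, pepper: bytes = PEPPER) -> str:
    """Return a deterministic keyed hash of the API key (HMAC-SHA-256, hex)."""
    return hmac.new(pepper, key_value.encode("utf-8"), hashlib.sha256).hexdigest()


class ApiKeyStore:
    """Stand-in for the ApiKey table: it holds only hashes, never plaintext."""

    def __init__(self) -> None:
        self._name_by_hash: Dict[str, str] = {}

    def create_key(self, name: str) -> str:
        """Generate a key, persist its hash, and return the plaintext exactly once."""
        plaintext = secrets.token_urlsafe(32)
        self._name_by_hash[hash_api_key(plaintext)] = name
        return plaintext

    def authenticate(self, presented_key: str) -> Optional[str]:
        """Hash the presented key and look it up by equality on the hash."""
        return self._name_by_hash.get(hash_api_key(presented_key))

    def stored_values(self) -> List[str]:
        """Expose what is persisted, to show that no plaintext is kept."""
        return list(self._name_by_hash)
```

Because the stored column is a deterministic function of the key, the unique index and the equality lookup keep working, while a database or audit-log reader sees only hashes.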
**Resolution**
_Unresolved._
### ConfigurationDatabase-013 — Secret-column encryption silently falls back to an ephemeral (throwaway) key
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs:107-124` |
**Description**
`ApplySecretColumnEncryption` resolves the Data Protection provider as
`_dataProtectionProvider ?? new EphemeralDataProtectionProvider()`. The `??` fallback
is reached whenever the context is constructed via the single-argument
`ScadaLinkDbContext(DbContextOptions)` constructor — i.e. whenever no provider was
injected. An `EphemeralDataProtectionProvider` generates a key ring that lives only in
process memory and is discarded at process exit.
For design-time `dotnet ef` tooling this is harmless (the XML remark correctly notes
it only emits schema). The risk is on a *runtime write path*. The runtime currently
gets the provider-bearing context only because `AddConfigurationDatabase` adds an
`AddScoped` factory registration that overrides EF's activator-based registration.
That override is the single thing standing between correct behaviour and silent data
corruption: any future change that resolves a `ScadaLinkDbContext` through a path the
override does not cover — an `AddPooledDbContextFactory`/`IDbContextFactory<ScadaLinkDbContext>`
registration, a second `AddDbContext` call, a hand-constructed context in server code —
would construct the context with the single-arg constructor, encrypt secret columns
with a throwaway key, and persist ciphertext that becomes **permanently undecryptable
the moment the process restarts**. There is no exception, no warning; the failure only
surfaces later as `CryptographicException` on read (mis-attributed by
`EncryptedStringConverter` to "the stored value was not written by this system").
**Recommendation**
Do not silently substitute an ephemeral provider for write-capable contexts. Either:
(a) require the provider unconditionally and have design-time tooling pass an explicit
ephemeral provider so the intent is visible at the call site; or (b) keep the
single-arg constructor but mark contexts built without a real provider as
schema-only — e.g. record a flag and have the encrypting converter throw a clear
`InvalidOperationException` ("secret columns cannot be written without a configured
Data Protection key ring") on the first `Protect`, instead of producing throwaway
ciphertext. Also harden the DI wiring so a `ScadaLinkDbContext` cannot be resolved
through the EF-activator registration at all (e.g. register only the factory, or use
`AddDbContextFactory` with the explicit constructor).
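
Option (b) can be sketched as a converter that refuses to write when no real key ring was supplied. This is a language-neutral illustration in Python rather than the module's C# code. `SecretColumnConverter` and `FakeKeyRing` are hypothetical names, and the base64 "protection" is only a placeholder, not encryption.

```python
import base64


class FakeKeyRing:
    """Placeholder for a persisted key ring. Base64 is NOT encryption."""

    def protect(self, plaintext: str) -> str:
        return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


class SecretColumnConverter:
    """Toy converter that protects values only when a real key ring exists."""

    def __init__(self, key_ring=None):
        # key_ring is None for contexts built without a provider (schema-only).
        self._key_ring = key_ring

    @property
    def schema_only(self) -> bool:
        return self._key_ring is None

    def to_store(self, plaintext: str) -> str:
        """Fail loudly instead of producing throwaway ciphertext."""
        if self.schema_only:
            raise RuntimeError(
                "secret columns cannot be written without a configured "
                "Data Protection key ring"
            )
        return self._key_ring.protect(plaintext)
```

With this shape a misconfigured write path fails on the first write, rather than surfacing as an undecryptable value after the next restart.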
**Resolution**
_Unresolved._
### ConfigurationDatabase-014 — Redundant, inconsistent cast on one `HasConversion` call
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/ScadaLinkDbContext.cs:121-123` |
**Description**
`ApplySecretColumnEncryption` calls `HasConversion(converter)` three times. The first
two (`SmtpConfiguration.Credentials`, `ExternalSystemDefinition.AuthConfiguration`)
pass `converter` directly; the third (`DatabaseConnectionDefinition.ConnectionString`)
casts it to `(Microsoft.EntityFrameworkCore.Storage.ValueConversion.ValueConverter)`.
`EncryptedStringConverter` already derives from `ValueConverter<string?, string?>`
(itself a `ValueConverter`), and the first two call sites compile fine without the
cast, so the cast is redundant. The inconsistency makes the code look as though the
third call needs special handling when it does not, and the fully-qualified type name
inline adds noise.
**Recommendation**
Remove the cast so all three calls read identically as `HasConversion(converter)`.
If a `ValueConverter`-typed reference is genuinely wanted, give it a local variable of
that type once and use it for all three.
**Resolution**
_Unresolved._


@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.DataConnectionLayer` |
| Design doc | `docs/requirements/Component-DataConnectionLayer.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 → 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` → `39d737e` |
| Open findings | 0 → 4 |
## Summary
@@ -30,20 +30,40 @@ the design doc's failover state machine and the implemented unstable-disconnect
heuristic. Test coverage is adequate for the happy paths and failover but absent for
tag-resolution retry, disconnect/re-subscribe, and concurrency around `HandleSubscribe`.
#### Re-review 2026-05-17 (commit `39d737e`)
All 13 findings from the 2026-05-16 review remain `Resolved`, and the fixes were
verified in place against the current source (`PipeTo(Self)` subscribe pattern,
`Resume` supervision, `ConcurrentDictionary` callback maps, atomic disconnect guards,
bounded write timeout, etc.). The re-review walked all 10 checklist categories again
and found **4 new findings**. One is **High**: the DCL-012 security warning is never
seen in production because `RealOpcUaClientFactory.Create()` constructs
`RealOpcUaClient` with no logger, so the warning sinks into `NullLogger`. Two are
**Medium**. First, initial-connect failures in the `Connecting` state never count
toward failover, so a connection whose primary endpoint is unreachable at startup
retries the primary forever and never tries the configured backup. Second,
`HandleSubscribeCompleted` always replies `SubscribeTagsResponse(success: true)` even
when a connection-level subscribe failure is driving the actor into `Reconnecting`,
telling the Instance Actor the subscribe succeeded when it did not. One is **Low**:
`WriteBatchAsync` does not catch the `InvalidOperationException` from `EnsureConnected`,
so a mid-batch disconnect aborts the whole write batch (the same class of defect
DCL-007 fixed for `ReadBatchAsync`). New findings are numbered from
`DataConnectionLayer-014`.
## Checklist coverage
| # | Category | Examined | Notes (before) | Notes (after) |
|---|----------|----------|-------|-------|
| 1 | Correctness & logic bugs | x | `_resolvedTags` double-counting and stale counters after failover; `ReadBatchAsync` aborts mid-batch. | 2026-05-16 findings resolved. Re-review: finding 016, `SubscribeTagsResponse` reports success on a connection-level subscribe failure. |
| 2 | Akka.NET conventions | x | `Task.Run` mutating actor state (critical); `Restart` supervision loses state; closures capturing `_subscriptionsByInstance`. | 2026-05-16 findings resolved (`PipeTo(Self)` subscribe, `Resume` supervision). Re-review: no new issues. |
| 3 | Concurrency & thread safety | x | Actor state mutated off the actor thread; `RealOpcUaClient` callback dictionary unsynchronized. | 2026-05-16 findings resolved (`ConcurrentDictionary`, atomic disconnect guards). Re-review: no new issues. |
| 4 | Error handling & resilience | x | Subscription failures not surfaced; unbounded write with no timeout; reconnect after subscribe-time failure not handled. | Re-review: finding 015, initial-connect failures never trigger failover; finding 017, `WriteBatchAsync` aborts on mid-batch disconnect. |
| 5 | Security | x | `AutoAcceptUntrustedCerts` defaults to `true`; OPC UA password handling acceptable. See finding 012. | Re-review: finding 014, the DCL-012 auto-accept-cert warning is never logged in production (`RealOpcUaClient` built without a logger). |
| 6 | Performance & resource management | x | `HandleUnsubscribe` O(n^2) over instances; initial-read loop serial per tag. | 2026-05-16 finding 008 resolved (reverse index). Re-review: no new issues. |
| 7 | Design-document adherence | x | Failover heuristic (unstable-disconnect count) differs from documented state machine; `WriteTimeout` documented but unused. | 2026-05-16 findings 005/009 resolved. Re-review: no new issues (finding 015 logged under resilience). |
| 8 | Code organization & conventions | x | No issues found: POCOs in Commons, options class owned by component, factory pattern consistent. | Unchanged. |
| 9 | Testing coverage | x | No tests for tag-resolution retry, disconnect/re-subscribe, bad-quality push, or `HandleSubscribe` concurrency. | DCL-001 to DCL-013 regression tests present. Re-review: gaps remain for finding 014/015/016 scenarios (no test for production logger wiring, startup failover, or subscribe-response-on-failure). |
| 10 | Documentation & comments | x | XML comment on `RaiseDisconnected` claims thread safety it does not have; design doc round-robin description stale. | 2026-05-16 finding 013 resolved. Re-review: no new issues. |
## Findings ## Findings
@@ -661,3 +681,179 @@ fanning 32 barrier-synchronised threads that raise the client's `ConnectionLost`
simultaneously, and asserts `Disconnected` fires exactly once per round; against a simultaneously, and asserts `Disconnected` fires exactly once per round; against a
non-atomic check-then-set it double-fires (verified by temporarily reverting the non-atomic check-then-set it double-fires (verified by temporarily reverting the
guard), and it passes against the atomic fix. guard), and it passes against the atomic fix.
### DataConnectionLayer-014 — DCL-012 security warning is never logged in production: `RealOpcUaClient` is created without a logger
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs:325`, `src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs:35-39,79-83` |
**Description**
Finding DataConnectionLayer-012 was resolved in part by adding a prominent
`ILogger` warning in `RealOpcUaClient.ConnectAsync` whenever the auto-accept
certificate validator is installed (`RealOpcUaClient.cs:79-83`). The
`ILogger<RealOpcUaClient>` constructor parameter was made optional, defaulting to
`NullLogger<RealOpcUaClient>.Instance` (`RealOpcUaClient.cs:35-39`).
However, the only production code path that constructs a `RealOpcUaClient` is
`RealOpcUaClientFactory.Create()` (`RealOpcUaClient.cs:325`), which calls
`new RealOpcUaClient(_globalOptions)` and passes **no logger**. The factory itself
holds only an `OpcUaGlobalOptions` and has no `ILoggerFactory`/`ILogger` available.
As a result the `_logger` field is always `NullLogger` for every real OPC UA
connection, and the man-in-the-middle warning the DCL-012 fix added is silently
discarded. An operator who deploys a connection with `AutoAcceptUntrustedCerts`
enabled — accepting any server certificate on an industrial control link — gets no
visible signal anywhere in the logs. The in-scope half of DCL-012's resolution is
therefore not actually effective in production; only the unit test
(`DCL012_OpcUaConnectionOptions_AutoAcceptUntrustedCerts_DefaultsToFalse`, which only
checks the default value) passes.
**Recommendation**
Thread a real logger through to `RealOpcUaClient`. `DataConnectionFactory` already
holds an `ILoggerFactory` and constructs `RealOpcUaClientFactory(globalOptions)`;
give `RealOpcUaClientFactory` an `ILoggerFactory` (or an `ILogger<RealOpcUaClient>`)
constructor parameter and pass `_loggerFactory.CreateLogger<RealOpcUaClient>()` into
each `new RealOpcUaClient(...)`. Add a test that asserts the warning is emitted on a
real connect with auto-accept enabled (e.g. via a captured `ILogger`), not just that
the default is `false`.
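
The wiring and the suggested test can be sketched as follows. This is a language-neutral illustration in Python, not the module's C# code. `OpcUaClient`, `OpcUaClientFactory`, and `ListHandler` are hypothetical stand-ins, and the warning text is invented for the example.

```python
import logging


class OpcUaClient:
    """Stand-in for the real client: warns when auto-accept is enabled."""

    def __init__(self, auto_accept_untrusted_certs: bool, logger=None):
        self._auto_accept = auto_accept_untrusted_certs
        if logger is None:
            # Mirrors the optional-logger default: messages are discarded.
            logger = logging.getLogger("null-opcua")
            logger.addHandler(logging.NullHandler())
            logger.propagate = False
        self._logger = logger

    def connect(self) -> None:
        if self._auto_accept:
            self._logger.warning("auto-accepting untrusted server certificates")


class OpcUaClientFactory:
    """Factory that owns a logger and passes it to every client it creates."""

    def __init__(self, logger: logging.Logger):
        self._logger = logger

    def create(self, auto_accept_untrusted_certs: bool) -> OpcUaClient:
        return OpcUaClient(auto_accept_untrusted_certs, logger=self._logger)


class ListHandler(logging.Handler):
    """Captures formatted messages so a test can assert on them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())
```

A test in this style attaches a capturing handler, connects through the factory with auto-accept enabled, and asserts that the warning was recorded. That checks the production wiring rather than only the option's default value.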
**Resolution**
_Unresolved._
### DataConnectionLayer-015 — Initial-connect failures never trigger failover; an unreachable primary at startup never tries the backup
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:404-417`, `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:419-493` |
**Description**
Failover between the primary and backup endpoints is implemented in two places, both
reachable only after a connection has already been `Connected` at least once:
`HandleReconnectResult` (Reconnecting state) counts `_consecutiveFailures` and switches
endpoint, and `BecomeReconnecting` counts `_consecutiveUnstableDisconnects`.
`HandleConnectResult` — the handler for the *initial* connection attempt in the
`Connecting` state (`DataConnectionActor.cs:404-417`) — does neither. On failure it
only logs and re-arms the reconnect timer with `AttemptConnect`; it never increments
`_consecutiveFailures`, never consults `_backupConfig`, and never switches endpoint.
Consequence: if the primary endpoint is unreachable when the connection actor first
starts — which is the common case after a fresh artifact deployment, a site restart,
or a primary that is simply down at that moment — the actor retries the *primary*
endpoint indefinitely at `ReconnectInterval` and **never** attempts the configured
backup. The design doc's endpoint-redundancy promise ("automatic failover when the
active endpoint becomes unreachable") is silently not honoured for the
never-connected-yet case, and an operator sees a connection stuck `Connecting` forever
despite a healthy backup being configured.
**Recommendation**
Make `HandleConnectResult` participate in the failover counter the same way
`HandleReconnectResult` does: increment `_consecutiveFailures` on failure and, when
`_backupConfig != null && _consecutiveFailures >= _failoverRetryCount`, perform the
endpoint switch (dispose adapter, create the other adapter, bump `_adapterGeneration`,
log the failover event) before re-arming the timer. Alternatively, fold the initial
connect into the same reconnect path so there is a single failover decision point. Add
a regression test for "primary down at startup, backup configured → fails over to
backup".
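
The single failover decision point can be sketched as one handler shared by the initial-connect and reconnect outcomes. This is a language-neutral illustration in Python, not the actor's C# code. `ConnectionState` and its members are hypothetical, and resetting the counter after a switch is an assumption of the example.

```python
class ConnectionState:
    """Tracks the active endpoint and the consecutive-failure counter."""

    def __init__(self, primary, backup=None, failover_retry_count=3):
        self.endpoints = [primary] + ([backup] if backup is not None else [])
        self.active = 0
        self.consecutive_failures = 0
        self.failover_retry_count = failover_retry_count
        self.adapter_generation = 0

    @property
    def active_endpoint(self):
        return self.endpoints[self.active]

    def on_connect_result(self, success: bool) -> None:
        """Handle an initial-connect or reconnect outcome the same way."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        has_backup = len(self.endpoints) > 1
        if has_backup and self.consecutive_failures >= self.failover_retry_count:
            # Switch endpoint and bump the generation so results from the
            # previous adapter can be recognised as stale.
            self.active = (self.active + 1) % len(self.endpoints)
            self.adapter_generation += 1
            self.consecutive_failures = 0  # assumption: restart the count
```

Because the handler does not care whether the connection was ever established, a primary that is down at startup reaches the backup after `failover_retry_count` failed attempts.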
**Resolution**
_Unresolved._
### DataConnectionLayer-016 — `HandleSubscribeCompleted` reports `SubscribeTagsResponse` success even on a connection-level subscribe failure
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:606,666-672`, `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:232-240` |
**Description**
`HandleSubscribeCompleted` computes `connectionLevelFailure` (line 606) and returns it
so the `Connected`-state handler can drive the actor into `Reconnecting`
(`DataConnectionActor.cs:232-240`). But before returning, it unconditionally replies
to the caller with `new SubscribeTagsResponse(..., true, null, ...)` (lines 666-667) —
`Success: true`, `Error: null` — regardless of whether any tag failed at connection
level.
So when a subscribe arrives while the adapter is silently down, the Instance Actor is
told the subscribe **succeeded**, while the connection actor simultaneously transitions
to `Reconnecting`. The tags were never actually subscribed at the adapter (the catch
block recorded `Success: false`); they are recovered later by `ReSubscribeAll` only if
and when reconnection succeeds. The caller has no way to distinguish "subscribed and
healthy" from "accepted, but the connection is currently down" — a misleading
success signal on a request that did not do what the response claims.
(Genuine tag-resolution failures are arguably also reported as overall `true`, but
that is defensible: those tags are tracked in `_unresolvedTags` and the design models
unresolved tags as a runtime quality concern, with a `Bad`-quality `TagValueUpdate`
already pushed. The connection-level case is the clear defect because the actor itself
treats it as a failure worth a state transition.)
**Recommendation**
When `connectionLevelFailure` is true, reply with
`SubscribeTagsResponse(..., success: false, error: "connection unavailable — will
re-subscribe on reconnect", ...)` (or an equivalent), so the caller's response matches
the actor's own assessment. Optionally carry per-tag outcomes in the response so the
Instance Actor can reflect partial success. Add a test asserting the response is not
`Success: true` when a connection-level subscribe failure drives `Reconnecting`.
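
A minimal sketch of the recommended reply logic, written in Python as a language-neutral illustration and not as the actor's C# code. `TagOutcome`, `SubscribeResponse`, and `build_subscribe_response` are hypothetical names. The sketch keeps the existing treatment of tag-resolution failures (overall success) and changes only the connection-level case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagOutcome:
    tag: str
    success: bool
    connection_level: bool = False  # True when the failure was the link itself


@dataclass
class SubscribeResponse:
    success: bool
    error: Optional[str]


def build_subscribe_response(outcomes: List[TagOutcome]) -> SubscribeResponse:
    """Make the reply agree with the actor's own connection-level assessment."""
    connection_level_failure = any(
        (not o.success) and o.connection_level for o in outcomes
    )
    if connection_level_failure:
        return SubscribeResponse(
            False, "connection unavailable; will re-subscribe on reconnect"
        )
    # Tag-resolution failures alone still report overall success.
    return SubscribeResponse(True, None)
```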
**Resolution**
_Unresolved._
### DataConnectionLayer-017 — `WriteBatchAsync` aborts the whole batch on a mid-batch disconnect
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs:229-237`, `src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs:218-227` |
**Description**
`WriteBatchAsync` loops calling `WriteAsync` per tag (`OpcUaDataConnection.cs:229-237`).
`WriteAsync` returns a `WriteResult` for OPC-UA-level write rejections (good — a bad
status does not abort the batch), but it first calls `EnsureConnected()`
(`OpcUaDataConnection.cs:220`), which throws `InvalidOperationException` when the
client is disconnected. `WriteBatchAsync` does not catch that exception, so if the
connection drops partway through a batch the whole `WriteBatchAsync` throws and the
caller gets no result map — losing the per-tag outcomes for the tags that already
wrote. This is the same class of defect that DataConnectionLayer-007 fixed for
`ReadBatchAsync` (which now records a failed `ReadResult` per failing tag and only
propagates `OperationCanceledException`). `WriteBatchAsync` feeds
`WriteBatchAndWaitAsync` (line 246), so a disconnect during a flag-and-wait write
sequence surfaces as an unhandled exception rather than a clean `false`/per-tag result.
Severity is Low because device writes are real-time control operations with no
store-and-forward, the batch write paths are not on the primary `HandleWrite` hot path
(`HandleWrite` calls single-tag `WriteAsync`), and a disconnect mid-batch is itself an
error condition — but the inconsistent error shape (exception vs. per-tag result) is a
maintainability and correctness wart.
**Recommendation**
Mirror the DCL-007 fix: wrap the per-tag `WriteAsync` call in `WriteBatchAsync` in a
try/catch that records a failed `WriteResult(false, ex.Message)` for the failing tag
and continues, while still propagating `OperationCanceledException` to abort a
cancelled batch as a whole. This gives callers (including `WriteBatchAndWaitAsync`) a
complete, consistent result map.
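
The per-tag try/catch can be sketched as follows, in Python as a language-neutral illustration rather than the adapter's C# code. `write_batch` and `write_one` are hypothetical names, and `asyncio.CancelledError` plays the role of `OperationCanceledException`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

WriteResult = Tuple[bool, Optional[str]]  # (success, error message)


async def write_batch(
    values: Dict[str, object],
    write_one: Callable[[str, object], Awaitable[WriteResult]],
) -> Dict[str, WriteResult]:
    """Write every tag and record a failed result instead of aborting."""
    results: Dict[str, WriteResult] = {}
    for tag, value in values.items():
        try:
            results[tag] = await write_one(tag, value)
        except asyncio.CancelledError:
            # Cancellation aborts the batch as a whole.
            raise
        except Exception as ex:
            # Any other failure (for example "not connected") becomes a
            # per-tag result, and the loop continues with the next tag.
            results[tag] = (False, str(ex))
    return results
```

The caller always receives an entry for every tag it asked to write, so the outcomes of tags written before a disconnect are not lost.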
**Resolution**
_Unresolved._


@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.DeploymentManager` |
| Design doc | `docs/requirements/Component-DeploymentManager.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 → 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` → `39d737e` |
| Open findings | 0 → 3 |
## Summary
@@ -30,20 +30,43 @@ detail. Configuration is not bound to `appsettings.json`, leaving one option
entirely dead. Test coverage stops at the communication boundary and never
exercises a successful deployment or the lifecycle success paths.
#### Re-review 2026-05-17 (commit `39d737e`)
Re-reviewed at commit `39d737e` after the batch of fixes for
DeploymentManager-001..014. All fourteen prior findings remain `Resolved` and
verified against source — the broadened catch, non-cancellable cleanup writes,
ref-counted `OperationLockManager`, query-before-redeploy reconciliation,
structured diff, options binding, and the expanded TestKit-actor test suite are
all present and correct. The module is in markedly better shape than the
first review: error paths are now defensively handled and test coverage is
broad (successful deploy/lifecycle, lock serialization, reconciliation
matrix, artifact per-site matrix).
This re-review found **3 new findings**, all clustered on the
DeploymentManager-006 reconciliation path added since the last review. The
reconciliation shortcut (`TryReconcileWithSiteAsync`) marks a stale prior
record `Success` when the site already has the target revision, but it does
**not** perform the side effects the normal success path does — it never
updates the instance `State`, never refreshes the `DeployedConfigSnapshot`,
and never corrects the prior record's own `RevisionHash` (DeploymentManager-015,
DeploymentManager-016). The `GetDeploymentStatusAsync` XML doc is now stale —
it still describes the query-before-redeploy behaviour that actually moved into
`TryReconcileWithSiteAsync` (DeploymentManager-017).
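
One way to keep the reconciliation shortcut and the normal success path from drifting apart is to route both through a single helper that applies every success side effect. The sketch below is a language-neutral illustration in Python, not the module's C# code. The data classes, function names, and the `deployed_state` parameter are hypothetical, and the actual set of side effects is whatever the normal success path performs.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    state: str
    deployed_config_snapshot: str


@dataclass
class DeploymentRecord:
    status: str
    revision_hash: str


def apply_success_side_effects(instance, record, revision_hash, snapshot,
                               deployed_state):
    """Everything a successful deployment is assumed to update, in one place."""
    record.status = "Success"
    record.revision_hash = revision_hash
    instance.state = deployed_state
    instance.deployed_config_snapshot = snapshot


def try_reconcile_with_site(instance, record, site_revision_hash,
                            target_revision_hash, target_snapshot,
                            deployed_state):
    """Return True when the site already runs the target revision."""
    if site_revision_hash != target_revision_hash:
        return False  # caller proceeds with a normal redeploy
    apply_success_side_effects(instance, record, target_revision_hash,
                               target_snapshot, deployed_state)
    return True
```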
## Checklist coverage
| # | Category | Examined | Notes (before) | Notes (after) |
|---|----------|----------|-------|-------|
| 1 | Correctness & logic bugs | ✓ | Stuck `InProgress` record on unexpected exception; cancelled-token failure write. | Re-review 2026-05-17: reconciliation skips instance-state/snapshot updates (DeploymentManager-015) and keeps a stale `RevisionHash` (DeploymentManager-016). Prior: stuck `InProgress` / cancelled-token write (resolved). |
| 2 | Akka.NET conventions | ✓ | Module is a plain service layer; it calls `CommunicationService` which wraps Ask. No actors here. No issues. | Unchanged. |
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` is sound but leaks semaphores; `DeployToAllSitesAsync` correctly builds commands sequentially before parallel send. | `OperationLockManager` ref-counts and reclaims semaphores; `DeployToAllSitesAsync` correctly builds commands sequentially before parallel send. No issues at re-review. |
| 4 | Error handling & resilience | ✓ | Several gaps; see DeploymentManager-001/002/003/004. | Prior gaps DeploymentManager-001/002/003/004 resolved and verified. No new issues. |
| 5 | Security | ✓ | SMTP credentials are serialized and broadcast to sites; see DeploymentManager-013. No injection vectors; no authz here (enforced upstream). | SMTP credential handling documented as an accepted design decision (DeploymentManager-013). No injection vectors; no authz here (enforced upstream). No new issues. |
| 6 | Performance & resource management | ✓ | Semaphore leak (DeploymentManager-005); artifact rebuild does N+1 method queries per external system. | Semaphore leak resolved (DeploymentManager-005). No new issues. |
| 7 | Design-document adherence | ✓ | Missing query-before-redeploy (DeploymentManager-006); Diff View not implemented (DeploymentManager-007). | Query-before-redeploy and Diff View implemented (DeploymentManager-006/007). Re-review: reconciliation path breaks the deployed-snapshot/instance-state invariants; see DeploymentManager-015. |
| 8 | Code organization & conventions | ✓ | Options class not bound to configuration (DeploymentManager-008). POCO/repo placement correct. | Options binding resolved (DeploymentManager-008). POCO/repo placement correct. No new issues. |
| 9 | Testing coverage | ✓ | No successful-deploy test, no lifecycle success test (DeploymentManager-011); dead `CreateCommand` helper (DeploymentManager-014). | Broad coverage added (success, lifecycle, lock serialization, reconciliation, artifact matrix). Re-review: reconciled-success path's missing side effects (DeploymentManager-015) are untested. |
| 10 | Documentation & comments | ✓ | Misleading timeout comment (DeploymentManager-009); stale option XML doc (DeploymentManager-012). | Prior comment findings resolved. Re-review: `GetDeploymentStatusAsync` XML doc is now stale (DeploymentManager-017). |
## Findings
@@ -710,3 +733,126 @@ the communication boundary. New tests:
`DeployToAllSitesAsync_AllPerSiteCommandsShareTheSummaryDeploymentId` (also
covers DeploymentManager-010), `DeployToAllSitesAsync_PartialFailure_ReportsPerSiteMatrix`
(per-site success/failure matrix), `RetryForSiteAsync_SiteSucceeds_ReturnsSuccessAndAudits`.
### DeploymentManager-015 — Site-query reconciliation marks a deployment `Success` but skips instance-state and snapshot updates
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:631-655` |
**Description**
`TryReconcileWithSiteAsync` (the DeploymentManager-006 query-before-redeploy
path) handles the case where a prior `InProgress`/timeout-`Failed` record exists
and the site reports it already has the target revision hash. In that case it
marks the prior `DeploymentRecord` `Success`, audit-logs `DeployReconciled`, and
returns it — the caller then returns `Result.Success` and **never enters the
normal deploy body**.
The normal success path (`DeployInstanceAsync.cs:215-223`) does three things on
a successful site response: writes the deployment record terminal status, sets
`instance.State = InstanceState.Enabled` + `UpdateInstanceAsync`, and calls
`StoreDeployedSnapshotAsync`. The reconciliation shortcut performs only the
first. Consequently, after a reconciled deployment:
- The instance `State` is left at whatever it was (e.g. `NotDeployed` for a
first-time deploy that timed out, or `Disabled`) even though the site is
actually running the configuration — the central state machine and the site
diverge, and a subsequent `DisableInstanceAsync`/`EnableInstanceAsync` will be
rejected or allowed incorrectly by `StateTransitionValidator`.
- No `DeployedConfigSnapshot` is created or refreshed. A first-time deploy that
is resolved purely by reconciliation leaves `GetDeploymentComparisonAsync`
permanently returning `"No deployed snapshot found for this instance."`, and a
redeploy reconciliation leaves the stored snapshot showing the *old* config
even though the deployment record claims `Success` for the new revision.
The design ("Deployed vs. Template-Derived State", WP-4/WP-8) requires the
deployed snapshot and instance state to reflect the last successful deployment;
the reconciliation path silently breaks both invariants.
**Recommendation**
In the reconciled-success branch of `TryReconcileWithSiteAsync`, perform the
same post-success side effects as the normal path: set `instance.State =
InstanceState.Enabled` (+ `UpdateInstanceAsync`) and call
`StoreDeployedSnapshotAsync` with the target deployment ID / revision hash /
config JSON. Factor the shared post-success logic into one helper so the normal
and reconciliation paths cannot drift. Add a regression test asserting that a
reconciled deployment leaves the instance `Enabled` and a snapshot stored.
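
One way to keep the two paths from drifting is a single post-success helper that both call. The sketch below is illustrative only: `IDeploymentStore`, `InMemoryDeploymentStore`, `DeploymentSuccess.ApplySuccessAsync`, and the simplified `Instance`/`DeploymentRecord` shapes are hypothetical stand-ins, not the module's actual types or signatures.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the real entities and repository.
public enum InstanceState { NotDeployed, Enabled, Disabled }
public enum DeploymentStatus { InProgress, Failed, Success }

public sealed class Instance
{
    public int Id { get; init; }
    public InstanceState State { get; set; }
}

public sealed class DeploymentRecord
{
    public string DeploymentId { get; init; } = "";
    public string RevisionHash { get; set; } = "";
    public DeploymentStatus Status { get; set; }
    public DateTimeOffset? CompletedAt { get; set; }
}

public interface IDeploymentStore
{
    Task UpdateDeploymentRecordAsync(DeploymentRecord record, CancellationToken ct);
    Task UpdateInstanceAsync(Instance instance, CancellationToken ct);
    Task StoreDeployedSnapshotAsync(int instanceId, string deploymentId,
        string revisionHash, string configJson, CancellationToken ct);
}

// In-memory fake that records the order of writes; useful for a regression test.
public sealed class InMemoryDeploymentStore : IDeploymentStore
{
    public List<string> Calls { get; } = new();

    public Task UpdateDeploymentRecordAsync(DeploymentRecord record, CancellationToken ct)
    { Calls.Add($"record:{record.Status}"); return Task.CompletedTask; }

    public Task UpdateInstanceAsync(Instance instance, CancellationToken ct)
    { Calls.Add($"instance:{instance.State}"); return Task.CompletedTask; }

    public Task StoreDeployedSnapshotAsync(int instanceId, string deploymentId,
        string revisionHash, string configJson, CancellationToken ct)
    { Calls.Add($"snapshot:{revisionHash}"); return Task.CompletedTask; }
}

public static class DeploymentSuccess
{
    // Intended to be called by BOTH the normal deploy path and the reconciliation
    // path, so record status, instance state, and snapshot are updated together.
    public static async Task ApplySuccessAsync(
        IDeploymentStore store, Instance instance, DeploymentRecord record,
        string targetRevisionHash, string configJson, DateTimeOffset now,
        CancellationToken ct = default)
    {
        record.Status = DeploymentStatus.Success;
        record.RevisionHash = targetRevisionHash; // keeps the record consistent with the site
        record.CompletedAt = now;
        await store.UpdateDeploymentRecordAsync(record, ct);

        instance.State = InstanceState.Enabled;
        await store.UpdateInstanceAsync(instance, ct);

        await store.StoreDeployedSnapshotAsync(
            instance.Id, record.DeploymentId, targetRevisionHash, configJson, ct);
    }
}
```

A regression test along these lines could run the reconciliation path against the fake store and assert that all three writes happened, which is the coverage gap noted for this finding.
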
**Resolution**
_Unresolved._
### DeploymentManager-016 — Reconciled prior record keeps its stale `RevisionHash`
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:639-651` |
**Description**
When `TryReconcileWithSiteAsync` reconciles a prior record, it mutates
`prior.Status`, `prior.ErrorMessage`, and `prior.CompletedAt`, but **not**
`prior.RevisionHash`. The reconciliation condition only compares the *site's*
`AppliedRevisionHash` against the *freshly-flattened* `targetRevisionHash` — it
does not require `prior.RevisionHash` to equal either of them.
The prior record can legitimately carry a different revision hash than the
current target: e.g. a deploy timed out at revision `R1`, the template was then
edited so the current flatten yields `R2`, and meanwhile the site actually
applied `R2` through some other path (or `R1` and `R2` are equal-by-content but
the prior record predates a hash recompute). After reconciliation the record's
`Status` is `Success` but its `RevisionHash` still says `R1`, so staleness
checks and any UI that reads `DeploymentRecord.RevisionHash` will report the
instance as deployed at the wrong revision. The audit `DeployReconciled` entry
records `RevisionHash = targetRevisionHash`, contradicting the persisted record.
**Recommendation**
In the reconciled-success branch, also set `prior.RevisionHash =
targetRevisionHash` so the persisted record, the audit entry, and the site's
actual applied revision all agree. Alternatively, only reconcile when
`prior.RevisionHash == targetRevisionHash` and otherwise fall through to a
normal deploy.
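
The stricter alternative can be expressed as a small pure decision function. This is a sketch under assumptions: `ReconciliationPolicy`, `ReconcileDecision`, and the parameter names are hypothetical and only model the three hashes discussed above.

```csharp
using System;

public enum ReconcileDecision { MarkPriorSuccess, FallThroughToDeploy }

public static class ReconciliationPolicy
{
    // Reconcile only when the site, the prior record, and the freshly flattened
    // target all agree on the revision; otherwise run a normal deploy.
    public static ReconcileDecision Decide(
        string priorRevisionHash, string? siteAppliedRevisionHash, string targetRevisionHash)
    {
        bool siteHasTarget = string.Equals(
            siteAppliedRevisionHash, targetRevisionHash, StringComparison.Ordinal);
        bool priorMatchesTarget = string.Equals(
            priorRevisionHash, targetRevisionHash, StringComparison.Ordinal);

        return siteHasTarget && priorMatchesTarget
            ? ReconcileDecision.MarkPriorSuccess
            : ReconcileDecision.FallThroughToDeploy;
    }
}
```

With this rule, the `R1`/`R2` scenario above falls through to a normal deploy instead of leaving a `Success` record that still carries `R1`.
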
**Resolution**
_Unresolved._
### DeploymentManager-017 — `GetDeploymentStatusAsync` XML doc describes behaviour it does not implement
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:562-570` |
**Description**
The XML summary on `GetDeploymentStatusAsync` reads: *"WP-2: After
failover/timeout, query site for current deployment state before
re-deploying."* The method body does no such thing — it is a one-line
pass-through to `_repository.GetDeploymentByDeploymentIdAsync`, a pure local DB
read. The query-the-site-before-redeploy behaviour the comment describes was
implemented separately in `TryReconcileWithSiteAsync` (DeploymentManager-006).
The stale comment is a leftover of the original design intent and misleads a
reader into thinking this method contacts the site.
**Recommendation**
Reword the summary to describe what the method actually does — "returns the
current persisted `DeploymentRecord` for the given deployment ID from the
configuration database" — and, if useful, cross-reference
`TryReconcileWithSiteAsync` as the place the site-query reconciliation lives.
**Resolution**
_Unresolved._


@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.ExternalSystemGateway` | | Module | `src/ScadaLink.ExternalSystemGateway` |
| Design doc | `docs/requirements/Component-ExternalSystemGateway.md` | | Design doc | `docs/requirements/Component-ExternalSystemGateway.md` |
| Status | Reviewed | | Status | Reviewed |
| Last reviewed | 2026-05-16 | | Last reviewed | 2026-05-17 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `9c60592` | | Commit reviewed | `39d737e` |
| Open findings | 0 | | Open findings | 3 |
## Summary
@@ -30,20 +30,41 @@ Test coverage is thin — `CachedCall` transient/buffering paths and `DatabaseGa
are entirely untested. Themes: incomplete wiring against the S&F engine, and design-doc
requirements (timeout, retry settings) that are declared but not implemented.
#### Re-review 2026-05-17 (commit `39d737e`)
All fourteen prior findings remain `Resolved`; the resolutions for findings 001–014
were spot-checked against the current source and hold. The re-review walked the full
10-category checklist again and surfaced **three new findings**. The most serious
(`ExternalSystemGateway-015`, High) is a regression *introduced by* the
`ExternalSystemGateway-004` resolution: `CachedCall`/`CachedWrite` now pass a
per-system/per-connection `MaxRetries` of `0` through verbatim, but
`StoreAndForwardService.RetryMessageAsync` interprets a stored `MaxRetries` of `0` as
**retry forever**, not "never retry" — so the very `0` the ESG-004 fix claims to
"honour as never retry" actually produces an unbounded retry loop, and two ESG tests
assert the broken behaviour. `ExternalSystemGateway-016` (Medium) is that the
`ExternalSystemGateway-013` resolution used `ConfigureHttpClientDefaults`, which is a
**process-global** registration — it forces a `SocketsHttpHandler` (capped at the ESG
option) onto every `HttpClient` in the host, including the Notification Service's
OAuth2 token client, not just the gateway's per-system clients. `ExternalSystemGateway-017`
(Low) is a trailing-`?` URL nit when a GET method's parameters are all null. Theme:
both substantive findings are second-order defects in earlier fixes — the earlier
resolutions did not verify the downstream contract of the S&F engine they integrate
with.
## Checklist coverage
| # | Category | Examined | Notes | | # | Category | Examined | Notes |
|---|----------|----------|-------| |---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | URL building edge cases, dropped S&F result, classification gaps — findings 003, 006, 009. | | 1 | Correctness & logic bugs | ☑ | Prior: URL edge cases, dropped S&F result, classification — 003, 006, 009. Re-review: `MaxRetries == 0` retry-forever vs never-retry contradiction (015); trailing-`?` URL nit (017). |
| 2 | Akka.NET conventions | ☑ | No actors in this module; `AddExternalSystemGatewayActors` is a no-op. Blocking-I/O isolation is delegated to Site Runtime. No issues found in this module. | | 2 | Akka.NET conventions | ☑ | No actors in this module; `AddExternalSystemGatewayActors` is a no-op. Blocking-I/O isolation is delegated to Site Runtime. No issues found. |
| 3 | Concurrency & thread safety | ☑ | Services are stateless and DI-scoped; `ExternalCallResult.Response` lazy-parse is not thread-safe but instances are single-use. No findings raised. | | 3 | Concurrency & thread safety | ☑ | Services are stateless and DI-scoped; the S&F delivery handlers resolve in a fresh DI scope on the sweep thread. No findings raised. |
| 4 | Error handling & resilience | ☑ | S&F handler never registered, double-dispatch, timeout not applied, cancellation conflation — findings 001, 002, 003, 008. | | 4 | Error handling & resilience | ☑ | Prior: handler registration, double-dispatch, timeout, cancellation — 001, 002, 003, 008. Re-review: the unbounded-retry consequence of finding 015 is also a resilience defect (recorded under category 1). |
| 5 | Security | ☑ | Auth secrets logged-safe, but error bodies echoed verbatim — finding 007. | | 5 | Security | ☑ | Error bodies now truncated (007). No new issues — auth secrets not logged, body capped. |
| 6 | Performance & resource management | ☑ | `HttpRequestMessage`/`HttpResponseMessage` and failed `SqlConnection` not disposed; full repository scan per call — findings 005, 010, 011. | | 6 | Performance & resource management | ☑ | Prior: disposal and repository-scan findings 005, 010, 011 — all resolved and verified. No new issues. |
| 7 | Design-document adherence | ☑ | Timeout, retry settings, audit logging gaps — findings 002, 004, 012. | | 7 | Design-document adherence | ☑ | Prior: timeout, retry settings, logging — 002, 004, 012. Re-review: finding 015 is also a design-adherence gap (S&F retry contract); recorded under category 1. |
| 8 | Code organization & conventions | ☑ | Options class correctly owned by module; `MaxConcurrentConnectionsPerSystem` unused — finding 013. | | 8 | Code organization & conventions | ☑ | Prior: `MaxConcurrentConnectionsPerSystem` wiring — 013. Re-review: that wiring uses process-global `ConfigureHttpClientDefaults`, leaking the ESG cap onto every host `HttpClient` — finding 016. |
| 9 | Testing coverage | ☑ | CachedCall buffering and DatabaseGateway untested — finding 014. | | 9 | Testing coverage | ☑ | Coverage is broad after finding 014. Re-review note: the `ZeroMaxRetries...` tests assert the persisted column, not the sweep outcome, and so lock in the finding-015 defect. |
| 10 | Documentation & comments | ☑ | XML docs reference WP numbers; permanent-failure logging requirement unverified — folded into finding 012. | | 10 | Documentation & comments | ☑ | Inline comments at `ExternalSystemClient.cs:118-119` / `DatabaseGateway.cs:99-101` assert a "never retry" semantic that the code does not deliver — see finding 015. |
## Findings
@@ -760,3 +781,144 @@ headers and body so URL/auth/body construction is now verified, not just status
These are new-coverage tests against already-correct behaviour, so they pass on the
current source; the `BuildUrl` and `ApplyAuth` paths they exercise are now protected
against regression.
### ExternalSystemGateway-015 — `MaxRetries == 0` is buffered as "retry forever", contradicting the ExternalSystemGateway-004 "never retry" claim
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:120-127`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:102-108` |
**Description**
The `ExternalSystemGateway-004` fix removed the `MaxRetries > 0 ? ... : null` guard so
that `CachedCallAsync` and `CachedWriteAsync` now pass the definition's `MaxRetries`
to `StoreAndForwardService.EnqueueAsync` verbatim. The stated rationale (and the inline
comments at `ExternalSystemClient.cs:118-119` / `DatabaseGateway.cs:99-101`, plus the
tests `CachedCall_TransientFailure_ZeroMaxRetriesIsHonouredNotTreatedAsUnset` and
`CachedWrite_ZeroMaxRetriesIsHonouredNotTreatedAsUnset`) is that a configured
`MaxRetries` of `0` means **"never retry"**.
That is the opposite of what the Store-and-Forward engine actually does with the
value. `EnqueueAsync` stores the passed `maxRetries` directly into
`StoreAndForwardMessage.MaxRetries` (`StoreAndForwardService.cs:139`), whose own XML
doc states **"Maximum retry-sweep attempts before parking (0 = no limit)"**
(`StoreAndForwardMessage.cs:30`). The retry sweep enforces it as
`if (message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries)`
(`StoreAndForwardService.cs:285`) — when `MaxRetries == 0` that guard is false on
every sweep, so the message is **never parked and is retried forever**.
Consequences for a `CachedCall`/`CachedWrite` against a system or connection
configured with `MaxRetries = 0`:
1. A transiently-failing message that the operator intended never to retry is instead
retried on every sweep indefinitely, accumulating in the buffer and repeatedly
re-dispatching the request — the exact unbounded-retry / duplicate-delivery hazard
the idempotency note in the design doc warns about.
2. The two ESG regression tests cited above assert `max_retries == 0` is *stored* and
describe that as "honoured" — they verify the persisted column, never the resulting
sweep behaviour, so they lock in the broken semantics.
3. Because the SiteRuntime repository still always supplies `MaxRetries == 0` (the
open companion gap noted in `ExternalSystemGateway-004`), **every** cached call and
cached write at every site currently buffers as retry-forever. Before the ESG-004
fix the `> 0` guard sent `null`, so the S&F `DefaultMaxRetries` (a bounded value)
applied — i.e. the ESG-004 fix turned a bounded retry into an unbounded one for the
common path.
**Recommendation**
Reconcile the ESG and S&F interpretations of `MaxRetries == 0` — they must agree.
Either: (a) treat the entity's `MaxRetries == 0` as "unset" and pass `null` so the
bounded S&F default applies (reverting to the pre-ESG-004 behaviour, and accepting
that "never retry" then needs a different representation such as a nullable field or a
`MaxRetries == 1` convention); or (b) if `0` genuinely must mean "never retry", add an
explicit no-retry path — e.g. do not enqueue at all on transient failure when
`MaxRetries == 0`, or introduce a distinct sentinel — and fix the
`StoreAndForwardMessage.MaxRetries` doc and `RetryMessageAsync` guard so `0` no longer
means "no limit". Update the two `ZeroMaxRetries...` tests to assert the *sweep*
outcome (parked / not retried), not just the stored column value.
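
The mismatch is easy to see when the sweep guard quoted above is isolated as a pure function. The sketch below reproduces that guard and shows option (a); `RetryPolicySketch` and both method names are hypothetical, chosen only for illustration.

```csharp
public static class RetryPolicySketch
{
    // Mirrors the sweep guard quoted in the description: a message is parked
    // only when a positive limit has been reached.
    public static bool ShouldPark(int maxRetries, int retryCount)
        => maxRetries > 0 && retryCount >= maxRetries;

    // Option (a): treat a configured 0 as "unset", so the caller passes null
    // and the bounded store-and-forward default applies.
    public static int? ToEnqueueMaxRetries(int configuredMaxRetries)
        => configuredMaxRetries > 0 ? configuredMaxRetries : (int?)null;
}
```

`ShouldPark(0, n)` is false for every `n`, which is the retry-forever behaviour. A test of the sweep outcome, rather than of the stored column, would catch this.
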
**Resolution**
_Unresolved._
### ExternalSystemGateway-016 — `ConfigureHttpClientDefaults` applies the ESG connection cap to every `HttpClient` in the host process
| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs:21-29` |
**Description**
The `ExternalSystemGateway-013` fix wires `MaxConcurrentConnectionsPerSystem` by
calling `services.ConfigureHttpClientDefaults(builder => builder.ConfigurePrimaryHttpMessageHandler(...))`.
The inline comment claims this "applies to the dynamically-named clients created by
`ExternalSystemClient`" — but `ConfigureHttpClientDefaults` is **process-global**: it
adds the configuration to *every* `HttpClient`/`IHttpClientFactory` client created
anywhere in the host, regardless of name.
The Host registers the External System Gateway alongside other components that also
use `IHttpClientFactory` — notably `ScadaLink.NotificationService` (`OAuth2TokenService`
and its `ServiceCollectionExtensions` call `AddHttpClient`). With the ESG registration
present, the OAuth2 token client (and any future `HttpClient` consumer in the host)
has its **primary handler replaced** by a `SocketsHttpHandler` whose
`MaxConnectionsPerServer` is the ESG's `MaxConcurrentConnectionsPerSystem`. That:
1. Silently caps unrelated clients at a value owned by, and named for, a different
component — an operator tuning the ESG option unknowingly throttles Microsoft 365
OAuth2 token acquisition.
2. Overrides/discards any primary-handler configuration those other components add for
their own clients (e.g. a custom handler, proxy, or certificate settings).
This is a leaky, surprising side effect for what the option claims to be a per-ESG
setting; `ConfigureHttpClientDefaults` should not be used to express a single
component's policy.
**Recommendation**
Scope the handler configuration to the gateway's own clients only. The ESG already
creates per-system clients with a deterministic name pattern
(`ExternalSystem_{system.Name}`); register a typed/named `HttpClient` (or a small
factory abstraction) for that pattern and call `ConfigurePrimaryHttpMessageHandler`
on that registration instead of on the global defaults. If a name-pattern registration
is impractical, document the global effect explicitly and rename the option, but the
preferred fix is to stop using `ConfigureHttpClientDefaults`.
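
Because the per-system client names are generated at runtime, one possible way to scope the handler is a named-options configurer that acts only on names carrying the gateway prefix. This is a sketch, not the module's code: `ExternalSystemHandlerConfigurer` and its constructor parameter are hypothetical, and it assumes the `Microsoft.Extensions.Http` and `Microsoft.Extensions.Options` packages.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.Http;
using Microsoft.Extensions.Options;

// Applies the connection cap only to factory clients whose name starts with the
// gateway's prefix; every other named client is left untouched.
public sealed class ExternalSystemHandlerConfigurer
    : IConfigureNamedOptions<HttpClientFactoryOptions>
{
    public const string ClientNamePrefix = "ExternalSystem_";
    private readonly int _maxConnectionsPerServer;

    public ExternalSystemHandlerConfigurer(int maxConnectionsPerServer)
        => _maxConnectionsPerServer = maxConnectionsPerServer;

    public void Configure(string? name, HttpClientFactoryOptions options)
    {
        if (name is null || !name.StartsWith(ClientNamePrefix, StringComparison.Ordinal))
            return; // not a gateway client: leave its handler pipeline alone

        options.HttpMessageHandlerBuilderActions.Add(builder =>
            builder.PrimaryHandler = new SocketsHttpHandler
            {
                MaxConnectionsPerServer = _maxConnectionsPerServer
            });
    }

    // The unnamed (default) options instance is not a gateway client.
    public void Configure(HttpClientFactoryOptions options) { }
}
```

It would be registered as an `IConfigureOptions<HttpClientFactoryOptions>` singleton in place of the `ConfigureHttpClientDefaults` call, so the Notification Service's OAuth2 client keeps whatever primary handler it configures for itself.
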
**Resolution**
_Unresolved._
### ExternalSystemGateway-017 — `BuildUrl` appends a bare trailing `?` when a GET method's parameters are all null
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:324-333` |
**Description**
In `BuildUrl`, the GET/DELETE query-string branch is entered when
`parameters != null && parameters.Count > 0`, but the projection then filters out
null-valued entries (`parameters.Where(p => p.Value != null)`). When a GET/DELETE
method is invoked with a non-empty parameter dictionary whose values are *all* null,
`queryString` is the empty string and the code still executes `url += "?" + queryString`,
producing a URL ending in a bare `?` (e.g. `https://host/api/resource?`). Most servers
tolerate a trailing `?`, but it is an unintended artifact, can defeat response caching
on some endpoints, and makes captured request URLs harder to read in logs.
**Recommendation**
Only append `"?" + queryString` when `queryString` is non-empty (compute the joined
string first and check it), so a method whose effective parameter set is empty
produces a clean URL identical to the no-parameters case.
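
A minimal sketch of the guarded append follows. `UrlSketch.AppendQuery` is a hypothetical helper, and the escaping shown is an assumption rather than a copy of what `BuildUrl` does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlSketch
{
    public static string AppendQuery(
        string url, IReadOnlyDictionary<string, object?>? parameters)
    {
        if (parameters is null || parameters.Count == 0)
            return url;

        // Null-valued entries are dropped, so the joined string can be empty
        // even though the dictionary itself is not.
        var queryString = string.Join("&", parameters
            .Where(p => p.Value != null)
            .Select(p => Uri.EscapeDataString(p.Key) + "=" +
                         Uri.EscapeDataString(p.Value!.ToString() ?? string.Empty)));

        // Append "?" only when there is something to put after it.
        return queryString.Length == 0 ? url : url + "?" + queryString;
    }
}
```
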
**Resolution**
_Unresolved._


@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.HealthMonitoring` | | Module | `src/ScadaLink.HealthMonitoring` |
| Design doc | `docs/requirements/Component-HealthMonitoring.md` | | Design doc | `docs/requirements/Component-HealthMonitoring.md` |
| Status | Reviewed | | Status | Reviewed |
| Last reviewed | 2026-05-16 | | Last reviewed | 2026-05-17 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `9c60592` | | Commit reviewed | `39d737e` |
| Open findings | 0 | | Open findings | 4 |
## Summary
@@ -32,20 +32,38 @@ heartbeat path, and most collector setters. None of the findings are crash-class
but the concurrency issues are Medium/High and the missing S&F metric is a real
design-adherence gap.
#### Re-review 2026-05-17 (commit `39d737e`)
All twelve prior findings (HealthMonitoring-001..012) are confirmed `Resolved`:
`SiteHealthState` is now an immutable `sealed record` mutated only via atomic
compare-and-swap, the store-and-forward buffer-depth metric is populated, the
central-site offline grace and the unknown-site heartbeat registration are in
place, and the test suite has grown to cover the report loop, heartbeat path, and
collector setters. This re-review found **4 new findings, all Low/Medium, none
crash-class**. They are residual polish items rather than behaviour regressions:
an inaccurate offline-check-interval comment (HealthMonitoring-013), unvalidated
`HealthMonitoringOptions` intervals that crash the hosted service on
misconfiguration (HealthMonitoring-014), a heartbeat-only registered site left
with a year-0001 `LastReportReceivedAt` that the UI's staleness display must
special-case (HealthMonitoring-015), and `CollectReport` reading
`DateTimeOffset.UtcNow` directly instead of the module's now-standard injected
`TimeProvider` (HealthMonitoring-016). The module remains small, readable, and
broadly faithful to the design intent.
## Checklist coverage
| # | Category | Examined | Notes | | # | Category | Examined | Notes |
|---|----------|----------|-------| |---|----------|----------|-------|
| 1 | Correctness & logic bugs | x | `MarkHeartbeat` drops heartbeats for unregistered sites (HealthMonitoring-007); central self-report has no heartbeat grace (HealthMonitoring-005). | | 1 | Correctness & logic bugs | x | `MarkHeartbeat` drops heartbeats for unregistered sites (HealthMonitoring-007); central self-report has no heartbeat grace (HealthMonitoring-005). Re-review: heartbeat-registered site left with year-0001 `LastReportReceivedAt` (HealthMonitoring-015). |
| 2 | Akka.NET conventions | x | Module itself contains no actors (transport abstracted via `IHealthReportTransport`); `AddHealthMonitoringActors` is a dead placeholder (HealthMonitoring-011). Actor-side wiring lives in Communication and is out of scope. | | 2 | Akka.NET conventions | x | Module itself contains no actors (transport abstracted via `IHealthReportTransport`); `AddHealthMonitoringActors` is a dead placeholder (HealthMonitoring-011). Actor-side wiring lives in Communication and is out of scope. |
| 3 | Concurrency & thread safety | x | Unguarded mutable `SiteHealthState` (HealthMonitoring-002); mutation inside `AddOrUpdate` delegate (HealthMonitoring-003); `GetAllSiteStates` leaks live mutable references (HealthMonitoring-008). Collector counters correctly use `Interlocked`. | | 3 | Concurrency & thread safety | x | Unguarded mutable `SiteHealthState` (HealthMonitoring-002); mutation inside `AddOrUpdate` delegate (HealthMonitoring-003); `GetAllSiteStates` leaks live mutable references (HealthMonitoring-008). Collector counters correctly use `Interlocked`. |
| 4 | Error handling & resilience | x | `HealthReportSender` silently swallows inner failures with bare `catch {}` (HealthMonitoring-010); top-level loop error handling is sound. | | 4 | Error handling & resilience | x | `HealthReportSender` silently swallows inner failures with bare `catch {}` (HealthMonitoring-010, resolved); top-level loop error handling is sound. Re-review: `HealthMonitoringOptions` intervals unvalidated — a zero/negative value crashes the hosted service at `PeriodicTimer` construction (HealthMonitoring-014). |
| 5 | Security | x | No issues found. Module handles only numeric/string operational metrics, no secrets, no external input parsing, no auth surface. | | 5 | Security | x | No issues found. Module handles only numeric/string operational metrics, no secrets, no external input parsing, no auth surface. |
| 6 | Performance & resource management | x | `PeriodicTimer` instances correctly disposed via `using`. Dictionary snapshots per report are acceptable at the documented scale. No issues found. | | 6 | Performance & resource management | x | `PeriodicTimer` instances correctly disposed via `using`. Dictionary snapshots per report are acceptable at the documented scale. No issues found. |
| 7 | Design-document adherence | x | Store-and-forward buffer depth metric unimplemented (HealthMonitoring-001); sequence seeding deviates from doc's "starting at 1" wording (HealthMonitoring-006). | | 7 | Design-document adherence | x | Store-and-forward buffer depth metric unimplemented (HealthMonitoring-001); sequence seeding deviates from doc's "starting at 1" wording (HealthMonitoring-006). |
| 8 | Code organization & conventions | x | Options class correctly owned by the component; POCO/messages in Commons. Dead placeholder method noted (HealthMonitoring-011). | | 8 | Code organization & conventions | x | Options class correctly owned by the component; POCO/messages in Commons. Dead placeholder method noted (HealthMonitoring-011, resolved). Re-review: `SiteHealthCollector.CollectReport` reads `DateTimeOffset.UtcNow` directly instead of the module's now-standard injected `TimeProvider` (HealthMonitoring-016). |
| 9 | Testing coverage | x | No tests for `CentralHealthReportLoop`, `MarkHeartbeat`, offline-via-heartbeat, replica idempotency, or most collector setters (HealthMonitoring-009). | | 9 | Testing coverage | x | No tests for `CentralHealthReportLoop`, `MarkHeartbeat`, offline-via-heartbeat, replica idempotency, or most collector setters (HealthMonitoring-009). |
| 10 | Documentation & comments | x | Heartbeat interval is described inconsistently (~2s vs ~5s) across XML docs (HealthMonitoring-004); `LatestReport = null!` misrepresents the contract (HealthMonitoring-012). | | 10 | Documentation & comments | x | Heartbeat interval is described inconsistently (~2s vs ~5s) across XML docs (HealthMonitoring-004, resolved); `LatestReport = null!` misrepresents the contract (HealthMonitoring-012, resolved). Re-review: offline-check-interval comment claims "(shorter)" timeout but code only uses `OfflineTimeout` (HealthMonitoring-013). |
## Findings
@@ -560,3 +578,148 @@ has not yet sent a report"). A codebase-wide search confirms no `null!` suppress
remains anywhere in `src/ScadaLink.HealthMonitoring`. This is exactly the change
HealthMonitoring-002 made when converting `SiteHealthState` to an immutable record, so
the contract is now honest and no further code change was required.
### HealthMonitoring-013 — Offline-check interval comment claims "shorter timeout" but only ever uses `OfflineTimeout`
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:194-196` |
**Description**
`ExecuteAsync` derives the `PeriodicTimer` cadence with the comment "Check at half
the (shorter) offline timeout interval for timely detection", but the code only
reads `_options.OfflineTimeout`:
```csharp
var checkInterval = TimeSpan.FromMilliseconds(_options.OfflineTimeout.TotalMilliseconds / 2);
```
`CentralOfflineTimeout` (HealthMonitoring-005's fix) is never considered. With the
default options (`OfflineTimeout` 60s, `CentralOfflineTimeout` 3m) `OfflineTimeout`
genuinely is the shorter of the two, so the parenthetical happens to hold. But the
comment states an invariant the code does not enforce: if an operator configures
`CentralOfflineTimeout` *smaller* than `OfflineTimeout`, the check cadence stays
tied to `OfflineTimeout`, and central offline detection is delayed by up to a full
`OfflineTimeout / 2` beyond the intended `CentralOfflineTimeout` window. The comment
misleads a reader into believing the cadence already adapts to whichever timeout is
shorter.
**Recommendation**
Either compute `checkInterval` from the smaller of `OfflineTimeout` and
`CentralOfflineTimeout` (compared directly, since `Math.Min` has no `TimeSpan`
overload) so the code matches the comment, or drop the "(shorter)" wording and
state plainly that the cadence is derived from `OfflineTimeout` only (acceptable
while the default `CentralOfflineTimeout` is the larger value).
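
The first option could look roughly like the following. `OfflineCheckCadence.Compute` is a hypothetical helper name; the spans are compared directly because `Math.Min` has no `TimeSpan` overload.

```csharp
using System;

public static class OfflineCheckCadence
{
    // Half of whichever offline timeout is shorter, so neither detection
    // window is overshot by more than half its own length.
    public static TimeSpan Compute(TimeSpan offlineTimeout, TimeSpan centralOfflineTimeout)
    {
        var shorter = offlineTimeout <= centralOfflineTimeout
            ? offlineTimeout
            : centralOfflineTimeout;
        return TimeSpan.FromMilliseconds(shorter.TotalMilliseconds / 2);
    }
}
```

With the default values cited above (60s and 3m) the result is unchanged at 30s; it differs only when `CentralOfflineTimeout` is configured below `OfflineTimeout`.
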
**Resolution**
_Unresolved._
### HealthMonitoring-014 — `HealthMonitoringOptions` intervals are unvalidated; a zero/negative value crashes the hosted service
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.HealthMonitoring/HealthMonitoringOptions.cs:3-20`, `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:196`, `src/ScadaLink.HealthMonitoring/HealthReportSender.cs:67`, `src/ScadaLink.HealthMonitoring/CentralHealthReportLoop.cs:63` |
**Description**
`HealthMonitoringOptions` is bound from the `ScadaLink:HealthMonitoring` config
section (`SiteServiceRegistration.BindSharedOptions`) with no validation —
no `IValidateOptions<HealthMonitoringOptions>`, no `ValidateDataAnnotations`, no
`ValidateOnStart`. `ReportInterval`, `OfflineTimeout`, and `CentralOfflineTimeout`
are all fed straight into `new PeriodicTimer(...)` (and `OfflineTimeout` into a
division for the check interval). `PeriodicTimer`'s constructor throws
`ArgumentOutOfRangeException` for a zero or negative period. A misconfigured
`appsettings.json` (e.g. `"ReportInterval": "00:00:00"`, an empty/garbled value
that binds to `TimeSpan.Zero`, or a negative span) therefore crashes the
`HealthReportSender` / `CentralHealthReportLoop` / `CentralHealthAggregator`
hosted service at startup with an opaque exception that does not name the
offending config key, rather than failing fast with a clear validation message.
**Recommendation**
Add an options validator (DataAnnotations `[Range]`-style on the spans, or an
`IValidateOptions<HealthMonitoringOptions>`) that rejects non-positive
`ReportInterval`/`OfflineTimeout`/`CentralOfflineTimeout` and ideally requires
`CentralOfflineTimeout >= OfflineTimeout`, and call `.ValidateOnStart()` so a bad
configuration fails fast with a message naming the section and key.
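
The checks such a validator would perform can be sketched as a plain function. `ValidateHealthMonitoringIntervals` is a hypothetical name; in the real module the same logic would presumably live in an `IValidateOptions<HealthMonitoringOptions>.Validate` implementation that returns `ValidateOptionsResult.Fail(failures)` when the list is non-empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validation helper; parameter names mirror the option properties in the finding.
static List<string> ValidateHealthMonitoringIntervals(
    TimeSpan reportInterval, TimeSpan offlineTimeout, TimeSpan centralOfflineTimeout)
{
    const string section = "ScadaLink:HealthMonitoring";
    var failures = new List<string>();

    // Reject zero and negative spans up front, naming the offending config key.
    void RequirePositive(string key, TimeSpan value)
    {
        if (value <= TimeSpan.Zero)
            failures.Add($"{section}:{key} must be a positive duration, but was '{value}'.");
    }

    RequirePositive("ReportInterval", reportInterval);
    RequirePositive("OfflineTimeout", offlineTimeout);
    RequirePositive("CentralOfflineTimeout", centralOfflineTimeout);

    // Optional cross-field rule suggested in the recommendation.
    if (centralOfflineTimeout < offlineTimeout)
        failures.Add($"{section}:CentralOfflineTimeout must be >= {section}:OfflineTimeout.");

    return failures;
}
```

Returning every failure at once, rather than stopping at the first, lets an operator fix a bad `appsettings.json` in a single pass.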
**Resolution**
_Unresolved._
### HealthMonitoring-015 — Heartbeat-registered site is left with a year-0001 `LastReportReceivedAt`
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:122-130`, `src/ScadaLink.HealthMonitoring/SiteHealthState.cs:27` |
**Description**
When `MarkHeartbeat` registers a previously-unknown site (HealthMonitoring-007's
fix), it sets `LastReportReceivedAt = default` — i.e. `DateTimeOffset.MinValue`
(`0001-01-01`). The XML doc on `SiteHealthState.LastReportReceivedAt` states the
field is "Used by the UI to surface report staleness during failover." A
heartbeat-only site therefore has `LatestReport == null` **and**
`LastReportReceivedAt == DateTimeOffset.MinValue`. Any UI code that computes
"last report N ago" as `now - LastReportReceivedAt` without first checking
`LatestReport != null` will render a nonsensical staleness of roughly two
thousand years for a site that is, in fact, freshly reachable. The two
"no report yet" signals (`LatestReport == null`, `LastReportReceivedAt == default`)
are independent and both must be special-cased; the sentinel value is an easy trap.
**Recommendation**
Make `LastReportReceivedAt` nullable (`DateTimeOffset?`) so "no report received
yet" is an explicit, unmissable state rather than a magic sentinel — consistent
with how `LatestReport` was already made nullable for the same case — and have UI
consumers render staleness only when it has a value. Alternatively, document the
`default` sentinel prominently on the field and audit every UI reader, but the
nullable option is safer and matches the existing `LatestReport` treatment.
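
To illustrate why the nullable shape is harder to misuse, here is a sketch of a UI-side staleness formatter. `DescribeReportStaleness` and its output strings are hypothetical; the only assumption carried over from the finding is that `LastReportReceivedAt` becomes `DateTimeOffset?`.

```csharp
using System;

// Hypothetical UI helper: the nullable parameter forces the "no report yet" case
// to be handled before any subtraction can happen.
static string DescribeReportStaleness(DateTimeOffset? lastReportReceivedAt, DateTimeOffset now)
{
    if (lastReportReceivedAt is null)
        return "No report received yet";

    TimeSpan age = now - lastReportReceivedAt.Value;
    return $"Last report {(int)age.TotalSeconds}s ago";
}
```

With the current `default` sentinel, the same subtraction compiles without any null check and yields an age measured from `0001-01-01`.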
**Resolution**
_Unresolved._
### HealthMonitoring-016 — `SiteHealthCollector.CollectReport` reads `DateTimeOffset.UtcNow` directly instead of an injected `TimeProvider`
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.HealthMonitoring/SiteHealthCollector.cs:151` |
**Description**
`CollectReport` stamps each report with `ReportTimestamp: DateTimeOffset.UtcNow`,
read directly from the system clock. Every other time-dependent class in the
module — `CentralHealthAggregator`, `HealthReportSender`, `CentralHealthReportLoop`
— was deliberately refactored (HealthMonitoring-006) to take an injectable
`TimeProvider` so the behaviour is deterministically testable and the clock
dependency is explicit. `SiteHealthCollector` is the lone holdout: the report
timestamp cannot be controlled in a unit test, which is why
`SiteHealthCollectorTests.CollectReport_IncludesUtcTimestamp` can only assert the
timestamp falls in a `before`/`after` wall-clock window rather than equalling a
known instant. This is a minor consistency/testability gap, not a behaviour bug.
**Recommendation**
Add an optional `TimeProvider` constructor parameter to `SiteHealthCollector`
(defaulting to `TimeProvider.System`, mirroring the other classes) and derive
`ReportTimestamp` from `GetUtcNow()`, so the report timestamp is deterministically
testable and the module is consistent.
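
A reduced sketch of the pattern, assuming .NET 8's `System.TimeProvider`. `ReportStamper` is a hypothetical stand-in for the timestamping part of `SiteHealthCollector`, and `FixedTimeProvider` is an illustrative test double, not a type from the codebase.

```csharp
using System;

// Hypothetical stand-in for SiteHealthCollector, reduced to the timestamping concern.
public sealed class ReportStamper
{
    private readonly TimeProvider _timeProvider;

    // Optional parameter defaulting to the system clock, as the recommendation describes.
    public ReportStamper(TimeProvider? timeProvider = null)
        => _timeProvider = timeProvider ?? TimeProvider.System;

    public DateTimeOffset CreateReportTimestamp() => _timeProvider.GetUtcNow();
}

// Illustrative fixed clock for unit tests: always returns the instant it was given.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;

    public FixedTimeProvider(DateTimeOffset now) => _now = now;

    public override DateTimeOffset GetUtcNow() => _now;
}
```

A test can then construct `new ReportStamper(new FixedTimeProvider(instant))` and assert that the timestamp equals `instant` exactly, instead of checking a `before`/`after` window.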
**Resolution**
_Unresolved._

@@ -40,10 +40,10 @@ module file and counted in **Total**.
| Severity | Open findings | | Severity | Open findings |
|----------|---------------| |----------|---------------|
| Critical | 0 | | Critical | 0 |
| High | 2 | | High | 5 |
| Medium | 5 | | Medium | 12 |
| Low | 10 | | Low | 17 |
| **Total** | **17** | | **Total** | **34** |
## Module Status ## Module Status
@@ -54,11 +54,11 @@ module file and counted in **Total**.
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/1 | 2 | 10 | | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/1 | 2 | 10 |
| [Commons](Commons/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/2 | 2 | 14 | | [Commons](Commons/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/2 | 2 | 14 |
| [Communication](Communication/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/2 | 4 | 15 | | [Communication](Communication/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/2 | 4 | 15 |
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 11 | | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/1 | 3 | 14 |
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 13 | | [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-16 | `9c60592` | 0/1/2/1 | 4 | 17 |
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 14 | | [DeploymentManager](DeploymentManager/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/1 | 3 | 17 |
| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 14 | | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/1 | 3 | 17 |
| [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 12 | | [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/3 | 4 | 16 |
| [Host](Host/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 11 | | [Host](Host/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 11 |
| [InboundAPI](InboundAPI/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 13 | | [InboundAPI](InboundAPI/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 13 |
| [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 13 | | [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 13 |
@@ -80,14 +80,17 @@ description, location, recommendation — lives in the module's `findings.md`.
_None open._ _None open._
### High (2) ### High (5)
| ID | Module | Title | | ID | Module | Title |
|----|--------|-------| |----|--------|-------|
| CentralUI-020 | [CentralUI](CentralUI/findings.md) | Idle-session redirect never fires: `SessionExpiry` polls a frozen auth-state snapshot | | CentralUI-020 | [CentralUI](CentralUI/findings.md) | Idle-session redirect never fires: `SessionExpiry` polls a frozen auth-state snapshot |
| Communication-012 | [Communication](Communication/findings.md) | gRPC client factory ignores the endpoint on a cache hit, breaking NodeA→NodeB stream failover | | Communication-012 | [Communication](Communication/findings.md) | gRPC client factory ignores the endpoint on a cache hit, breaking NodeA→NodeB stream failover |
| DataConnectionLayer-014 | [DataConnectionLayer](DataConnectionLayer/findings.md) | DCL-012 security warning is never logged in production: `RealOpcUaClient` is created without a logger |
| DeploymentManager-015 | [DeploymentManager](DeploymentManager/findings.md) | Site-query reconciliation marks a deployment `Success` but skips instance-state and snapshot updates |
| ExternalSystemGateway-015 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `MaxRetries == 0` is buffered as "retry forever", contradicting the ExternalSystemGateway-004 "never retry" claim |
### Medium (5) ### Medium (12)
| ID | Module | Title | | ID | Module | Title |
|----|--------|-------| |----|--------|-------|
@@ -96,8 +99,15 @@ _None open._
| CentralUI-022 | [CentralUI](CentralUI/findings.md) | `Deployments` push handler fires `InvokeAsync` with no disposal guard | | CentralUI-022 | [CentralUI](CentralUI/findings.md) | `Deployments` push handler fires `InvokeAsync` with no disposal guard |
| ClusterInfrastructure-009 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | `DownIfAlone` is an inert configuration knob — never consumed by the HOCON builder | | ClusterInfrastructure-009 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | `DownIfAlone` is an inert configuration knob — never consumed by the HOCON builder |
| Communication-013 | [Communication](Communication/findings.md) | Site gRPC address changes are never applied; `RemoveSiteAsync` has no production caller | | Communication-013 | [Communication](Communication/findings.md) | Site gRPC address changes are never applied; `RemoveSiteAsync` has no production caller |
| ConfigurationDatabase-012 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | Inbound-API `ApiKey.KeyValue` bearer credential stored in plaintext |
| ConfigurationDatabase-013 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | Secret-column encryption silently falls back to an ephemeral (throwaway) key |
| DataConnectionLayer-015 | [DataConnectionLayer](DataConnectionLayer/findings.md) | Initial-connect failures never trigger failover; an unreachable primary at startup never tries the backup |
| DataConnectionLayer-016 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `HandleSubscribeCompleted` reports `SubscribeTagsResponse` success even on a connection-level subscribe failure |
| DeploymentManager-016 | [DeploymentManager](DeploymentManager/findings.md) | Reconciled prior record keeps its stale `RevisionHash` |
| ExternalSystemGateway-016 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `ConfigureHttpClientDefaults` applies the ESG connection cap to every `HttpClient` in the host process |
| HealthMonitoring-015 | [HealthMonitoring](HealthMonitoring/findings.md) | Heartbeat-registered site is left with a year-0001 `LastReportReceivedAt` |
### Low (10) ### Low (17)
| ID | Module | Title | | ID | Module | Title |
|----|--------|-------| |----|--------|-------|
@@ -111,3 +121,10 @@ _None open._
| Commons-014 | [Commons](Commons/findings.md) | `OpcUaEndpointConfigSerializer.Deserialize` can mislabel a corrupt typed row as `Legacy` | | Commons-014 | [Commons](Commons/findings.md) | `OpcUaEndpointConfigSerializer.Deserialize` can mislabel a corrupt typed row as `Legacy` |
| Communication-014 | [Communication](Communication/findings.md) | Untrusted gRPC `correlation_id` flows directly into an Akka actor name | | Communication-014 | [Communication](Communication/findings.md) | Untrusted gRPC `correlation_id` flows directly into an Akka actor name |
| Communication-015 | [Communication](Communication/findings.md) | No test exercises the real gRPC client factory across a node flip | | Communication-015 | [Communication](Communication/findings.md) | No test exercises the real gRPC client factory across a node flip |
| ConfigurationDatabase-014 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | Redundant, inconsistent cast on one `HasConversion` call |
| DataConnectionLayer-017 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `WriteBatchAsync` aborts the whole batch on a mid-batch disconnect |
| DeploymentManager-017 | [DeploymentManager](DeploymentManager/findings.md) | `GetDeploymentStatusAsync` XML doc describes behaviour it does not implement |
| ExternalSystemGateway-017 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `BuildUrl` appends a bare trailing `?` when a GET method's parameters are all null |
| HealthMonitoring-013 | [HealthMonitoring](HealthMonitoring/findings.md) | Offline-check interval comment claims "shorter timeout" but only ever uses `OfflineTimeout` |
| HealthMonitoring-014 | [HealthMonitoring](HealthMonitoring/findings.md) | `HealthMonitoringOptions` intervals are unvalidated; a zero/negative value crashes the hosted service |
| HealthMonitoring-016 | [HealthMonitoring](HealthMonitoring/findings.md) | `SiteHealthCollector.CollectReport` reads `DateTimeOffset.UtcNow` directly instead of an injected `TimeProvider` |