From f93b7b99bb92f290dd0b45076013b887bad145cf Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Thu, 28 May 2026 02:55:47 -0400 Subject: [PATCH] code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Re-applies the full 10-category checklist to every src/ project — including first-time reviews of the four newer components (AuditLog, NotificationOutbox, SiteCallAudit, Transport) — so the code-reviews/ index reflects today's codebase rather than the 2026-05-16 baseline. 172 new Open findings (0 Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules. regen-readme.py now derives each module's Last reviewed + Commit from its findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future single-module re-reviews show their own date in the Module Status table. --- code-reviews/AuditLog/findings.md | 500 ++++++++++++++++ code-reviews/CLI/findings.md | 332 ++++++++++- code-reviews/CentralUI/findings.md | 330 ++++++++++- .../ClusterInfrastructure/findings.md | 236 +++++++- code-reviews/Commons/findings.md | 469 ++++++++++++++- code-reviews/Communication/findings.md | 338 ++++++++++- .../ConfigurationDatabase/findings.md | 482 ++++++++++++++- code-reviews/DataConnectionLayer/findings.md | 322 ++++++++++- code-reviews/DeploymentManager/findings.md | 348 ++++++++++- .../ExternalSystemGateway/findings.md | 346 ++++++++++- code-reviews/HealthMonitoring/findings.md | 361 +++++++++++- code-reviews/Host/findings.md | 328 ++++++++++- code-reviews/InboundAPI/findings.md | 392 ++++++++++++- code-reviews/ManagementService/findings.md | 338 ++++++++++- code-reviews/NotificationOutbox/findings.md | 488 ++++++++++++++++ code-reviews/NotificationService/findings.md | 267 ++++++++- code-reviews/README.md | 237 +++++++- code-reviews/Security/findings.md | 274 ++++++++- code-reviews/SiteCallAudit/findings.md | 322 +++++++++++ code-reviews/SiteEventLogging/findings.md | 384 +++++++++++- code-reviews/SiteRuntime/findings.md | 410 ++++++++++++- code-reviews/StoreAndForward/findings.md | 547 +++++++++++++++++- code-reviews/TemplateEngine/findings.md | 372 +++++++++++- code-reviews/Transport/findings.md | 456 +++++++++++++++ code-reviews/regen-readme.py | 29 +- 25 files changed, 8793 insertions(+), 115 deletions(-) create mode 100644 code-reviews/AuditLog/findings.md create mode 100644 code-reviews/NotificationOutbox/findings.md create mode 100644 code-reviews/SiteCallAudit/findings.md create mode 100644 code-reviews/Transport/findings.md diff --git a/code-reviews/AuditLog/findings.md b/code-reviews/AuditLog/findings.md new file mode 100644 index 00000000..4355dd03 --- /dev/null +++ b/code-reviews/AuditLog/findings.md @@ -0,0 +1,500 @@ +# Code Review — AuditLog + +| Field | Value | +|-------|-------| +| Module | `src/ScadaLink.AuditLog` | +| Design doc | `docs/requirements/Component-AuditLog.md` | +| Status | Reviewed | +| Last reviewed | 2026-05-28 | +| Reviewer | claude-agent | +| Commit reviewed | `1eb6e97` | +| Open findings | 11 | + +## Summary + +AuditLog is one of the larger and most carefully-engineered modules in the codebase. +The site-side hot-path (`SqliteAuditWriter` + `FallbackAuditWriter` + `RingBufferFallback`) +implements a textbook bounded-channel + dedicated-writer pattern with batched transactions, +UTF-8-safe truncation, additive schema migration, and a drop-oldest fallback that +genuinely honours the "audit-write must NEVER abort the user-facing action" contract. +The central side mirrors that with per-row try/catch on batch ingest, a transactional +dual-write for the cached-telemetry path, per-site cursor isolation in reconciliation, +and a partition-switch purge that is metadata-only. The payload filter is well-factored +with a compile-time regex cache, per-stage failure isolation, and per-target overrides. +Test coverage is broad — ~12 000 lines spanning unit, integration, and end-to-end paths. + +Themes across findings: (1) the largest issue is a **specced-but-unwired transport path** — +`ISiteStreamAuditClient.IngestCachedTelemetryAsync` and `AuditLogIngestActor.OnCachedTelemetryAsync` +both exist and the protobuf RPC is plumbed, but no production code ever calls the cached-telemetry +client; the cached-call lifecycle audit rows ride the audit-only `IngestAuditEventsAsync` drain +and the central dual-write transaction is dead code (AuditLog-001). (2) Several +**Akka.NET supervisor-strategy comments are inaccurate** — multiple actors document +"`SupervisorStrategy` uses Resume" but the code returns `DefaultDecider` (which Restarts), and +the strategy applies to children, not to the actor itself (AuditLog-002). (3) The +**`SqliteAuditWriter` hot-path lock is contended by the 30 s backlog probe** — `GetBacklogStatsAsync` +takes the same `_writeLock` that serialises every batch INSERT, so a large-backlog scan can +park the hot-path writer (AuditLog-005). (4) **Sync-over-async in `Dispose`** can deadlock under +an ASP.NET sync context (AuditLog-006). (5) A handful of **misleading code comments and minor +configuration drift** (AuditLog-007, AuditLog-008, AuditLog-009). (6) `CancellationToken` +parameters on the actor drain paths are accepted but immediately replaced with +`CancellationToken.None` (AuditLog-010). (7) The site-only `AddAuditLogHealthMetricsBridge` +registers the `SiteAuditBacklogReporter` hosted service but the `AddAuditLog` registration +chain doesn't reject a central composition root that mistakenly calls the site bridge +(AuditLog-011). No Critical-severity issues; three Medium, eight Low. + +## Checklist coverage + +| # | Category | Examined | Notes | +|---|----------|----------|-------| +| 1 | Correctness & logic bugs | Yes | `AuditLogIngestActor.OnCachedTelemetryAsync` is unreachable production code (AuditLog-001); reconciliation cursor advances on persistent insert failure (AuditLog-004); `Dispose` comment about `_disposed` ordering is misleading (AuditLog-009). | +| 2 | Akka.NET conventions | Yes | `SupervisorStrategy` returned by actors does not do what the surrounding doc says (AuditLog-002); per-actor strategy applies to children only, but comments imply self-protection. | +| 3 | Concurrency & thread safety | Yes | `GetBacklogStatsAsync` contends with hot-path writes on `_writeLock` (AuditLog-005); sync DI scopes block on async EF disposal (AuditLog-003); `_disposed` is set after the wait, contradicting comment (AuditLog-009); no cooperative cancellation through the drain paths (AuditLog-010). | +| 4 | Error handling & resilience | Yes | Best-effort contract is honoured throughout; `Dispose()` sync-over-async is the one remaining hazard (AuditLog-006); reconciliation silently discards permanently-failing rows (AuditLog-004). | +| 5 | Security | Yes | Append-only enforcement, redaction stack, and "never under-redact" safety net all present. Test composition roots that omit the filter SILENTLY pass payloads through unredacted (AuditLog-008). | +| 6 | Performance & resource management | Yes | Hot-path batched + back-pressured. Backlog scan holds the write lock (AuditLog-005); `MarkForwardedAsync` interpolates an `IN (...)` list inside the lock, fine in practice but scales linearly with batch size. | +| 7 | Design-document adherence | Yes | Combined telemetry transport plumbed but never called (AuditLog-001); other than that the implementation closely tracks the design doc. | +| 8 | Code organization & conventions | Yes | Composition root well-segmented; `INodeIdentityProvider` resolution mixes `GetService` and `GetRequiredService` for the same dependency across registrations (AuditLog-007); `AddAuditLog*` helpers register hosted services and option bindings without idempotency guards (AuditLog-011). | +| 9 | Testing coverage | Yes | Excellent surface coverage. Integration tests exist for the dual-write path in `AuditLogIngestActorCombinedTelemetryTests` and `CachedCallCombinedTelemetryTests`, but those drive the actor directly via the test harness — there is no integration test that asserts the production end-to-end emits a `CachedTelemetryBatch` from the site (because nothing does). | +| 10 | Documentation & comments | Yes | Several large XML-doc paragraphs are accurate, but the `SupervisorStrategy` comments (AuditLog-002), the `Dispose` ordering comment (AuditLog-009), and a few stale "Bundle X" references could mislead a new reader. | + +## Findings + +### AuditLog-001 — Combined-telemetry transport is plumbed end-to-end but never invoked in production + +| | | +|--|--| +| Severity | Medium | +| Category | Design-document adherence | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Site/Telemetry/ISiteStreamAuditClient.cs:45`, `src/ScadaLink.AuditLog/Site/Telemetry/ClusterClientSiteAuditClient.cs:86`, `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs:198` | + +**Description** + +The design (Component-AuditLog.md §"Cached Operations — Combined Telemetry") specifies a +single `CachedCallTelemetry` packet per lifecycle event that carries BOTH the audit row +AND the operational `SiteCalls` upsert, with central writing both rows in one transaction. +The infrastructure exists: `ISiteStreamAuditClient.IngestCachedTelemetryAsync` is on the +interface; `ClusterClientSiteAuditClient.IngestCachedTelemetryAsync` builds the +`IngestCachedTelemetryCommand`; the proto carries `CachedTelemetryBatch`; +`AuditLogIngestActor.OnCachedTelemetryAsync` performs the dual `InsertIfNotExists` + +`UpsertAsync` inside a `BeginTransactionAsync`. But a `grep` for callers of +`IngestCachedTelemetryAsync` in `src/ScadaLink.AuditLog` shows only the interface +declaration and the two implementations — nothing produces a `CachedTelemetryBatch` for +the site to push. The `SiteAuditTelemetryActor.OnDrainAsync` only calls +`IngestAuditEventsAsync` (the audit-only path); cached-call audit rows written by +`CachedCallTelemetryForwarder` to local SQLite are drained as ordinary audit events, +and the `SiteCalls` operational half rides a separate `UpsertSiteCallCommand` channel +into `SiteCallAuditActor`. The "central writes AuditLog + SiteCalls in one transaction" +guarantee is therefore not delivered — the two writes are now uncorrelated across +actors and can fail independently, and the dual-write path in `AuditLogIngestActor` +is dead production code. + +**Recommendation** + +Either (a) wire the combined path: build a `CachedTelemetryBatch` from the audit rows +the forwarder writes (alongside the operational half held by `IOperationTrackingStore`), +add a parallel drain loop that calls `IngestCachedTelemetryAsync`, and gate the audit-only +drain so cached-call rows don't double-emit; or (b) update the design doc + the +`AuditLogIngestActor` / `ClusterClientSiteAuditClient` / interface XML comments to +acknowledge that the two halves now flow via separate transports, and delete the +unreachable `OnCachedTelemetryAsync` dual-write code (after confirming the +`AuditLogIngestActorCombinedTelemetryTests` integration tests exercise it via direct +actor injection only). + +**Resolution** + +_Unresolved._ + +### AuditLog-002 — `SupervisorStrategy` comments claim Resume semantics but code returns the default Restart decider + +| | | +|--|--| +| Severity | Low | +| Category | Akka.NET conventions | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs:99-103`, `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs:109-115`, `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs:315-321` | + +**Description** + +Three central actors (`AuditLogIngestActor`, `AuditLogPurgeActor`, `SiteAuditReconciliationActor`) +all override `SupervisorStrategy()` and return +`new OneForOneStrategy(maxNrOfRetries: 0, withinTimeRange: TimeSpan.Zero, decider: DefaultDecider)`. +The surrounding XML / inline comments variously claim "uses `Resume` so a thrown exception +inside `ReceiveAsync` does not restart the actor" (AuditLogIngestActor remarks), +"uses Resume so any leaked exception keeps the singleton alive for the next tick" +(AuditLogPurgeActor remarks), and "the actor's supervisor strategy keeps it alive +across any leaked exception with `DefaultDecider`'s Restart semantics — restart resets +the in-memory cursors, but as noted above that's a safe (over-pull, idempotent) recovery" +(SiteAuditReconciliationActor remarks — at least correctly says Restart, but conflicts +with the other two). Two things are wrong: (1) the strategy returned by an actor's +`SupervisorStrategy()` override governs how that actor supervises its CHILDREN, not how +its own parent treats it — so it is not the mechanism that protects these singletons +from their own throws; (2) `DefaultDecider` Restarts for most exceptions, not Resumes. +The actors are in fact protected by the per-row / per-batch try/catch blocks inside +the receive handlers — the supervisor override is effectively unused, since these +actors have no children. The comments mislead a reader into trusting a guarantee +that the code does not deliver. + +**Recommendation** + +Pick one of two corrections: either delete the `SupervisorStrategy` override (these +actors have no children, so the override is dead) and rewrite the comments to credit +the try/catch blocks for the alive-on-throw guarantee; or — if the override is kept +as a forward-compat hedge — change the decider to `Decider.From(_ => Directive.Stop)` +or similar to match the comment, AND add a clear note that the per-row catch is what +keeps the actor running across handler throws, not the supervisor strategy. + +**Resolution** + +_Unresolved._ + +### AuditLog-003 — `AuditLogIngestActor.OnIngestAsync` uses `CreateScope`, but `OnCachedTelemetryAsync` uses `CreateAsyncScope` — and only one disposes asynchronously + +| | | +|--|--| +| Severity | Low | +| Category | Concurrency & thread safety | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs:133`, `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs:139`, `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs:178` | + +**Description** + +`OnCachedTelemetryAsync` opens `_serviceProvider!.CreateAsyncScope()` and lets +`await using` dispose it. `OnIngestAsync`, `OnTickAsync` in +`SiteAuditReconciliationActor`, and `OnTickAsync` in `AuditLogPurgeActor` all open +`_services.CreateScope()` (the synchronous variant) and dispose it with a synchronous +`scope.Dispose()` in a `finally` block — even though the per-message work is async and +the scoped `IAuditLogRepository` resolves an EF Core `DbContext`, which implements +`IAsyncDisposable`. The synchronous `Dispose()` on a `DbContext` blocks on any pending +async connection cleanup; under load this can hold the actor thread for the duration +of a connection close, which on SQL Server may include sending a `SET TRANSACTION +ISOLATION LEVEL` reset round-trip. Switching to `CreateAsyncScope()` + `await using` +is the recommended pattern for scoped EF resources. + +**Recommendation** + +Change `_services.CreateScope()` to `_services.CreateAsyncScope()` in +`OnIngestAsync`, `SiteAuditReconciliationActor.OnTickAsync`, and +`AuditLogPurgeActor.OnTickAsync`, and replace the `try/finally { scope?.Dispose(); }` +pattern with `await using var scope = _services.CreateAsyncScope();`. The DI scope +will dispose asynchronously and the EF Core context will be released without +blocking the actor thread. + +**Resolution** + +_Unresolved._ + +### AuditLog-004 — `SiteAuditReconciliationActor` advances cursor even on per-row insert failure, silently abandoning permanently-failing rows + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs:233-265` | + +**Description** + +`PullSiteAsync` iterates the pulled events, calls `InsertIfNotExistsAsync` inside a +per-row try/catch, and unconditionally updates `maxOccurred = evt.OccurredAtUtc` after +the try/catch — regardless of whether the insert succeeded or threw. The comment at +line 247 acknowledges this: "the cursor still advances based on OccurredAtUtc — the +row was returned by the site, so the next tick won't re-fetch it; if it permanently +fails to persist, that's an operational concern surfaced by the log, not a hot-loop +trigger." For a transient fault that flips to success on the next pull the design +holds. But if a row throws on EVERY central attempt (truly permanent persistence fault — +e.g. column-too-long, FK violation that won't resolve) the cursor advance still moves +past it, and central will simply log on every reconciliation tick. No alert escalates +beyond a log line. Worse, the site keeps the row `Pending` (because `MarkReconciledAsync` +is only called for rows the puller flipped centrally) AND will trip the +`SiteAuditTelemetryStalled` signal because the backlog never drains, but the central +log message is the only place an operator could correlate the stall with the +persistent insert failure. + +**Recommendation** + +Either (a) only advance the cursor for rows whose `InsertIfNotExistsAsync` returned +cleanly — leave `maxOccurred` at the previous value for the failing row so the next +tick retries; or (b) increment a dedicated `CentralAuditPermanentInsertFailure` health +counter on the per-row catch so the failure is observable on the dashboard instead of +buried in the log. Option (a) needs a guard against the same row throwing forever +(saturate the puller) — a small per-event retry counter held in the actor's state with +a permanent-skip + `LogCritical` threshold is the standard escape valve. + +**Resolution** + +_Unresolved._ + +### AuditLog-005 — `GetBacklogStatsAsync` holds the SQLite hot-path write lock for the full COUNT+MIN scan + +| | | +|--|--| +| Severity | Medium | +| Category | Performance & resource management | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs:597-657` | + +**Description** + +`SqliteAuditWriter.GetBacklogStatsAsync` takes `_writeLock` (the same lock that +serialises every batch INSERT in `FlushBatch`) and holds it for the duration of a +`SELECT COUNT(*), MIN(OccurredAtUtc) FROM AuditLog WHERE ForwardState = 'Pending'`. +`SiteAuditBacklogReporter` calls this on a 30-second timer. On a healthy site with +few `Pending` rows the index-only scan is fast; under the scenario the metric exists +to detect — a prolonged central outage growing the backlog "indefinitely" per +Component-AuditLog.md — a `COUNT(*)` over hundreds of thousands of `Pending` rows +on the `IX_SiteAuditLog_ForwardState_Occurred` index is no longer cheap, and the +duration of that scan is added to the hot-path write latency for every concurrent +script. The hot path is supposed to be "durable in microseconds" per the design doc; +a multi-hundred-millisecond probe stall in the same period would not be visible +externally but would back-pressure the bounded write channel. `ReadPendingAsync` and +`ReadPendingSinceAsync` share the same lock for the same reason and have the same +exposure under backlog growth. + +**Recommendation** + +Either (a) move the SELECT outside the write lock by using a second, dedicated +read-only SQLite connection (Microsoft.Data.Sqlite supports concurrent connections +to the same file when journal_mode=WAL is enabled — which would also benefit the +hot path); or (b) cache the last snapshot inside the writer and recompute it +lazily on a dedicated background tick so the reporter reads a pre-computed snapshot +without acquiring the write lock. Option (a) also unblocks `ReadPendingAsync` / +`ReadPendingSinceAsync` from competing with the writer. + +**Resolution** + +_Unresolved._ + +### AuditLog-006 — `SqliteAuditWriter.Dispose()` does sync-over-async and may deadlock + +| | | +|--|--| +| Severity | Low | +| Category | Concurrency & thread safety | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs:697-700` | + +**Description** + +```csharp +public void Dispose() +{ + DisposeAsync().AsTask().GetAwaiter().GetResult(); +} +``` + +This is the classic sync-over-async anti-pattern. `DisposeAsync` `await`s the +writer-loop task with `.ConfigureAwait(false)`, so on a thread with no synchronization +context (the typical .NET 10 host shutdown path) it's fine; but if any caller invokes +`Dispose()` from a context that captures (an ASP.NET request, a SynchronizationContext +test runner, an Akka.NET dispatcher in some configurations) the `GetResult()` blocks +the captured thread while the continuation tries to resume on it — classic deadlock. +The writer is registered as a DI singleton, so this is unlikely to bite during the +host's `IAsyncDisposable` shutdown (DI prefers `DisposeAsync` when available), but +an integration test or future code path that constructs the writer manually inside +a sync context will hang. + +**Recommendation** + +Drop the `IDisposable` interface and rely on `IAsyncDisposable` only — the DI container +will call `DisposeAsync` on singletons that implement it. If a sync `Dispose` is +required for compatibility with consumers that don't honour `IAsyncDisposable`, +implement it as a best-effort that calls `_writeQueue.Writer.TryComplete()` + a +short wait, without blocking the thread for the full async drain. + +**Resolution** + +_Unresolved._ + +### AuditLog-007 — `INodeIdentityProvider` resolution mixes `GetService` and `GetRequiredService` inconsistently across `AddAuditLog` registrations + +| | | +|--|--| +| Severity | Low | +| Category | Code organization & conventions | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs:148-218` | + +**Description** + +`AddAuditLog` registers three components that depend on `INodeIdentityProvider`: +- `CachedCallTelemetryForwarder` — resolves with `sp.GetService()` + (optional, falls back to a null `SourceNode`). +- `CachedCallLifecycleBridge` — resolves with `sp.GetService()` + (optional, same fallback). +- `CentralAuditWriter` — resolves with `sp.GetRequiredService()` + (required, throws at first resolution if unregistered). + +The XML comments at lines 153 / 175 / 215 explain the reasoning — the first two are +optional because tests may skip the registration; the third is required because "the +production composition root in `SiteServiceRegistration` registers the provider as a +singleton on both site and central paths". But this is a fragile guarantee — `AddAuditLog` +itself does NOT register the provider, so a future composition root that calls +`AddAuditLog` without first calling whatever registers `INodeIdentityProvider` will fail +on the FIRST resolution of `ICentralAuditWriter` (which is a lazy factory) rather than +at `AddAuditLog` time. The result: site nodes that "happen to work" because they hold +a registered provider, central composition test fixtures that fail at runtime instead +of DI-build time, and a `GetService`/`GetRequiredService` split that gives no clear +contract to the reader. + +**Recommendation** + +Either (a) make all three optional: `CentralAuditWriter` already handles a null provider +gracefully (line 113-116 — null-coalescing the caller's value); the asymmetry buys +nothing. Or (b) make all three required and either add `services.AddSingleton()` +inside `AddAuditLog` (with a sensible default — null node name returns ``) or +add an explicit guard at the top of `AddAuditLog` that throws if no provider has been +registered yet (`services.Any(d => d.ServiceType == typeof(INodeIdentityProvider))`). + +**Resolution** + +_Unresolved._ + +### AuditLog-008 — Test composition roots that omit `IAuditPayloadFilter` silently pass UNREDACTED payloads through the writer chain + +| | | +|--|--| +| Severity | Low | +| Category | Security | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs:51-77`, `src/ScadaLink.AuditLog/Central/CentralAuditWriter.cs:77-104`, `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs:125,155` | + +**Description** + +`FallbackAuditWriter`, `CentralAuditWriter`, and `AuditLogIngestActor` all accept an +`IAuditPayloadFilter` as an optional dependency, defaulting to `null = pass-through`. +The justification in every XML comment is the same: "the M4 test composition roots +that don't pass one keep working (they only ever write small payloads)". This is fine +for size — but the filter also performs HEADER REDACTION (`Authorization`, `Cookie`, +`Set-Cookie`, `X-API-Key`), GLOBAL BODY REDACTORS, and SQL PARAMETER REDACTION. A test +fixture (or any future composition root that bypasses `AddAuditLog`) that injects a +real `RequestSummary` will see secrets written to SQLite / MS SQL with no redaction. +The combination "audit-write must never abort the user-facing action" + "unredacted +secrets must never persist" (Component-AuditLog.md §Payload Capture Policy) makes the +no-filter fallback genuinely dangerous — over-redacting on a missing filter is the +contract the production setup honours, but the code itself defaults to under-redact. + +**Recommendation** + +Change the three null-coalesce sites to default to a non-null sentinel filter that +performs the header redaction (`HeaderRedactList`) using the hard-coded defaults +from `AuditLogOptions`, even when no `IAuditPayloadFilter` is registered. The +truncation stage can remain optional; the header redaction must not. Alternatively, +make `IAuditPayloadFilter` non-optional and have `AddAuditLog` register the real +filter unconditionally — tests that don't bind the options section will resolve the +default `AuditLogOptions` and get the production-default redact list automatically. + +**Resolution** + +_Unresolved._ + +### AuditLog-009 — `SqliteAuditWriter.DisposeAsync` comment claims `_disposed` is set early, but it isn't + +| | | +|--|--| +| Severity | Low | +| Category | Documentation & comments | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs:706-740` | + +**Description** + +The first `lock (_writeLock)` block in `DisposeAsync` is commented: + +> Stop accepting new events. Setting _disposed first ensures any FlushBatch entered +> after we mark disposed will fault its pending events rather than touching the +> about-to-close connection. + +But the block does NOT set `_disposed = true` — it only calls +`_writeQueue.Writer.TryComplete()` and captures `_writerLoop`. The `_disposed` flag is +flipped in the SECOND lock block (line 738), AFTER the 5-second wait on the writer +loop. During the wait window, a concurrent `WriteAsync` that observed the channel +NOT-yet-completed (race: it ran before `TryComplete`) and got past `TryWrite` would +land on the writer loop's `FlushBatch`, which then takes the lock and checks +`_disposed` — and finds it still `false`. The check at the top of `FlushBatch` +(line 265) `if (_disposed) { fault pending; return; }` therefore does NOT fire during +the dispose window. In practice the channel being completed drains the loop cleanly +and the disposable race is benign, but the comment claims a guarantee that the code +does not implement. + +**Recommendation** + +Either set `_disposed = true` in the first lock block to match the comment (and remove +the duplicate `_disposed` check in the second block); or rewrite the comment to +describe the actual ordering: the channel is completed first, the loop drains +remaining items under the lock, and `_disposed = true` is set only after the loop +exits. The current code is correct; the comment is wrong. + +**Resolution** + +_Unresolved._ + +### AuditLog-010 — Actor drain paths accept a `CancellationToken` parameter but always pass `CancellationToken.None` downstream + +| | | +|--|--| +| Severity | Low | +| Category | Concurrency & thread safety | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs:92,107,124`, `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs:228` | + +**Description** + +The drain loops on `SiteAuditTelemetryActor.OnDrainAsync` and the per-site pull on +`SiteAuditReconciliationActor.PullSiteAsync` both pass `CancellationToken.None` to +every async dependency call (queue reads, gRPC client, repository writes). The actor +has no `CancellationToken` field, so there's no in-flight cancellation source — +graceful shutdown relies entirely on `PostStop` being called and the actor's +`Receive` continuation completing naturally. For a healthy gRPC client this is fine, +but a stuck `IngestAuditEventsAsync` call (slow central, partition switch in progress) +holds the actor's continuation indefinitely; the host's coordinated-shutdown will then +time out the actor system and leave the actor in an undefined state. The brief +references "cancellation on stop" in the partition-maintenance comments but +`SiteAuditTelemetryActor` does not implement it. + +**Recommendation** + +Introduce a per-actor `CancellationTokenSource` populated in `PreStart` and cancelled +in `PostStop`; pass `_lifecycleCts.Token` instead of `CancellationToken.None` in +every async dependency call. Same change for `SiteAuditReconciliationActor`. The +existing `OperationCanceledException` is already swallowed by the top-level catch +in `OnDrainAsync` (line 128), so plumbing the token through is a localised change. + +**Resolution** + +_Unresolved._ + +### AuditLog-011 — `AddAuditLogHealthMetricsBridge` and `AddAuditLogCentralMaintenance` are non-idempotent and register hosted services on every call + +| | | +|--|--| +| Severity | Low | +| Category | Code organization & conventions | +| Status | Open | +| Location | `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs:53-55, 263-276, 301-346` | + +**Description** + +The XML doc on `AddAuditLog` is explicit: "Idempotent re-registration is not supported; +call this exactly once per `IServiceCollection`." But `AddAuditLogHealthMetricsBridge` +calls `services.AddHostedService()` (line 275), which is +NOT idempotent — every call registers another descriptor, and the host will spin up +N reporters and have them all poll SQLite every 30 s, all push the same snapshot into +`ISiteHealthCollector`. The site composition path is supposed to call this exactly +once, but tests or composition refactors that accidentally call twice will pay 2x the +SQL probe rate and overwrite the snapshot with conflicting numbers (no race, just +wasted work). Worse, `AddAuditLogCentralMaintenance` (line 301) is also non-idempotent — +`AddOptions` and `AddHostedService` +will pile up. + +**Recommendation** + +Either (a) guard each Add* helper with a "has the marker been seen?" sentinel +(register a private marker descriptor on first call, no-op on subsequent calls); +or (b) explicitly document idempotency on the public surface of every helper and +verify with a unit test in `AddAuditLogTests`. Option (a) matches the pattern other +SDK extensions use and removes a foot-gun. + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/CLI/findings.md b/code-reviews/CLI/findings.md index d1c45f8b..c2b1272e 100644 --- a/code-reviews/CLI/findings.md +++ b/code-reviews/CLI/findings.md @@ -5,10 +5,10 @@ | Module | `src/ScadaLink.CLI` | | Design doc | `docs/requirements/Component-CLI.md` | | Status | Reviewed | -| Last reviewed | 2026-05-17 | +| Last reviewed | 2026-05-28 | | Reviewer | claude-agent | -| Commit reviewed | `39d737e` | -| Open findings | 0 | +| Commit reviewed | `1eb6e97` | +| Open findings | 7 | ## Summary @@ -47,6 +47,36 @@ and `WriteAsTable` derives table columns from only the first array element, sile dropping columns for any later element with a different shape (CLI-016). No Critical/High issues; the module remains healthy. +#### Re-review 2026-05-28 (commit `1eb6e97`) + +The CLI has grown two substantial new command groups since the last re-review — +`scadalink audit` (Audit Log #23 M8) and `scadalink bundle` (Transport #24) — together +adding ~1,500 lines of new production code. The new `audit` surface is well-tested and +well-factored (pure helpers + a clear `IAuditFormatter` seam), but the new `bundle` +surface is untested, duplicates the URL/credential resolution that already exists in +`CommandHelpers`, and inherits a partial authorization-exit-code regression that also +appears in the audit path. Two longstanding fragility gaps that the prior reviews missed +also surface in this pass: `CliConfig.Load` parses the config file with no try/catch, and +`CommandTreeTests` still pins the old 14-group count so the two new groups are excluded +from the leaf-action and registry-resolution coverage that protected the rest of the +tree. Module health is broadly good but the consolidated count is now seven Open +findings (none Critical, three Medium). + +- **CLI-017** — `BundleCommands` duplicates `ExecuteCommandAsync` and skips the + `FORBIDDEN`/`UNAUTHORIZED` exit-code mapping (auth exit 2 contract regression). +- **CLI-018** — `AuditQueryHelpers.RunQueryAsync` / `AuditExportHelpers.RunExportAsync` + return exit 1 on every error, never the documented exit 2 for authorization failure. +- **CLI-019** — `BundleCommands.bundle export` decodes the entire base64 bundle in + memory and writes synchronously — 100 MB bundles double-buffer. +- **CLI-020** — `BundleCommands.bundle export` parses the success body with bare + `JsonDocument.Parse` + `GetProperty` and throws on a malformed/abbreviated envelope. +- **CLI-021** — `CliConfig.Load` crashes the whole CLI when `~/.scadalink/config.json` + is malformed or unreadable, even if `--url` was supplied on the command line. +- **CLI-022** — `AuditCommands` and `BundleCommands` are absent from `CommandTreeTests`; + the test still pins `Equal(14, groups.Count)` and silently excludes the new groups. +- **CLI-023** — `Component-CLI.md` says the audit commands ride `POST /management`, + but the implementation calls a new `GET /api/audit/*` REST endpoint pair. + ## Checklist coverage _Original review (2026-05-16, `9c60592`):_ @@ -79,6 +109,21 @@ _Re-review (2026-05-17, `39d737e`):_ | 9 | Testing coverage | ☑ | Substantially expanded (`CommandTreeTests`, `ManagementHttpClientTests`, `DebugStreamTests`). No new gaps. | | 10 | Documentation & comments | ☑ | XML docs accurate. `Component-CLI.md` drift folded into CLI-015. | +_Re-review (2026-05-28, `1eb6e97`):_ + +| # | Category | Examined | Notes | +|---|----------|----------|-------| +| 1 | Correctness & logic bugs | ☑ | `BundleCommands.BuildExport` unguarded `JsonDocument.Parse` + `GetProperty` (CLI-020); `CliConfig.Load` unguarded JSON parse (CLI-021). | +| 2 | Akka.NET conventions | ☑ | Not applicable — pure HTTP/SignalR/REST client. No issues. | +| 3 | Concurrency & thread safety | ☑ | No new concurrency surface; `debug stream` unchanged since CLI-011/012. No issues. | +| 4 | Error handling & resilience | ☑ | Bundle and audit paths skip the auth exit-code contract (CLI-017, CLI-018); bundle JSON-envelope parse is brittle (CLI-020); config-file parse aborts the process (CLI-021). | +| 5 | Security | ☑ | No new credential or trust-boundary issues. No issues. | +| 6 | Performance & resource management | ☑ | `bundle export` double-buffers the whole bundle in memory (CLI-019). | +| 7 | Design-document adherence | ☑ | `Component-CLI.md` claims audit commands ride `POST /management`; implementation uses new REST endpoints (CLI-023). | +| 8 | Code organization & conventions | ☑ | `BundleCommands.RunBundleCommandAsync` re-implements credential/URL resolution that `CommandHelpers.ExecuteCommandAsync` already provides — drift waiting to happen (CLI-017). | +| 9 | Testing coverage | ☑ | `BundleCommands` has no tests; `CommandTreeTests` pins `Equal(14, …)` and excludes the new `AuditCommands` + `BundleCommands` groups (CLI-022). | +| 10 | Documentation & comments | ☑ | XML docs accurate; doc-vs-code transport drift folded into CLI-023. No other issues. | + ## Findings ### CLI-001 — `SCADALINK_FORMAT` env var and config-file format are dead; format precedence broken @@ -741,3 +786,284 @@ list and `OutputFormatter.WriteTable` pads missing cells, so heterogeneous array render every column. Regression tests added in `TableHeaderUnionTests` (3 tests: later-element-only column included, first-seen column order preserved, first-element-extra column still rendered). + +### CLI-017 — `BundleCommands.RunBundleCommandAsync` duplicates `ExecuteCommandAsync` and breaks the auth exit-code contract + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `src/ScadaLink.CLI/Commands/BundleCommands.cs:244-289` (vs. `src/ScadaLink.CLI/Commands/CommandHelpers.cs:20-73`, `:159-174`) | + +**Description** + +`BundleCommands.RunBundleCommandAsync` re-implements the URL/credential resolution, +validation, and HTTP plumbing that `CommandHelpers.ExecuteCommandAsync` already provides +for every other command group — to attach a 5-minute timeout (`BundleCommandTimeout`) +and a caller-supplied success handler. In duplicating it, two contracts that +`CommandHelpers` carefully establishes were dropped: + +1. **Authorization exit code.** `CommandHelpers.HandleResponse` routes through + `IsAuthorizationFailure`, which returns exit 2 for **either** HTTP 403 **or** an + `UNAUTHORIZED`/`FORBIDDEN` error code on any status (resolution of CLI-009). The + bundle path at line 287 uses a bare `if (response.StatusCode == 403) return 2;` — a + server that signals authorization failure via the `code` field on a non-403 status + (the same channel the rest of the CLI honours) will exit 1 instead of 2 from + `bundle export`/`preview`/`import`. `Component-Transport.md:289` explicitly states + "Exit codes follow the project convention: `0` = success, `1` = command failure, + `2` = authorization failure," so this is a contract regression. +2. **Error-message phrasing drift.** The two duplicated error paths + (`bundle:258-260`, `:264-266`) emit shorter messages that omit the + `SCADALINK_MANAGEMENT_URL` / `SCADALINK_USERNAME` env-var hints the canonical paths + give — confusing if the user is trying to debug what's missing. + +**Recommendation** + +Refactor `CommandHelpers.ExecuteCommandAsync` to accept an optional `TimeSpan` timeout +and an optional success handler, and have `BundleCommands` call it. Failing that, +extract `CommandHelpers.IsAuthorizationFailure` to `internal` and call it from +`RunBundleCommandAsync` in place of the bare 403 check, and copy the canonical error +messages verbatim. + +**Resolution** + +_Unresolved._ + +### CLI-018 — `audit query` and `audit export` never return exit 2 for an authorization failure + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `src/ScadaLink.CLI/Commands/AuditQueryHelpers.cs:186-193`, `src/ScadaLink.CLI/Commands/AuditExportHelpers.cs:147-153` | + +**Description** + +The two audit-log subcommands (`audit query`, `audit export`) ride a new REST surface +(`GET /api/audit/query` and `GET /api/audit/export`) — not the `POST /management` +envelope that goes through `CommandHelpers.HandleResponse`. Both helpers map *any* +non-success response to a generic `OutputFormatter.WriteError(...)` + `return 1`: + +- `AuditQueryHelpers.RunQueryAsync:186-193` returns 1 unconditionally when `JsonData` + is null (i.e. any error). It never inspects `StatusCode` or `ErrorCode`. +- `AuditExportHelpers.RunExportAsync:147-153` returns 1 for every non-success status, + again with no 403 / `FORBIDDEN` carve-out. + +`Component-CLI.md:295-296` documents exit code 2 for "Authorization failure (insufficient +role)". `Component-AuditLog.md` (Security & Tamper-Evidence) and `Component-CLI.md:184-187` +both call out that the audit endpoints are gated by `OperationalAudit` and `AuditExport` +permissions enforced server-side — i.e. these are exactly the commands most likely to +return 403 in routine use. The exit-code regression silently downgrades a 403 to a +generic command failure, breaking the CI/CD scripting contract. + +**Recommendation** + +Promote `CommandHelpers.IsAuthorizationFailure` to `internal` (or move it to a small +shared helper) and have `RunQueryAsync` / `RunExportAsync` return 2 when it matches. +The check needs to use the `ManagementResponse.StatusCode` / `ErrorCode` pair the +audit `SendGetAsync` already populates. + +**Resolution** + +_Unresolved._ + +### CLI-019 — `bundle export` decodes the entire base64 bundle into memory before writing + +| | | +|--|--| +| Severity | Medium | +| Category | Performance & resource management | +| Status | Open | +| Location | `src/ScadaLink.CLI/Commands/BundleCommands.cs:117-124`, `src/ScadaLink.CLI/ManagementHttpClient.cs:47-92` | + +**Description** + +`Component-Transport.md:271` ceilings the raw bundle at 100 MB and notes the +per-request body cap is raised to 200 MB once base64-inflated. The CLI's export path +goes through `ManagementHttpClient.SendCommandAsync`, which reads the entire response +body into a string (`responseBody = await httpResponse.Content.ReadAsStringAsync(...)`) +and returns it as `ManagementResponse.JsonData`. `BundleCommands.BuildExport` then: + +1. `JsonDocument.Parse(jsonOk)` re-allocates the JSON DOM (~200 MB string + DOM). +2. `doc.RootElement.GetProperty("base64Bundle").GetString()` materializes the base64 + payload as another ~200 MB `string`. +3. `Convert.FromBase64String(base64)` allocates a fresh ~100 MB `byte[]`. +4. `File.WriteAllBytes(output, bytes)` writes synchronously. + +Peak working-set for a 100 MB bundle is therefore ~600 MB, all on the LOH, plus the +file-I/O is fully synchronous. The streaming `SendGetStreamAsync` path the audit +export uses (line 155-156) shows the right pattern is already available for plain GETs, +but bundles ride a `POST /management` envelope so they currently can't reuse it. + +**Recommendation** + +For the export path specifically, add a streaming variant — either a new +`POST /api/bundle/export` REST endpoint mirroring the audit pattern, or a chunk-fetch +follow-up `GET /api/bundle/` so the CLI can stream bytes through +`Stream.CopyToAsync` without buffering the whole envelope. If a v1 stop-gap is needed, +at minimum switch to `File.WriteAllBytesAsync` and use `Convert.TryFromBase64Chars` +with a rented buffer to avoid the double-LOH allocation. + +**Resolution** + +_Unresolved._ + +### CLI-020 — `bundle export` success-envelope parse is unguarded + +| | | +|--|--| +| Severity | Low | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `src/ScadaLink.CLI/Commands/BundleCommands.cs:117-126` | + +**Description** + +The export success handler does: + +```csharp +using var doc = JsonDocument.Parse(jsonOk); +var base64 = doc.RootElement.GetProperty("base64Bundle").GetString()!; +var byteCount = doc.RootElement.GetProperty("byteCount").GetInt32(); +var bytes = Convert.FromBase64String(base64); +``` + +None of these calls are wrapped in a `try/catch`. A server-side bug that omits one of +the two properties, returns a `null` `base64Bundle`, sends invalid base64, or sends a +malformed JSON envelope will surface as one of `KeyNotFoundException` / +`InvalidOperationException` / `FormatException` — an unhandled stack trace, not a clean +`INVALID_RESPONSE` / exit 1, contradicting the "graceful-degradation" theme that the +prior reviews (CLI-002, CLI-003, CLI-005) repeatedly hardened. + +**Recommendation** + +Wrap the parse + base64-decode in a `try` block that catches `JsonException`, +`KeyNotFoundException`, `InvalidOperationException`, and `FormatException` and emits a +clean `OutputFormatter.WriteError(..., "INVALID_RESPONSE")` + `return 1`. Add a +regression test against a malformed-envelope stub `HttpMessageHandler`. + +**Resolution** + +_Unresolved._ + +### CLI-021 — `CliConfig.Load` crashes the CLI on a malformed config file + +| | | +|--|--| +| Severity | Low | +| Category | Error handling & resilience | +| Status | Open | +| Location | `src/ScadaLink.CLI/CliConfig.cs:41-53` | + +**Description** + +`CliConfig.Load` is the first thing every command runs (via `ExecuteCommandAsync`, +`AuditCommandHelpers.ResolveConnection`, and `BundleCommands.RunBundleCommandAsync`). +Its config-file branch is: + +```csharp +if (File.Exists(configPath)) +{ + var json = File.ReadAllText(configPath); + var fileConfig = JsonSerializer.Deserialize(json, ...); + ... +} +``` + +Neither call is guarded. If `~/.scadalink/config.json` exists but is malformed +(stale, partial, or someone's `vim` swap), `JsonSerializer.Deserialize` throws +`JsonException`. If the file exists but isn't readable (mode 0000), +`File.ReadAllText` throws `UnauthorizedAccessException`. Either fault aborts every +CLI invocation with an unhandled stack trace — even invocations that supply every +input on the command line and don't need the config file at all (`--url`, +`--username`, `--password`, `--format` all on the CLI). + +**Recommendation** + +Wrap the file-read and the `JsonSerializer.Deserialize` in a single +`try/catch (Exception)` (or specifically `JsonException` + +`UnauthorizedAccessException` + `IOException`). On failure, write a single one-line +warning to `Console.Error` ("ignoring malformed `~/.scadalink/config.json`: {message}") +and return the default `CliConfig`, so the rest of the precedence chain (env vars + +command-line flags) still works. + +**Resolution** + +_Unresolved._ + +### CLI-022 — `CommandTreeTests` excludes the two new command groups + +| | | +|--|--| +| Severity | Low | +| Category | Testing coverage | +| Status | Open | +| Location | `tests/ScadaLink.CLI.Tests/CommandTreeTests.cs:21-37`, `:55-58` (vs. `src/ScadaLink.CLI/Program.cs:21-36`) | + +**Description** + +`CommandTreeTests.AllCommandGroups()` builds 14 command groups; `Program.cs` now +registers 16 (`AuditCommands` and `BundleCommands` were added since the last +re-review). Worse, the smoke test pins `Assert.Equal(14, groups.Count)`, so the +test list intentionally matches the harness's array and stays green even though the +real production tree is two groups larger. The downstream assertions +(`EveryLeafCommand_HasAnAction`, `CommandPayloadTypes_ResolveViaRegistry`) therefore +also do NOT cover the new audit / bundle leaves — and `BundleCommands` has zero +test coverage of any kind (no parsing tests, no success-handler tests, no +registry-resolution tests). + +**Recommendation** + +Add `AuditCommands.Build(...)` and `BundleCommands.Build(...)` to the +`AllCommandGroups()` array, bump the assertion to `Equal(16, groups.Count)`, and add +representative payload types to `CommandPayloadTypes_ResolveViaRegistry` +(`ExportBundleCommand`, `PreviewBundleCommand`, `ImportBundleCommand`). Optionally, +add a `BundleCommandsTests` file covering the success-envelope parse and the +`NameListOption` comma-split parser. + +**Resolution** + +_Unresolved._ + +### CLI-023 — `Component-CLI.md` claims audit commands ride `POST /management`; implementation uses REST endpoints + +| | | +|--|--| +| Severity | Low | +| Category | Design-document adherence | +| Status | Open | +| Location | `docs/requirements/Component-CLI.md:310-311` (vs. `src/ScadaLink.CLI/Commands/AuditQueryHelpers.cs:186`, `src/ScadaLink.CLI/Commands/AuditExportHelpers.cs:126`, `src/ScadaLink.CLI/ManagementHttpClient.cs:94-156`) | + +**Description** + +`Component-CLI.md:310` states: "The `scadalink audit` command group rides this same +transport — there is no separate audit endpoint." But the implementation calls a +new REST surface — `GET /api/audit/query` and `GET /api/audit/export` — via two new +methods on `ManagementHttpClient` (`SendGetAsync`, `SendGetStreamAsync`), distinct +from the `POST /management` envelope. The plan document +(`docs/plans/2026-05-20-audit-log-code-roadmap.md:1583`) corroborates the +implementation: "REST endpoints `GET /api/audit/query` (paged) and +`GET /api/audit/export` (streaming)" — i.e. the design doc is the stale one. + +A reader following `Component-CLI.md` would expect the audit endpoints to share +the management envelope's authentication + dispatch path and route through +`ManagementActor`, neither of which is true. The auth-exit-code regression +(CLI-018) is itself a direct consequence of this divergence: the audit helpers +duplicate the management envelope's response handling instead of riding it, and +forgot to copy the auth carve-out. + +**Recommendation** + +Update `Component-CLI.md:310-311` (and the Dependencies bullet at `:311`) to +describe the actual REST surface: `GET /api/audit/query` (paged) and +`GET /api/audit/export` (streaming), with HTTP Basic Auth shared with the +management envelope and permission checks enforced by the server-side +`AuditController`. Optionally cross-link to +`docs/plans/2026-05-20-audit-log-code-roadmap.md` (M8 task list) as the +authoritative source. + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/CentralUI/findings.md b/code-reviews/CentralUI/findings.md index 007bb2ec..25599720 100644 --- a/code-reviews/CentralUI/findings.md +++ b/code-reviews/CentralUI/findings.md @@ -5,10 +5,10 @@ | Module | `src/ScadaLink.CentralUI` | | Design doc | `docs/requirements/Component-CentralUI.md` | | Status | Reviewed | -| Last reviewed | 2026-05-17 | +| Last reviewed | 2026-05-28 | | Reviewer | claude-agent | -| Commit reviewed | `39d737e` | -| Open findings | 0 | +| Commit reviewed | `1eb6e97` | +| Open findings | 8 | ## Summary @@ -73,6 +73,55 @@ cross-thread `Dictionary`; CentralUI-022 unguarded `InvokeAsync`), category 4 claims), category 9 (CentralUI-025 untested `SessionExpiry` poll). Categories 1, 2, 5, 6, 7, 10 produced no new findings. +_Re-review (2026-05-28, `1eb6e97`):_ + +| # | Category | Examined | Notes | +|---|----------|----------|-------| +| 1 | Correctness & logic bugs | ☑ | CentralUI-026 (AuditFilterBar UTC), CentralUI-027 (3 other pages with same UTC bug). | +| 2 | Akka.NET conventions | ☑ | No new findings — module is presentation; `DebugStreamService` actor usage unchanged. | +| 3 | Concurrency & thread safety | ☑ | CentralUI-030 (StringWriter capture buffer not thread-safe under intra-script `Task.WhenAll`). | +| 4 | Error handling & resilience | ☑ | No new findings — the prior CentralUI-018/023 patterns hold. | +| 5 | Security | ☑ | CentralUI-028 (NotificationReport + SiteCallsReport not site-scoped — CentralUI-002 regression on new pages). | +| 6 | Performance & resource management | ☑ | CentralUI-031 (TransportImport buffers full bundle bytes in component state). | +| 7 | Design-document adherence | ☑ | CentralUI-032 (AuditResultsGrid forward-only paging diverges from "keyset paginated" implied bi-directional). | +| 8 | Code organization & conventions | ☑ | CentralUI-029 (`JS.InvokeAsync("eval", ...)` in ConfigurationAuditLog vs the `_content/.../BrowserTime` module pattern). | +| 9 | Testing coverage | ☑ | CentralUI-033 (TransportImport / SiteCallsReport query-string drill-in code paths untested). | +| 10 | Documentation & comments | ☑ | No new findings — code comments accurately describe intent. | + +#### Re-review 2026-05-28 (commit `1eb6e97`) + +All 25 prior findings remain closed. This re-review re-examined the full +module against the 10-category checklist with attention to the +recently-added Transport export/import wizards (`TransportExport`, +`TransportImport`) and the operational Audit Log page (Bundle B..G). The +most consequential pattern in this pass is that the **CentralUI-008 +local-input-treated-as-UTC** bug, fixed for the legacy +`AuditLog.razor` via the `BrowserTime.LocalInputToUtc` helper, has been +silently recreated on every other page that exposes a +`` filter — `AuditFilterBar` (the new +operational Audit Log filter, CentralUI-026), `SiteCallsReport`, +`NotificationReport`, and `EventLogs` (CentralUI-027). The Audit Log +page CSV export URL therefore mis-shifts the From/To filter window by +the operator's UTC offset, and the same offset bug silently corrupts +audit-style queries on Site Calls / Notification Report / Event Logs. +Second-most consequential is **CentralUI-028**: the new `NotificationReport` +and `SiteCallsReport` pages (both `[Authorize(RequireDeployment)]`) do +NOT filter their site dropdown or row data through `SiteScopeService`, +and the relay actions (`RetryNotification`/`DiscardNotification`, +`RetrySiteCall`/`DiscardSiteCall`) issue no server-side site-scope +re-check before relaying to the owning site — so a site-scoped Deployment +user can read and act on notifications and cached calls for sites +outside their grant, replicating the original CentralUI-002 defect on +the two pages added after the CentralUI-002 fix landed. The remaining +new findings (CentralUI-029..CentralUI-033) cover a residual `JS.InvokeAsync("eval", ...)` +in `ConfigurationAuditLog`, a single-thread `StringWriter` capture buffer +in the Test Run sandbox (a sandboxed script that uses `Task.WhenAll` can +write concurrently), a `using var` `MemoryStream` followed by `ms.ToArray()` +buffering the full bundle in memory in `TransportImport`, the +`AuditResultsGrid` having no Previous-page control (forward-only navigation, +a UX/design adherence gap), and the un-tested `TransportImport` / +`SiteCallsReport` query-string drill-in code paths. + ## Findings ### CentralUI-001 — Test Run sandbox executes arbitrary C# with no trust-model enforcement @@ -1216,3 +1265,278 @@ also forces the CentralUI-020 fix. **Resolution** 2026-05-17 — added `SessionExpiryComponentTests` (bUnit): an expired ping (401) redirects to `/login`, a live ping (200) and a transient failure (status 0) do not, and on the `/login` route the component neither pings nor redirects; also added `AuthPingEndpointTests` covering the `/auth/ping` endpoint contract. + +### CentralUI-026 — `AuditFilterBar` From/To filters treat browser-local datetimes as UTC + +| | | +|--|--| +| Severity | Medium | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `src/ScadaLink.CentralUI/Components/Audit/AuditFilterBar.razor:97-104`; `src/ScadaLink.CentralUI/Components/Audit/AuditQueryModel.cs:56-58,150-178,203-213` | + +**Description** + +The new operational Audit Log filter bar binds two `` controls +straight to `AuditQueryModel.CustomFromUtc` / `CustomToUtc` (`DateTime?`), and `ToFilter` +emits those values as `AuditLogQueryFilter.FromUtc` / `ToUtc` without converting from +the browser's local time zone. A `datetime-local` input yields the user's *browser-local* +wall-clock value, so for any non-UTC user the audit query window is shifted by their UTC +offset — returning the wrong rows from the central `AuditLog` table and producing a +mis-shifted CSV export through `AuditLogPage.BuildExportUrl`, which round-trips the +filter's `FromUtc`/`ToUtc` straight into `?from=`/`?to=` query params. This is the same +defect CentralUI-008 fixed for the legacy `Components/Pages/Monitoring/AuditLog.razor` +via the `BrowserTime.LocalInputToUtc(value, _browserUtcOffsetMinutes)` helper — but the +new Audit Log v2 filter bar does not use that helper, so a Bundle B/C/D/E/F regression +re-introduced the bug for the page-replacement target. The CLAUDE.md "all timestamps are +UTC throughout" decision is satisfied at the wire level but violated at the input +boundary, exactly as the original finding called out. + +**Recommendation** + +Fetch the browser offset once via JS interop (mirroring `ConfigurationAuditLog.OnAfterRenderAsync` +and `AuditLog.razor`'s implementation), pipe both `CustomFromUtc` and `CustomToUtc` through +`BrowserTime.LocalInputToUtc(value, offsetMinutes)` inside `AuditQueryModel.ToFilter` +(or in the filter-bar Apply path before calling `ToFilter`), and add a regression test +that pins the non-UTC behaviour (mirroring `BrowserTimeTests.LocalInputToUtc_NonUtcBrowser_DoesNotEqualNaiveRelabelling`). +The label "Custom From / To" should also be clarified ("UTC" vs "local") in the UI. + +### CentralUI-027 — Same UTC misinterpretation in `SiteCallsReport`, `NotificationReport`, and `EventLogs` + +| | | +|--|--| +| Severity | Medium | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `src/ScadaLink.CentralUI/Components/Pages/SiteCalls/SiteCallsReport.razor:74-80`; `src/ScadaLink.CentralUI/Components/Pages/SiteCalls/SiteCallsReport.razor.cs:421-425`; `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor:75-81,639-640`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/EventLogs.razor:62-73,261-262` | + +**Description** + +The same `datetime-local`-treated-as-UTC bug from CentralUI-008 and CentralUI-026 is +present on three other pages: + +- `SiteCallsReport.ToUtc` stamps `DateTimeKind.Utc` on the local-input value + (`DateTime.SpecifyKind(value.Value, DateTimeKind.Utc)`). +- `NotificationReport.ToUtc` does the same — `new DateTimeOffset(DateTime.SpecifyKind(local.Value, DateTimeKind.Utc))`. +- `EventLogs.FetchPage` emits `new DateTimeOffset(_filterFrom.Value, TimeSpan.Zero)`, + which labels the browser-local wall-clock value as UTC (the exact pre-fix shape of + CentralUI-008). + +For any non-UTC operator, every Site-Calls / Notification / Event-Log query is silently +shifted by their UTC offset. The bug is mass-recreated on every page added after +CentralUI-008 landed — the `BrowserTime` helper exists but is only used by the legacy +Audit Log page and `ConfigurationAuditLog`. + +**Recommendation** + +Plumb the browser offset (via `eval` interop or a dedicated JS module, mirroring +`ConfigurationAuditLog`/`AuditLog.razor`) into each of these pages and route every +local-input value through `BrowserTime.LocalInputToUtc(value, offsetMinutes)` before +constructing the wire filter. Add regression tests pinning the non-UTC behaviour for +at least one representative page so the helper's continued use is enforced. + +### CentralUI-028 — `NotificationReport` and `SiteCallsReport` bypass `SiteScopeService` — Deployment role site-scoping defeated on the two new central-mirror pages + +| | | +|--|--| +| Severity | High | +| Category | Security | +| Status | Open | +| Location | `src/ScadaLink.CentralUI/Components/Pages/Notifications/NotificationReport.razor:2,434,472,502`; `src/ScadaLink.CentralUI/Components/Pages/SiteCalls/SiteCallsReport.razor:2,52-59`; `src/ScadaLink.CentralUI/Components/Pages/SiteCalls/SiteCallsReport.razor.cs:97-110,201,250-251,278-279` | + +**Description** + +Both pages are `[Authorize(Policy = RequireDeployment)]` and, per CLAUDE.md "Security & +Auth", the Deployment role must be site-scoped. CentralUI-002 fixed this for every +Deployment/Monitoring page that existed at the time by introducing `SiteScopeService` +and threading `FilterSitesAsync` / `IsSiteAllowedAsync` through the site dropdowns and +mutating calls. The two new central-mirror pages — Notification Report (Notification +Outbox queryable list) and Site Calls Report (Site Call Audit queryable list) — do NOT +inject `SiteScopeService`, do NOT filter their Source-Site `