docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked

Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28)
plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381.

67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit
fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix
with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings
(IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision.

Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to
sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024),
an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001),
and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary
is semantically sound (symbol-based) in the production cluster config.

README regenerated; regen-readme.py --check passes (4 pending / 567 total).
This commit is contained in:
Joseph Doherty
2026-06-20 18:02:32 -04:00
parent fd618cf1dc
commit d39089f4ed
19 changed files with 4031 additions and 69 deletions
+206 -2
View File
@@ -5,9 +5,9 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.DeploymentManager` |
| Design doc | `docs/requirements/Component-DeploymentManager.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Commit reviewed | `4307c381` |
| Open findings | 0 |
## Summary
@@ -83,8 +83,51 @@ The remaining six findings are medium/low: lifecycle-timeout audit gap
(DeploymentManager-023), and shared static state across `*ProbeActor` tests
(DeploymentManager-024).
#### Re-review 2026-06-20 (commit `4307c381`) — full review
Re-reviewed the whole current module at HEAD after the rename, the
cert-broadcast / Transport `IStaleInstanceProbe` work, and milestone changes.
DeploymentManager-001..024 all remain `Resolved` and verified against source —
the ref-counted `OperationLockManager`, the broadened/non-cancellable failure
writes, the `ApplyPostSuccessSideEffectsAsync` shared helper with
`forceEnabledState` (Disabled-preservation), the lifecycle-timeout audit helper,
the structured diff with List-value normalization, the hoisted global artifact
fetch, and the instance-state-aware reconciliation are all present and correct.
The two flagged cross-module/architectural seams the prompt called out — the
`TrustServerCert`/`RemoveServerCert` broadcast-to-both-nodes and the
`DeploymentManagerActor` deploy-state query handler — live in **SiteRuntime /
Communication / CentralUI**, not this module, so they are out of scope here.
This review found **3 new findings**. The material one is DeploymentManager-025:
the system-wide artifact path still fetches and broadcasts **notification lists
and SMTP configurations (including SMTP credentials)** to every site, in direct
contradiction of the now-explicit design decision that these are central-only
and "no SMTP credential is ever distributed to sites" (Component-DeploymentManager.md
lines 142-146; CLAUDE.md notification-central-only decision). This supersedes
the earlier accepted-deployable-artifact framing of the closed
DeploymentManager-013. DeploymentManager-026 (deployment records are insert-only
— a new row per deploy accumulates per instance, contradicting "only current
status stored, no history table", and the same-tick `OrderByDescending(DeployedAt)`
read has no tiebreaker) and DeploymentManager-027 (artifact tests *assert* the
forbidden notif/SMTP shipping, cementing the DeploymentManager-025 violation)
are the remaining two.
## Checklist coverage
#### Re-review 2026-06-20 (commit `4307c381`)
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | New: deployment records are insert-only — `DeployInstanceAsync` Adds a new row per deploy; reconciliation's `GetCurrentDeploymentStatusAsync` orders by `DeployedAt` with no tiebreaker (DeploymentManager-026). |
| 2 | Akka.NET conventions | ✓ | Module remains a plain service layer; no actors. The deploy-state-query/cert-broadcast actors live in SiteRuntime, out of scope. No issues. |
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` ref-counting + gate re-verified; `DeployToAllSitesAsync` prebuilds per-site commands before the parallel phase (no shared DbContext under `Task.WhenAll`). No issues. |
| 4 | Error handling & resilience | ✓ | Failure-status writes use `CancellationToken.None`; lifecycle timeouts now audit; delete-orphan path surfaced. No new issues. |
| 5 | Security | ✓ | New: SMTP credentials are still serialized into the per-site artifact command and broadcast to every site, which the current design forbids outright (DeploymentManager-025). |
| 6 | Performance & resource management | ✓ | Global artifact queries hoisted (DM-023 resolved). Deployment-record row growth is unbounded per instance (part of DeploymentManager-026). |
| 7 | Design-document adherence | ✓ | New: notification lists + SMTP configs are still treated as deployable artifacts, contradicting the "central-only, never distributed to sites" design (DeploymentManager-025). |
| 8 | Code organization & conventions | ✓ | Options bound via Host; `OptionsSection` constant correct. No new issues. |
| 9 | Testing coverage | ✓ | Broad and current. New: artifact tests assert the forbidden notif/SMTP shipping (DeploymentManager-027). |
| 10 | Documentation & comments | ✓ | `ArtifactDeploymentService` class XML doc still lists notification lists + SMTP as broadcast artifacts (stale vs design — folded into DeploymentManager-025). |
#### Re-review 2026-05-28 (commit `1eb6e97`)
| # | Category | Examined | Notes |
@@ -1234,3 +1277,164 @@ recipient (an `IActorRef` to a TestKit probe), and assert via `ExpectMsg`
in each test. Where the simpler counter shape is preferred, pass a
shared-state object into the actor's constructor so each test owns its own
instance — never reach for `static` mutable test state.
### DeploymentManager-025 — Notification lists and SMTP configurations (with credentials) are still broadcast to every site, contradicting the central-only design
| | |
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.DeploymentManager/ArtifactDeploymentService.cs:13-22,128,130,150-151,181-188` |
**Description**
The design now states explicitly, in two authoritative places, that notification
lists and SMTP configuration are **not** deployable artifacts:
- `docs/requirements/Component-DeploymentManager.md:142-146`: *"Notification lists
and SMTP configuration are **not** deployable artifacts — they are central-only
definitions managed by the Notification Service ... Notification delivery happens
on the central cluster, so **no notification artifact or SMTP credential is ever
distributed to sites**."*
- CLAUDE.md "External Integrations" decision: *"Notification delivery is
central-only ... Notification lists and SMTP config are no longer deployed to
sites; recipient resolution happens at central, at delivery time."*
`ArtifactDeploymentService` still does the opposite. `FetchGlobalArtifactsAsync`
queries `_notificationRepo.GetAllNotificationListsAsync` and
`GetAllSmtpConfigurationsAsync` (lines 150-151), maps them into
`NotificationListArtifact`s and `SmtpConfigurationArtifact`s — the SMTP artifact
carrying `smtp.Credentials` verbatim (line 188) — and `BuildDeployArtifactsCommandAsync`
places both into the `DeployArtifactsCommand` sent to **every site** (lines 128,
130). The site side persists them: `SiteReplicationActor` (lines 192-201) and
`DeploymentManagerActor` (lines 1383-1419) loop over `command.NotificationLists`
and `command.SmtpConfigurations` and write them to site SQLite via
`SiteNotificationRepository`.
This is the precise scenario the design says must never happen: SMTP credentials
travel across the inter-cluster transport and land on every site's SQLite. It
supersedes the framing of the now-closed DeploymentManager-013, which accepted
SMTP-as-deployable-artifact as a documented design decision — the design has since
flipped to forbid distribution entirely, so this is a fresh divergence, not the
same finding. The class-level XML doc (lines 13-22) is correspondingly stale: it
still advertises "notification lists ... and SMTP configurations" as artifacts the
service "broadcasts ... to all sites."
Secondary defect in the same mapping: `NotificationListArtifact` is built from
`nl.Recipients.Where(r => r.EmailAddress is not null)` (line 182), which silently
drops every SMS-only recipient (`PhoneNumber` set, `EmailAddress` null) of a
`NotificationType.Sms` list. Even if list distribution were intended, the SMS
recipient set would be lost — but since lists must not be distributed at all, this
is subsumed by the primary fix.
**Recommendation**
Stop fetching, mapping, and shipping notification lists and SMTP configurations
from the artifact deployment path. Drop the `_notificationRepo` queries from
`FetchGlobalArtifactsAsync`, pass `null` (or empty) for the `NotificationLists`
and `SmtpConfigurations` fields of `DeployArtifactsCommand`, and update the class
XML doc to remove both from the artifact list. The message fields can remain on
`DeployArtifactsCommand` for additive compatibility but must never be populated
from central. Coordinate removal of the consuming code in SiteRuntime
(`SiteReplicationActor`, `DeploymentManagerActor`, `SiteNotificationRepository`)
in the same session per the project's "design + code + tests travel together"
rule. Update the contradicting tests (see DeploymentManager-027).
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): central `FetchGlobalArtifactsAsync` no longer queries or ships notification lists / SMTP configs (passes null; `DeployArtifactsCommand` fields kept for contract compatibility), and the site purges any already-persisted notification_lists / smtp_configurations rows (clearing the plaintext SMTP password) on both apply paths — enforcing the central-only delivery design. Verified no site runtime/delivery path reads this config.
### DeploymentManager-026 — `DeployInstanceAsync` inserts a new deployment record every deploy; per-instance rows accumulate and the "current status" read has no tiebreaker
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.DeploymentManager/DeploymentService.cs:215-225`, `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:55-61` |
**Description**
`DeployInstanceAsync` always creates a brand-new `DeploymentRecord` and calls
`_repository.AddDeploymentRecordAsync(record, …)` (line 223) — it never reuses or
updates the instance's existing record. There is no update-in-place or
delete-prior path on the deploy flow, so every successful or failed deployment of
an instance leaves its predecessor row behind. Over the life of a central process
— amplified by the bulk "deploy all out-of-date instances" workflow and by repeated
redeploys after timeouts — the `DeploymentRecords` table grows without bound, one
row per deploy attempt per instance.
This contradicts the design's "Deployment Status Persistence" section
(`Component-DeploymentManager.md:106-109`): *"Only the **current deployment
status** per instance is stored in the configuration database ... No deployment
history table — the audit log (via IAuditService) already captures every
deployment action."* The audit log is the history; the deployment-record table is
supposed to hold only the current status. The implementation instead keeps an
ad-hoc, unindexed history there.
The accumulation also makes the reconciliation read order-sensitive.
`TryReconcileWithSiteAsync` reads the "prior" record via
`GetCurrentDeploymentStatusAsync`, which is
`OrderByDescending(d => d.DeployedAt).FirstOrDefault()` with **no secondary sort
key** (e.g. `ThenByDescending(d => d.Id)`). `DeployedAt` is a `DateTimeOffset`
stamped with `DateTimeOffset.UtcNow` at record creation; two records inserted
within the same clock tick (rapid redeploy, or a redeploy immediately after a
timed-out attempt) tie on `DeployedAt`, and SQL Server's choice between equal
sort keys is undefined. Reconciliation could then read the *wrong* prior record
(e.g. an older Success instead of the latest stuck InProgress), skipping the
intended site query, or vice-versa.
**Recommendation**
Either (a) make the deploy path upsert the instance's single current record
(update-in-place when one exists, insert only on first deploy) so the table holds
exactly one row per instance per the design, or (b) if multiple rows are
deliberately retained, add a deterministic tiebreaker to
`GetCurrentDeploymentStatusAsync` (`OrderByDescending(d => d.DeployedAt)
.ThenByDescending(d => d.Id)`) and document the retention/cleanup story so the
table does not grow unbounded. Option (a) aligns with the design and is preferred.
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): added a deterministic `.ThenByDescending(d => d.Id)` tiebreaker to `GetCurrentDeploymentStatusAsync` so same-tick deployment records resolve to the newest row. Insert-per-deploy behaviour unchanged (history-vs-upsert remains a separate design question).
### DeploymentManager-027 — Artifact tests assert that notification lists and SMTP configs are shipped, cementing the DeploymentManager-025 design violation
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.DeploymentManager.Tests/ArtifactDeploymentServiceTests.cs:173-174,200-201` |
**Description**
`ArtifactDeploymentServiceTests` asserts, via NSubstitute, that the artifact
deployment path queries the forbidden artifact sets exactly once per deployment:
```
await _notificationRepo.Received(1).GetAllNotificationListsAsync(Arg.Any<CancellationToken>());
await _notificationRepo.Received(1).GetAllSmtpConfigurationsAsync(Arg.Any<CancellationToken>());
```
(both in the `DeployToAllSitesAsync` global-query-hoisting test at 173-174 and the
`RetryForSiteAsync` test at 200-201). These assertions pin the exact behaviour the
current design forbids (DeploymentManager-025): they will keep the service shipping
notification lists and SMTP configs to sites and will actively block the fix —
removing the queries makes these `Received(1)` assertions fail. Tests that lock in
a design violation are worse than no test, because they make the correct change
look like a regression.
**Recommendation**
When DeploymentManager-025 is fixed, change these to
`DidNotReceive().GetAllNotificationListsAsync(...)` /
`DidNotReceive().GetAllSmtpConfigurationsAsync(...)` (and assert the
`DeployArtifactsCommand`'s `NotificationLists` / `SmtpConfigurations` fields are
null/empty) so the tests enforce the central-only design instead of contradicting
it.
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): the artifact tests no longer assert the forbidden notification/SMTP shipping — flipped `Received(1)``DidNotReceive()` and added assertions that the shipped command's NotificationLists/SmtpConfigurations are null.