docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked
Full per-module re-review of the 16 stale modules (last seen1eb6e97/ 2026-05-28) plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD4307c381. 67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commitfd618cf1closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings (IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision. Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024), an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001), and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary is semantically sound (symbol-based) in the production cluster config. README regenerated; regen-readme.py --check passes (4 pending / 567 total).
This commit is contained in:
@@ -5,9 +5,9 @@
|
||||
| Module | `src/ZB.MOM.WW.ScadaBridge.DeploymentManager` |
|
||||
| Design doc | `docs/requirements/Component-DeploymentManager.md` |
|
||||
| Status | Reviewed |
|
||||
| Last reviewed | 2026-05-28 |
|
||||
| Last reviewed | 2026-06-20 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `1eb6e97` |
|
||||
| Commit reviewed | `4307c381` |
|
||||
| Open findings | 0 |
|
||||
|
||||
## Summary
|
||||
@@ -83,8 +83,51 @@ The remaining six findings are medium/low: lifecycle-timeout audit gap
|
||||
(DeploymentManager-023), and shared static state across `*ProbeActor` tests
|
||||
(DeploymentManager-024).
|
||||
|
||||
#### Re-review 2026-06-20 (commit `4307c381`) — full review
|
||||
|
||||
Re-reviewed the whole current module at HEAD after the rename, the
|
||||
cert-broadcast / Transport `IStaleInstanceProbe` work, and milestone changes.
|
||||
DeploymentManager-001..024 all remain `Resolved` and verified against source —
|
||||
the ref-counted `OperationLockManager`, the broadened/non-cancellable failure
|
||||
writes, the `ApplyPostSuccessSideEffectsAsync` shared helper with
|
||||
`forceEnabledState` (Disabled-preservation), the lifecycle-timeout audit helper,
|
||||
the structured diff with List-value normalization, the hoisted global artifact
|
||||
fetch, and the instance-state-aware reconciliation are all present and correct.
|
||||
The two flagged cross-module/architectural seams the prompt called out — the
|
||||
`TrustServerCert`/`RemoveServerCert` broadcast-to-both-nodes and the
|
||||
`DeploymentManagerActor` deploy-state query handler — live in **SiteRuntime /
|
||||
Communication / CentralUI**, not this module, so they are out of scope here.
|
||||
This review found **3 new findings**. The material one is DeploymentManager-025:
|
||||
the system-wide artifact path still fetches and broadcasts **notification lists
|
||||
and SMTP configurations (including SMTP credentials)** to every site, in direct
|
||||
contradiction of the now-explicit design decision that these are central-only
|
||||
and "no SMTP credential is ever distributed to sites" (Component-DeploymentManager.md
|
||||
lines 142-146; CLAUDE.md notification-central-only decision). This supersedes
|
||||
the earlier accepted-deployable-artifact framing of the closed
|
||||
DeploymentManager-013. DeploymentManager-026 (deployment records are insert-only
|
||||
— a new row per deploy accumulates per instance, contradicting "only current
|
||||
status stored, no history table", and the same-tick `OrderByDescending(DeployedAt)`
|
||||
read has no tiebreaker) and DeploymentManager-027 (artifact tests *assert* the
|
||||
forbidden notif/SMTP shipping, cementing the DeploymentManager-025 violation)
|
||||
are the remaining two.
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
#### Re-review 2026-06-20 (commit `4307c381`)
|
||||
|
||||
| # | Category | Examined | Notes |
|
||||
|---|----------|----------|-------|
|
||||
| 1 | Correctness & logic bugs | ✓ | New: deployment records are insert-only — `DeployInstanceAsync` Adds a new row per deploy; reconciliation's `GetCurrentDeploymentStatusAsync` orders by `DeployedAt` with no tiebreaker (DeploymentManager-026). |
|
||||
| 2 | Akka.NET conventions | ✓ | Module remains a plain service layer; no actors. The deploy-state-query/cert-broadcast actors live in SiteRuntime, out of scope. No issues. |
|
||||
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` ref-counting + gate re-verified; `DeployToAllSitesAsync` prebuilds per-site commands before the parallel phase (no shared DbContext under `Task.WhenAll`). No issues. |
|
||||
| 4 | Error handling & resilience | ✓ | Failure-status writes use `CancellationToken.None`; lifecycle timeouts now audit; delete-orphan path surfaced. No new issues. |
|
||||
| 5 | Security | ✓ | New: SMTP credentials are still serialized into the per-site artifact command and broadcast to every site, which the current design forbids outright (DeploymentManager-025). |
|
||||
| 6 | Performance & resource management | ✓ | Global artifact queries hoisted (DM-023 resolved). Deployment-record row growth is unbounded per instance (part of DeploymentManager-026). |
|
||||
| 7 | Design-document adherence | ✓ | New: notification lists + SMTP configs are still treated as deployable artifacts, contradicting the "central-only, never distributed to sites" design (DeploymentManager-025). |
|
||||
| 8 | Code organization & conventions | ✓ | Options bound via Host; `OptionsSection` constant correct. No new issues. |
|
||||
| 9 | Testing coverage | ✓ | Broad and current. New: artifact tests assert the forbidden notif/SMTP shipping (DeploymentManager-027). |
|
||||
| 10 | Documentation & comments | ✓ | `ArtifactDeploymentService` class XML doc still lists notification lists + SMTP as broadcast artifacts (stale vs design — folded into DeploymentManager-025). |
|
||||
|
||||
#### Re-review 2026-05-28 (commit `1eb6e97`)
|
||||
|
||||
| # | Category | Examined | Notes |
|
||||
@@ -1234,3 +1277,164 @@ recipient (an `IActorRef` to a TestKit probe), and assert via `ExpectMsg`
|
||||
in each test. Where the simpler counter shape is preferred, pass a
|
||||
shared-state object into the actor's constructor so each test owns its own
|
||||
instance — never reach for `static` mutable test state.
|
||||
|
||||
### DeploymentManager-025 — Notification lists and SMTP configurations (with credentials) are still broadcast to every site, contradicting the central-only design
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | High |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ZB.MOM.WW.ScadaBridge.DeploymentManager/ArtifactDeploymentService.cs:13-22,128,130,150-151,181-188` |
|
||||
|
||||
**Description**
|
||||
|
||||
The design now states explicitly, in two authoritative places, that notification
|
||||
lists and SMTP configuration are **not** deployable artifacts:
|
||||
|
||||
- `docs/requirements/Component-DeploymentManager.md:142-146`: *"Notification lists
|
||||
and SMTP configuration are **not** deployable artifacts — they are central-only
|
||||
definitions managed by the Notification Service ... Notification delivery happens
|
||||
on the central cluster, so **no notification artifact or SMTP credential is ever
|
||||
distributed to sites**."*
|
||||
- CLAUDE.md "External Integrations" decision: *"Notification delivery is
|
||||
central-only ... Notification lists and SMTP config are no longer deployed to
|
||||
sites; recipient resolution happens at central, at delivery time."*
|
||||
|
||||
`ArtifactDeploymentService` still does the opposite. `FetchGlobalArtifactsAsync`
|
||||
queries `_notificationRepo.GetAllNotificationListsAsync` and
|
||||
`GetAllSmtpConfigurationsAsync` (lines 150-151), maps them into
|
||||
`NotificationListArtifact`s and `SmtpConfigurationArtifact`s — the SMTP artifact
|
||||
carrying `smtp.Credentials` verbatim (line 188) — and `BuildDeployArtifactsCommandAsync`
|
||||
places both into the `DeployArtifactsCommand` sent to **every site** (lines 128,
|
||||
130). The site side persists them: `SiteReplicationActor` (lines 192-201) and
|
||||
`DeploymentManagerActor` (lines 1383-1419) loop over `command.NotificationLists`
|
||||
and `command.SmtpConfigurations` and write them to site SQLite via
|
||||
`SiteNotificationRepository`.
|
||||
|
||||
This is the precise scenario the design says must never happen: SMTP credentials
|
||||
travel across the inter-cluster transport and land on every site's SQLite. It
|
||||
supersedes the framing of the now-closed DeploymentManager-013, which accepted
|
||||
SMTP-as-deployable-artifact as a documented design decision — the design has since
|
||||
flipped to forbid distribution entirely, so this is a fresh divergence, not the
|
||||
same finding. The class-level XML doc (lines 13-22) is correspondingly stale: it
|
||||
still advertises "notification lists ... and SMTP configurations" as artifacts the
|
||||
service "broadcasts ... to all sites."
|
||||
|
||||
Secondary defect in the same mapping: `NotificationListArtifact` is built from
|
||||
`nl.Recipients.Where(r => r.EmailAddress is not null)` (line 182), which silently
|
||||
drops every SMS-only recipient (`PhoneNumber` set, `EmailAddress` null) of a
|
||||
`NotificationType.Sms` list. Even if list distribution were intended, the SMS
|
||||
recipient set would be lost — but since lists must not be distributed at all, this
|
||||
is subsumed by the primary fix.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Stop fetching, mapping, and shipping notification lists and SMTP configurations
|
||||
from the artifact deployment path. Drop the `_notificationRepo` queries from
|
||||
`FetchGlobalArtifactsAsync`, pass `null` (or empty) for the `NotificationLists`
|
||||
and `SmtpConfigurations` fields of `DeployArtifactsCommand`, and update the class
|
||||
XML doc to remove both from the artifact list. The message fields can remain on
|
||||
`DeployArtifactsCommand` for additive compatibility but must never be populated
|
||||
from central. Coordinate removal of the consuming code in SiteRuntime
|
||||
(`SiteReplicationActor`, `DeploymentManagerActor`, `SiteNotificationRepository`)
|
||||
in the same session per the project's "design + code + tests travel together"
|
||||
rule. Update the contradicting tests (see DeploymentManager-027).
|
||||
|
||||
**Resolution**
|
||||
|
||||
Resolved 2026-06-20 (commit `fd618cf1`): central `FetchGlobalArtifactsAsync` no longer queries or ships notification lists / SMTP configs (passes null; `DeployArtifactsCommand` fields kept for contract compatibility), and the site purges any already-persisted notification_lists / smtp_configurations rows (clearing the plaintext SMTP password) on both apply paths — enforcing the central-only delivery design. Verified no site runtime/delivery path reads this config.
|
||||
|
||||
### DeploymentManager-026 — `DeployInstanceAsync` inserts a new deployment record every deploy; per-instance rows accumulate and the "current status" read has no tiebreaker
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ZB.MOM.WW.ScadaBridge.DeploymentManager/DeploymentService.cs:215-225`, `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:55-61` |
|
||||
|
||||
**Description**
|
||||
|
||||
`DeployInstanceAsync` always creates a brand-new `DeploymentRecord` and calls
|
||||
`_repository.AddDeploymentRecordAsync(record, …)` (line 223) — it never reuses or
|
||||
updates the instance's existing record. There is no update-in-place or
|
||||
delete-prior path on the deploy flow, so every successful or failed deployment of
|
||||
an instance leaves its predecessor row behind. Over the life of a central process
|
||||
— amplified by the bulk "deploy all out-of-date instances" workflow and by repeated
|
||||
redeploys after timeouts — the `DeploymentRecords` table grows without bound, one
|
||||
row per deploy attempt per instance.
|
||||
|
||||
This contradicts the design's "Deployment Status Persistence" section
|
||||
(`Component-DeploymentManager.md:106-109`): *"Only the **current deployment
|
||||
status** per instance is stored in the configuration database ... No deployment
|
||||
history table — the audit log (via IAuditService) already captures every
|
||||
deployment action."* The audit log is the history; the deployment-record table is
|
||||
supposed to hold only the current status. The implementation instead keeps an
|
||||
ad-hoc, unindexed history there.
|
||||
|
||||
The accumulation also makes the reconciliation read order-sensitive.
|
||||
`TryReconcileWithSiteAsync` reads the "prior" record via
|
||||
`GetCurrentDeploymentStatusAsync`, which is
|
||||
`OrderByDescending(d => d.DeployedAt).FirstOrDefault()` with **no secondary sort
|
||||
key** (e.g. `ThenByDescending(d => d.Id)`). `DeployedAt` is a `DateTimeOffset`
|
||||
stamped with `DateTimeOffset.UtcNow` at record creation; two records inserted
|
||||
within the same clock tick (rapid redeploy, or a redeploy immediately after a
|
||||
timed-out attempt) tie on `DeployedAt`, and SQL Server's choice between equal
|
||||
sort keys is undefined. Reconciliation could then read the *wrong* prior record
|
||||
(e.g. an older Success instead of the latest stuck InProgress), skipping the
|
||||
intended site query, or vice-versa.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Either (a) make the deploy path upsert the instance's single current record
|
||||
(update-in-place when one exists, insert only on first deploy) so the table holds
|
||||
exactly one row per instance per the design, or (b) if multiple rows are
|
||||
deliberately retained, add a deterministic tiebreaker to
|
||||
`GetCurrentDeploymentStatusAsync` (`OrderByDescending(d => d.DeployedAt)
|
||||
.ThenByDescending(d => d.Id)`) and document the retention/cleanup story so the
|
||||
table does not grow unbounded. Option (a) aligns with the design and is preferred.
|
||||
|
||||
**Resolution**
|
||||
|
||||
Resolved 2026-06-20 (commit `fd618cf1`): added a deterministic `.ThenByDescending(d => d.Id)` tiebreaker to `GetCurrentDeploymentStatusAsync` so same-tick deployment records resolve to the newest row. Insert-per-deploy behaviour unchanged (history-vs-upsert remains a separate design question).
|
||||
|
||||
### DeploymentManager-027 — Artifact tests assert that notification lists and SMTP configs are shipped, cementing the DeploymentManager-025 design violation
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Status | Resolved |
|
||||
| Location | `tests/ZB.MOM.WW.ScadaBridge.DeploymentManager.Tests/ArtifactDeploymentServiceTests.cs:173-174,200-201` |
|
||||
|
||||
**Description**
|
||||
|
||||
`ArtifactDeploymentServiceTests` asserts, via NSubstitute, that the artifact
|
||||
deployment path queries the forbidden artifact sets exactly once per deployment:
|
||||
|
||||
```
|
||||
await _notificationRepo.Received(1).GetAllNotificationListsAsync(Arg.Any<CancellationToken>());
|
||||
await _notificationRepo.Received(1).GetAllSmtpConfigurationsAsync(Arg.Any<CancellationToken>());
|
||||
```
|
||||
|
||||
(both in the `DeployToAllSitesAsync` global-query-hoisting test at 173-174 and the
|
||||
`RetryForSiteAsync` test at 200-201). These assertions pin the exact behaviour the
|
||||
current design forbids (DeploymentManager-025): they will keep the service shipping
|
||||
notification lists and SMTP configs to sites and will actively block the fix —
|
||||
removing the queries makes these `Received(1)` assertions fail. Tests that lock in
|
||||
a design violation are worse than no test, because they make the correct change
|
||||
look like a regression.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
When DeploymentManager-025 is fixed, change these to
|
||||
`DidNotReceive().GetAllNotificationListsAsync(...)` /
|
||||
`DidNotReceive().GetAllSmtpConfigurationsAsync(...)` (and assert the
|
||||
`DeployArtifactsCommand`'s `NotificationLists` / `SmtpConfigurations` fields are
|
||||
null/empty) so the tests enforce the central-only design instead of contradicting
|
||||
it.
|
||||
|
||||
**Resolution**
|
||||
|
||||
Resolved 2026-06-20 (commit `fd618cf1`): the artifact tests no longer assert the forbidden notification/SMTP shipping — flipped `Received(1)` → `DidNotReceive()` and added assertions that the shipped command's NotificationLists/SmtpConfigurations are null.
|
||||
|
||||
Reference in New Issue
Block a user