docs(code-reviews): re-review batch 2 at 39d737e — ConfigurationDatabase, DataConnectionLayer, DeploymentManager, ExternalSystemGateway, HealthMonitoring
17 new findings: ConfigurationDatabase-012..014, DataConnectionLayer-014..017, DeploymentManager-015..017, ExternalSystemGateway-015..017, HealthMonitoring-013..016.
@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.DeploymentManager` |
| Design doc | `docs/requirements/Component-DeploymentManager.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 0 |
| Commit reviewed | `39d737e` |
| Open findings | 3 |

## Summary

@@ -30,20 +30,43 @@ detail. Configuration is not bound to `appsettings.json`, leaving one option
entirely dead. Test coverage stops at the communication boundary and never
exercises a successful deployment or the lifecycle success paths.

#### Re-review 2026-05-17 (commit `39d737e`)

Re-reviewed at commit `39d737e` after the batch of fixes for
DeploymentManager-001..014. All fourteen prior findings remain `Resolved` and
verified against source — the broadened catch, non-cancellable cleanup writes,
ref-counted `OperationLockManager`, query-before-redeploy reconciliation,
structured diff, options binding, and the expanded TestKit-actor test suite are
all present and correct. The module is in markedly better shape than at the
first review: error paths are now defensively handled and test coverage is
broad (successful deploy/lifecycle, lock serialization, reconciliation
matrix, artifact per-site matrix).

This re-review found **3 new findings**, all clustered on the
DeploymentManager-006 reconciliation path added since the last review. The
reconciliation shortcut (`TryReconcileWithSiteAsync`) marks a stale prior
record `Success` when the site already has the target revision, but it does
**not** perform the side effects the normal success path does — it never
updates the instance `State`, never refreshes the `DeployedConfigSnapshot`,
and never corrects the prior record's own `RevisionHash` (DeploymentManager-015,
DeploymentManager-016). The `GetDeploymentStatusAsync` XML doc is now stale —
it still describes the query-before-redeploy behaviour that actually moved into
`TryReconcileWithSiteAsync` (DeploymentManager-017).

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | Stuck `InProgress` record on unexpected exception; cancelled-token failure write. |
| 1 | Correctness & logic bugs | ✓ | Re-review 2026-05-17: reconciliation skips instance-state/snapshot updates (DeploymentManager-015) and keeps a stale `RevisionHash` (DeploymentManager-016). Prior: stuck `InProgress` / cancelled-token write (resolved). |
| 2 | Akka.NET conventions | ✓ | Module is a plain service layer; it calls `CommunicationService` which wraps Ask. No actors here. No issues. |
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` is sound but leaks semaphores; `DeployToAllSitesAsync` correctly builds commands sequentially before parallel send. |
| 4 | Error handling & resilience | ✓ | Several gaps — see DeploymentManager-001/002/003/004. |
| 5 | Security | ✓ | SMTP credentials are serialized and broadcast to sites — see DeploymentManager-013. No injection vectors; no authz here (enforced upstream). |
| 6 | Performance & resource management | ✓ | Semaphore leak (DeploymentManager-005); artifact rebuild does N+1 method queries per external system. |
| 7 | Design-document adherence | ✓ | Missing query-before-redeploy (DeploymentManager-006); Diff View not implemented (DeploymentManager-007). |
| 8 | Code organization & conventions | ✓ | Options class not bound to configuration — DeploymentManager-008. POCO/repo placement correct. |
| 9 | Testing coverage | ✓ | No successful-deploy test, no lifecycle success test — DeploymentManager-011; dead `CreateCommand` helper — DeploymentManager-014. |
| 10 | Documentation & comments | ✓ | Misleading timeout comment — DeploymentManager-009; stale option XML doc — DeploymentManager-012. |
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` ref-counts and reclaims semaphores; `DeployToAllSitesAsync` correctly builds commands sequentially before parallel send. No issues at re-review. |
| 4 | Error handling & resilience | ✓ | Prior gaps DeploymentManager-001/002/003/004 resolved and verified. No new issues. |
| 5 | Security | ✓ | SMTP credential handling documented as an accepted design decision (DeploymentManager-013). No injection vectors; no authz here (enforced upstream). No new issues. |
| 6 | Performance & resource management | ✓ | Semaphore leak resolved (DeploymentManager-005). No new issues. |
| 7 | Design-document adherence | ✓ | Query-before-redeploy and Diff View implemented (DeploymentManager-006/007). Re-review: reconciliation path breaks the deployed-snapshot/instance-state invariants — see DeploymentManager-015. |
| 8 | Code organization & conventions | ✓ | Options binding resolved (DeploymentManager-008). POCO/repo placement correct. No new issues. |
| 9 | Testing coverage | ✓ | Broad coverage added (success, lifecycle, lock serialization, reconciliation, artifact matrix). Re-review: reconciled-success path's missing side effects (DeploymentManager-015) are untested. |
| 10 | Documentation & comments | ✓ | Prior comment findings resolved. Re-review: `GetDeploymentStatusAsync` XML doc is now stale — DeploymentManager-017. |

## Findings

@@ -710,3 +733,126 @@ the communication boundary. New tests:
`DeployToAllSitesAsync_AllPerSiteCommandsShareTheSummaryDeploymentId` (also
covers DeploymentManager-010), `DeployToAllSitesAsync_PartialFailure_ReportsPerSiteMatrix`
(per-site success/failure matrix), `RetryForSiteAsync_SiteSucceeds_ReturnsSuccessAndAudits`.

### DeploymentManager-015 — Site-query reconciliation marks a deployment `Success` but skips instance-state and snapshot updates

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:631-655` |

**Description**

`TryReconcileWithSiteAsync` (the DeploymentManager-006 query-before-redeploy
path) handles the case where a prior `InProgress`/timeout-`Failed` record exists
and the site reports it already has the target revision hash. In that case it
marks the prior `DeploymentRecord` `Success`, audit-logs `DeployReconciled`, and
returns it — the caller then returns `Result.Success` and **never enters the
normal deploy body**.

The normal success path (`DeployInstanceAsync.cs:215-223`) does three things on
a successful site response: writes the deployment record terminal status, sets
`instance.State = InstanceState.Enabled` + `UpdateInstanceAsync`, and calls
`StoreDeployedSnapshotAsync`. The reconciliation shortcut performs only the
first. Consequently, after a reconciled deployment:

- The instance `State` is left at whatever it was (e.g. `NotDeployed` for a
  first-time deploy that timed out, or `Disabled`) even though the site is
  actually running the configuration — the central state machine and the site
  diverge, and a subsequent `DisableInstanceAsync`/`EnableInstanceAsync` will be
  rejected or allowed incorrectly by `StateTransitionValidator`.
- No `DeployedConfigSnapshot` is created or refreshed. A first-time deploy that
  is resolved purely by reconciliation leaves `GetDeploymentComparisonAsync`
  permanently returning `"No deployed snapshot found for this instance."`, and a
  redeploy reconciliation leaves the stored snapshot showing the *old* config
  even though the deployment record claims `Success` for the new revision.

The design ("Deployed vs. Template-Derived State", WP-4/WP-8) requires the
deployed snapshot and instance state to reflect the last successful deployment;
the reconciliation path silently breaks both invariants.

**Recommendation**

In the reconciled-success branch of `TryReconcileWithSiteAsync`, perform the
same post-success side effects as the normal path: set `instance.State =
InstanceState.Enabled` (+ `UpdateInstanceAsync`) and call
`StoreDeployedSnapshotAsync` with the target deployment ID / revision hash /
config JSON. Factor the shared post-success logic into one helper so the normal
and reconciliation paths cannot drift. Add a regression test asserting that a
reconciled deployment leaves the instance `Enabled` and a snapshot stored.
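
The shared helper recommended here could look roughly like the sketch below. It is an illustration under stated assumptions, not the module's actual code: `Instance`, `DeploymentRecord`, `IDeploymentStore`, `InMemoryDeploymentStore`, and `PostDeploySuccess.ApplyAsync` are simplified stand-ins invented for this example, and the parameter lists of `UpdateInstanceAsync` and `StoreDeployedSnapshotAsync` are guesses, since their real signatures are not shown in this review.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the real ScadaLink types.
public enum InstanceState { NotDeployed, Enabled, Disabled }
public enum DeploymentStatus { InProgress, Failed, Success }

public sealed class Instance
{
    public int Id { get; set; }
    public InstanceState State { get; set; }
}

public sealed class DeploymentRecord
{
    public string DeploymentId { get; set; } = "";
    public DeploymentStatus Status { get; set; }
    public DateTimeOffset? CompletedAt { get; set; }
}

// Assumed persistence surface; the real repository signatures may differ.
public interface IDeploymentStore
{
    Task UpdateDeploymentRecordAsync(DeploymentRecord record);
    Task UpdateInstanceAsync(Instance instance);
    Task StoreDeployedSnapshotAsync(int instanceId, string deploymentId, string revisionHash, string configJson);
}

// In-memory fake so the sketch can be exercised without a database.
public sealed class InMemoryDeploymentStore : IDeploymentStore
{
    // instance ID -> revision hash of the stored snapshot
    public Dictionary<int, string> Snapshots { get; } = new Dictionary<int, string>();
    public List<string> Writes { get; } = new List<string>();

    public Task UpdateDeploymentRecordAsync(DeploymentRecord record)
    {
        Writes.Add("record");
        return Task.CompletedTask;
    }

    public Task UpdateInstanceAsync(Instance instance)
    {
        Writes.Add("instance");
        return Task.CompletedTask;
    }

    public Task StoreDeployedSnapshotAsync(int instanceId, string deploymentId, string revisionHash, string configJson)
    {
        Snapshots[instanceId] = revisionHash;
        Writes.Add("snapshot");
        return Task.CompletedTask;
    }
}

public static class PostDeploySuccess
{
    // One place for all three post-success side effects, intended to be called
    // from both the normal deploy path and the reconciled-success branch.
    public static async Task ApplyAsync(
        IDeploymentStore store, Instance instance, DeploymentRecord record,
        string revisionHash, string configJson)
    {
        // 1. Terminal status on the deployment record.
        record.Status = DeploymentStatus.Success;
        record.CompletedAt = DateTimeOffset.UtcNow;
        await store.UpdateDeploymentRecordAsync(record);

        // 2. Central instance state follows what the site is running.
        instance.State = InstanceState.Enabled;
        await store.UpdateInstanceAsync(instance);

        // 3. Deployed snapshot reflects the revision that was applied.
        await store.StoreDeployedSnapshotAsync(instance.Id, record.DeploymentId, revisionHash, configJson);
    }
}
```

With a helper of this shape, the regression test suggested above reduces to driving the reconciled branch against a fake store and asserting that the instance ends up `Enabled` and that a snapshot exists for the target revision.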

**Resolution**

_Unresolved._

### DeploymentManager-016 — Reconciled prior record keeps its stale `RevisionHash`

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:639-651` |

**Description**

When `TryReconcileWithSiteAsync` reconciles a prior record, it mutates
`prior.Status`, `prior.ErrorMessage`, and `prior.CompletedAt`, but **not**
`prior.RevisionHash`. The reconciliation condition only compares the *site's*
`AppliedRevisionHash` against the *freshly-flattened* `targetRevisionHash` — it
does not require `prior.RevisionHash` to equal either of them.

The prior record can legitimately carry a different revision hash than the
current target: e.g. a deploy timed out at revision `R1`, the template was then
edited so the current flatten yields `R2`, and meanwhile the site actually
applied `R2` through some other path (or `R1` and `R2` are equal-by-content but
the prior record predates a hash recompute). After reconciliation the record's
`Status` is `Success` but its `RevisionHash` still says `R1`, so staleness
checks and any UI that reads `DeploymentRecord.RevisionHash` will report the
instance as deployed at the wrong revision. The audit `DeployReconciled` entry
records `RevisionHash = targetRevisionHash`, contradicting the persisted record.

**Recommendation**

In the reconciled-success branch, also set `prior.RevisionHash =
targetRevisionHash` so the persisted record, the audit entry, and the site's
actual applied revision all agree. Alternatively, only reconcile when
`prior.RevisionHash == targetRevisionHash` and otherwise fall through to a
normal deploy.
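
Both options in this recommendation can be sketched in a few lines. The snippet below is a simplified illustration, not the real `TryReconcileWithSiteAsync`: the `Reconciliation.TryReconcile` helper, its `requirePriorMatch` flag, and the trimmed-down `DeploymentRecord` are hypothetical names introduced for this example, and the real method is asynchronous and also queries the site and writes the audit entry.

```csharp
using System;

public enum DeploymentStatus { InProgress, Failed, Success }

// Hypothetical, trimmed-down record; only the fields the finding mentions.
public sealed class DeploymentRecord
{
    public string RevisionHash { get; set; } = "";
    public DeploymentStatus Status { get; set; }
    public string? ErrorMessage { get; set; }
    public DateTimeOffset? CompletedAt { get; set; }
}

public static class Reconciliation
{
    // Returns true when the prior record was reconciled to Success.
    // Returns false when the caller should fall through to a normal deploy.
    public static bool TryReconcile(
        DeploymentRecord prior,
        string siteAppliedRevisionHash,
        string targetRevisionHash,
        bool requirePriorMatch = false)
    {
        // The site must already be running the target revision.
        if (!string.Equals(siteAppliedRevisionHash, targetRevisionHash, StringComparison.Ordinal))
            return false;

        // Stricter alternative: refuse to reconcile a record created for a
        // different revision than the current target.
        if (requirePriorMatch &&
            !string.Equals(prior.RevisionHash, targetRevisionHash, StringComparison.Ordinal))
            return false;

        prior.Status = DeploymentStatus.Success;
        prior.ErrorMessage = null;
        prior.CompletedAt = DateTimeOffset.UtcNow;
        // Keep the persisted record consistent with the audit entry and the site.
        prior.RevisionHash = targetRevisionHash;
        return true;
    }
}
```

The default behaviour corresponds to the first option (overwrite the stale hash); passing `requirePriorMatch: true` corresponds to the alternative, where a record created for `R1` is left untouched and a normal deploy of `R2` proceeds.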

**Resolution**

_Unresolved._

### DeploymentManager-017 — `GetDeploymentStatusAsync` XML doc describes behaviour it does not implement

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:562-570` |

**Description**

The XML summary on `GetDeploymentStatusAsync` reads: *"WP-2: After
failover/timeout, query site for current deployment state before
re-deploying."* The method body does no such thing — it is a one-line
pass-through to `_repository.GetDeploymentByDeploymentIdAsync`, a pure local DB
read. The query-the-site-before-redeploy behaviour the comment describes was
implemented separately in `TryReconcileWithSiteAsync` (DeploymentManager-006).
The stale comment is a leftover of the original design intent and misleads a
reader into thinking this method contacts the site.

**Recommendation**

Reword the summary to describe what the method actually does — "returns the
current persisted `DeploymentRecord` for the given deployment ID from the
configuration database" — and, if useful, cross-reference
`TryReconcileWithSiteAsync` as the place the site-query reconciliation lives.
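
One possible wording for the reworded summary is sketched below. The method and repository signatures are assumptions made so the snippet compiles on its own (the real ones may take a `CancellationToken` or return a different type), and `IDeploymentRepository` and `InMemoryDeploymentRepository` are hypothetical stand-ins; only the method names come from this finding.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical minimal record and repository shapes for illustration only.
public sealed class DeploymentRecord
{
    public string DeploymentId { get; set; } = "";
}

public interface IDeploymentRepository
{
    Task<DeploymentRecord?> GetDeploymentByDeploymentIdAsync(string deploymentId);
}

// In-memory fake so the pass-through can be exercised without a database.
public sealed class InMemoryDeploymentRepository : IDeploymentRepository
{
    private readonly Dictionary<string, DeploymentRecord> _records = new Dictionary<string, DeploymentRecord>();

    public void Add(DeploymentRecord record) => _records[record.DeploymentId] = record;

    public Task<DeploymentRecord?> GetDeploymentByDeploymentIdAsync(string deploymentId)
        => Task.FromResult(_records.TryGetValue(deploymentId, out var record) ? record : null);
}

public sealed class DeploymentService
{
    private readonly IDeploymentRepository _repository;

    public DeploymentService(IDeploymentRepository repository) => _repository = repository;

    /// <summary>
    /// Returns the current persisted <see cref="DeploymentRecord"/> for the given
    /// deployment ID from the configuration database. This is a local read only;
    /// it does not contact the site. The site-query reconciliation after a
    /// failover or timeout lives in <c>TryReconcileWithSiteAsync</c>.
    /// </summary>
    public Task<DeploymentRecord?> GetDeploymentStatusAsync(string deploymentId)
        => _repository.GetDeploymentByDeploymentIdAsync(deploymentId);
}
```

The cross-reference uses `<c>` rather than `<see cref>` on the assumption that `TryReconcileWithSiteAsync` is private; if it is visible to the documentation tooling, `<see cref="TryReconcileWithSiteAsync"/>` would give a checked link instead.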

**Resolution**

_Unresolved._