code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.
regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.DeploymentManager` |
| Design doc | `docs/requirements/Component-DeploymentManager.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `39d737e` |
| Open findings | 0 |
| Commit reviewed | `1eb6e97` |
| Open findings | 7 |

## Summary

@@ -53,20 +53,52 @@ DeploymentManager-016). The `GetDeploymentStatusAsync` XML doc is now stale —
it still describes the query-before-redeploy behaviour that actually moved into
`TryReconcileWithSiteAsync` (DeploymentManager-017).

#### Re-review 2026-05-28 (commit `1eb6e97`)

Re-reviewed at commit `1eb6e97` after the DeploymentManager-015/016/017 fixes
and a docs-only XML-comment pass. The three prior findings remain `Resolved`
and verified — `ApplyPostSuccessSideEffectsAsync` is now invoked from both the
normal success path and `TryReconcileWithSiteAsync`, the reconciled-success
branch corrects `prior.RevisionHash` to the target, and `GetDeploymentStatusAsync`'s
XML doc now describes the local-DB-read it actually performs and cross-refs the
reconciliation helper. The DiffService wiring, options binding, ref-counted
operation lock, broadened catch, non-cancellable cleanup, and TestKit-actor
test seam are still in place. The 7 new findings here are not regressions in
the DeploymentManager-015/016 fixes — they are issues uncovered by widening
the lens to the lifecycle paths, reconciliation's interaction with
intentional `Disabled` state, audit semantics, and operational concerns
(per-site artifact-build cost, Pending→InProgress double-write).

The single notable correctness issue is DeploymentManager-018: the
reconciliation shortcut unconditionally sets `instance.State = Enabled` via
`ApplyPostSuccessSideEffectsAsync`. After a central failover that loses the
in-memory operation lock, a user can legitimately `Disable` an instance whose
prior deploy record is still `InProgress`; a subsequent redeploy then reconciles
and silently re-enables the instance against the user's explicit intent.
The remaining six findings are medium/low: lifecycle-timeout audit gap
(DeploymentManager-019), audit-user attribution in reconciliation
(DeploymentManager-020), silent fallback in `ResolveSiteIdentifierAsync`
(DeploymentManager-021), back-to-back `Pending`→`InProgress` writes
(DeploymentManager-022), per-site re-query of system-wide artifacts
(DeploymentManager-023), and shared static state across `*ProbeActor` tests
(DeploymentManager-024).

## Checklist coverage

#### Re-review 2026-05-28 (commit `1eb6e97`)

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | Re-review 2026-05-17: reconciliation skips instance-state/snapshot updates (DeploymentManager-015) and keeps a stale `RevisionHash` (DeploymentManager-016). Prior: stuck `InProgress` / cancelled-token write (resolved). |
| 2 | Akka.NET conventions | ✓ | Module is a plain service layer; it calls `CommunicationService` which wraps Ask. No actors here. No issues. |
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` ref-counts and reclaims semaphores; `DeployToAllSitesAsync` correctly builds commands sequentially before parallel send. No issues at re-review. |
| 4 | Error handling & resilience | ✓ | Prior gaps DeploymentManager-001/002/003/004 resolved and verified. No new issues. |
| 5 | Security | ✓ | SMTP credential handling documented as an accepted design decision (DeploymentManager-013). No injection vectors; no authz here (enforced upstream). No new issues. |
| 6 | Performance & resource management | ✓ | Semaphore leak resolved (DeploymentManager-005). No new issues. |
| 7 | Design-document adherence | ✓ | Query-before-redeploy and Diff View implemented (DeploymentManager-006/007). Re-review: reconciliation path breaks the deployed-snapshot/instance-state invariants — see DeploymentManager-015. |
| 8 | Code organization & conventions | ✓ | Options binding resolved (DeploymentManager-008). POCO/repo placement correct. No new issues. |
| 9 | Testing coverage | ✓ | Broad coverage added (success, lifecycle, lock serialization, reconciliation, artifact matrix). Re-review: reconciled-success path's missing side effects (DeploymentManager-015) are untested. |
| 10 | Documentation & comments | ✓ | Prior comment findings resolved. Re-review: `GetDeploymentStatusAsync` XML doc is now stale — DeploymentManager-017. |
| 1 | Correctness & logic bugs | ✓ | New: reconciliation forces `Enabled` even if the user disabled the instance in between (DeploymentManager-018). |
| 2 | Akka.NET conventions | ✓ | Module remains a plain service layer; no actors. No issues. |
| 3 | Concurrency & thread safety | ✓ | `OperationLockManager` ref-counting verified. Note: test probes hold static state (DeploymentManager-024) — a test concern, not production code. |
| 4 | Error handling & resilience | ✓ | New: Disable/Enable/Delete timeouts return early without writing any audit entry — deploy has `DeployFailed`, lifecycle has nothing (DeploymentManager-019). |
| 5 | Security | ✓ | No new issues. SMTP credential decision documented (DeploymentManager-013 closed). |
| 6 | Performance & resource management | ✓ | New: `BuildDeployArtifactsCommandAsync` re-queries every system-wide artifact set per site in `DeployToAllSitesAsync` (DeploymentManager-023). |
| 7 | Design-document adherence | ✓ | Reconciliation now performs post-success side effects (DeploymentManager-015 resolved). DeploymentManager-018 surfaces a new gap on `Disabled`-state preservation. |
| 8 | Code organization & conventions | ✓ | New: redundant `Pending`→`InProgress` back-to-back write with no intervening work (DeploymentManager-022). Silent string-fallback in `ResolveSiteIdentifierAsync` (DeploymentManager-021). |
| 9 | Testing coverage | ✓ | New: no coverage for the reconciliation-overwrites-Disabled case (part of DeploymentManager-018); test probes share static state across tests (DeploymentManager-024). |
| 10 | Documentation & comments | ✓ | New: `DeployReconciled` audit uses `prior.DeployedBy` instead of the current `user` parameter — misleading for forensics (DeploymentManager-020). |

## Findings

@@ -873,3 +905,293 @@ database as a pure local read, and cross-references `TryReconcileWithSiteAsync`
as where the query-the-site-before-redeploy reconciliation actually lives.
Documentation-only change; no regression test (a test asserting comment text
would be meaningless).

### DeploymentManager-018 — Reconciliation force-sets `Enabled`, overwriting an intentional `Disabled` after central failover

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:675-682,721-748` |

**Description**

`TryReconcileWithSiteAsync` calls `ApplyPostSuccessSideEffectsAsync` whenever
the site reports it has the target revision hash, and that helper
unconditionally writes `instance.State = InstanceState.Enabled`. The
reconciliation shortcut only runs when the prior `DeploymentRecord` is
`InProgress` or timeout-`Failed` — exactly the scenarios that survive a central
failover (the in-memory `OperationLockManager` is lost on failover, by design:
*"Lost on central failover (acceptable per design — in-progress treated as
failed)"*).

After such a failover, the per-instance operation lock is gone but the
deployment record is still `InProgress` in the DB. A user can legitimately
issue `DisableInstanceAsync` for the same instance — there is nothing in
`DisableInstanceAsync` that consults the deployment record, only the
`StateTransitionValidator` over `Instance.State`. If the state is `Enabled`
(the typical case when the deploy started), the disable proceeds, the site
honours it (the design states a disabled instance retains its deployed
configuration), and central now persists `Instance.State = Disabled`. The
deployment-record row remains `InProgress` (no one transitioned it). Later the
user retries the deploy: `TryReconcileWithSiteAsync` runs, the site still has
the target revision hash (Disable doesn't change the deployed config), the
prior record is marked `Success`, and `ApplyPostSuccessSideEffectsAsync` writes
`Instance.State = Enabled` — silently overriding the user's explicit Disable.

The same trap exists for any direct DB edit / migration that flipped the state
between the timed-out deploy and the redeploy. The normal deploy path can
defensibly assume `Enabled` after a fresh successful apply, but the
reconciliation path is reconciling *prior* state with *prior* user intent; it
should preserve `Disabled` if that is the current `Instance.State` at the time
of reconciliation, mirroring the design's separation between deploy (config
apply) and disable (subscription/script lifecycle).

**Recommendation**

In the reconciliation branch, do not force `Enabled`. Either:
- Pass a flag/parameter to `ApplyPostSuccessSideEffectsAsync` telling it
  whether to touch state, and skip the state write on the reconciliation path
  (leaving the current `Instance.State` intact, which is already `Enabled`
  for a fresh deploy that timed out and `Disabled` for the user-disabled
  follow-up case); or
- Only set `Enabled` when the current `Instance.State` is `NotDeployed` (i.e.
  the first-deploy timed-out case), and leave existing `Enabled`/`Disabled`
  alone.

Add a regression test where an instance with `Instance.State = Disabled` and a
prior `InProgress` deployment record is reconciled — the resulting
`Instance.State` must remain `Disabled`, and the deployment record must still
be marked `Success`.

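The second option in the DeploymentManager-018 recommendation can be sketched as a small pure function. This is only an illustration under stated assumptions: `PostSuccessState` and its `Resolve` method are hypothetical names, and the `InstanceState` enum below is a stand-in that lists only the three values mentioned in this finding, not the module's real type.

```csharp
using System;

// Stand-in enum: only the three states named in the finding (assumption).
public enum InstanceState { NotDeployed, Enabled, Disabled }

// Hypothetical helper, not part of the reviewed code.
public static class PostSuccessState
{
    // Returns the state to persist after a successful or reconciled deploy.
    // A fresh deploy always enables the instance. A reconciled deploy only
    // promotes NotDeployed to Enabled and otherwise keeps the current state,
    // so a Disable issued between the timed-out deploy and the redeploy survives.
    public static InstanceState Resolve(InstanceState current, bool isReconciliation)
    {
        if (!isReconciliation)
        {
            return InstanceState.Enabled;
        }

        return current == InstanceState.NotDeployed
            ? InstanceState.Enabled
            : current;
    }
}
```

A regression test along the lines requested above would then assert that `Resolve(InstanceState.Disabled, isReconciliation: true)` yields `Disabled`, while the fresh-deploy path still yields `Enabled`.
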
### DeploymentManager-019 — Lifecycle command timeout writes no audit entry

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:328-339,385-396,445-458` |

**Description**

`DisableInstanceAsync`, `EnableInstanceAsync`, and `DeleteInstanceAsync` each
wrap the `CommunicationService` call in a linked CTS with
`LifecycleCommandTimeout` (DeploymentManager-012). On timeout they log a
warning and `return Result<...>.Failure(...)` — and skip the
`_auditService.LogAsync` call entirely. As a result, an operator-initiated
disable/enable/delete that times out at the site leaves **no audit trail**:
the user, the timestamp, the command id, and the failure mode are not
recorded in the audit log. The deploy path goes out of its way to write a
`DeployFailed` audit entry on the same failure mode
(`DeploymentService.cs:274-276`), with `CancellationToken.None` so the write is
durable; the lifecycle commands do not.

The design lists audit logging as a Deployment Manager responsibility for "all
deployment actions, system-wide artifact deployments, and instance lifecycle
changes" — a timed-out lifecycle command **is** an attempted lifecycle change,
and the operator action is exactly the kind of event the audit log exists to
record.

**Recommendation**

In each of the three `catch (Exception ex) when (ex is TimeoutException or OperationCanceledException)`
blocks, write a `DisableTimeout`/`EnableTimeout`/`DeleteTimeout` audit entry
(or reuse the existing operation name with a failure flag), passing
`CancellationToken.None` so a cancelled outer token does not
prevent the audit write, mirroring `DeployFailed`. Add a unit test asserting
that the scenario covered by
`DisableInstanceAsync_SiteUnresponsive_LifecycleCommandTimeoutBoundsTheWait`
also produces an audit entry.

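A minimal sketch of the timeout-plus-audit shape recommended for DeploymentManager-019, assuming a simplified audit interface. `IAuditSink`, `InMemoryAuditSink`, and `LifecycleAudit.RunWithAuditAsync` are hypothetical names introduced for illustration; the real `_auditService.LogAsync` signature and the module's `Result<...>` type are not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical audit abstraction; the real audit service may look different.
public interface IAuditSink
{
    Task LogAsync(string user, string action, string detail, CancellationToken ct);
}

// In-memory sink used only to demonstrate the behaviour.
public sealed class InMemoryAuditSink : IAuditSink
{
    public List<(string User, string Action, string Detail)> Entries { get; } = new();

    public Task LogAsync(string user, string action, string detail, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        Entries.Add((user, action, detail));
        return Task.CompletedTask;
    }
}

public static class LifecycleAudit
{
    // Runs a site command under a linked, time-bounded token. Returns true on
    // success; on timeout or cancellation it writes an audit entry and returns false.
    public static async Task<bool> RunWithAuditAsync(
        string user,
        string operation,
        Func<CancellationToken, Task> sendToSite,
        TimeSpan timeout,
        IAuditSink audit,
        CancellationToken outer)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(outer);
        cts.CancelAfter(timeout);
        try
        {
            await sendToSite(cts.Token);
            return true;
        }
        catch (Exception ex) when (ex is TimeoutException or OperationCanceledException)
        {
            // CancellationToken.None: the audit write must not be skipped
            // just because the outer or linked token is already cancelled.
            await audit.LogAsync(user, operation + "Timeout", ex.GetType().Name, CancellationToken.None);
            return false;
        }
    }
}
```

Called with an operation name of `"Disable"` and a site call that never completes, the sketch records a single `DisableTimeout` entry, which is the behaviour the recommended unit test would pin down.
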
### DeploymentManager-020 — `DeployReconciled` audit attributes the action to the prior deployer, not the current user

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:683-686` |

**Description**

In `TryReconcileWithSiteAsync` the audit call is:

```
await _auditService.LogAsync(prior.DeployedBy, "DeployReconciled", ...)
```

`prior.DeployedBy` is the user who issued the original (timed-out / stuck)
deployment, not the `user` parameter passed into `DeployInstanceAsync`. The
current user — the one who triggered the redeploy that produced the
reconciliation — is dropped on the floor. For audit forensics this is
misleading: the row will read "user A reconciled their own deployment"
when in fact user B initiated the action that reconciled it.

The original deployer is interesting context, but it should be carried in the
audit-detail object (where `DeploymentId` and `RevisionHash` already live), not
substituted for the actor.

**Recommendation**

Use `user` (the parameter on `DeployInstanceAsync`, threaded through
`TryReconcileWithSiteAsync`) as the audit actor, and include
`OriginalDeployer = prior.DeployedBy` in the detail object so the original
attribution is preserved without misrepresenting who took the action.

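One possible shape for the corrected attribution, as a sketch. `ReconcileAuditDetail`, `AuditEntry`, and `ReconcileAudit.Build` are hypothetical types made up for this example; only the field names `DeploymentId`, `RevisionHash`, and `OriginalDeployer` and the action name `DeployReconciled` come from the finding, and the field types are assumed to be strings.

```csharp
// Hypothetical detail payload; field types are assumptions.
public sealed record ReconcileAuditDetail(
    string DeploymentId,
    string RevisionHash,
    string OriginalDeployer);

// Hypothetical audit row: who acted, what they did, and the supporting detail.
public sealed record AuditEntry(string Actor, string Action, ReconcileAuditDetail Detail);

public static class ReconcileAudit
{
    // The actor is the user who triggered the redeploy; the user who issued
    // the original deployment is preserved in the detail object instead.
    public static AuditEntry Build(
        string currentUser,
        string priorDeployedBy,
        string deploymentId,
        string revisionHash)
    {
        var detail = new ReconcileAuditDetail(deploymentId, revisionHash, priorDeployedBy);
        return new AuditEntry(currentUser, "DeployReconciled", detail);
    }
}
```

With this split, a forensic query on the actor column finds user B, and the original deployer remains available in the detail for context.
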
### DeploymentManager-021 — `ResolveSiteIdentifierAsync` silently substitutes the DB id when the site row is missing

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:107-111` |

**Description**

```
private async Task<string> ResolveSiteIdentifierAsync(int siteId, CancellationToken cancellationToken)
{
    var site = await _siteRepository.GetSiteByIdAsync(siteId, cancellationToken);
    return site?.SiteIdentifier ?? siteId.ToString();
}
```

If the `Site` row is missing (FK was deleted, race with admin delete, DB
inconsistency), the method silently returns the numeric DB id rendered as a
string. This is then passed to
`CommunicationService.{Deploy,Disable,Enable,Delete}InstanceAsync` and
`QueryDeploymentStateAsync` as if it were a real
`SiteIdentifier` (e.g. "site-a"). The communication layer will fail with an
"unknown site" or routing error, producing a confusing diagnostic that hides
the actual problem (no site row).

This is a defensive concern, but every mutating operation in the module goes
through this method, so a stale instance whose site was deleted will produce a
misleading error every time it is touched.

**Recommendation**

Treat a missing site as a hard validation failure: return a
`Result.Failure($"Site with ID {siteId} not found")` early from the calling
operations, instead of fabricating an identifier. The repository already
returns `Site?`, so the null path is type-visible; just don't paper over it.

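A sketch of the fail-fast variant recommended for DeploymentManager-021. The `Site` record and the `Result<T>` class below are minimal stand-ins written for this example (the module's own types are not shown in this review), and the repository call is replaced by a delegate so the block stays self-contained.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the Site entity (assumption: only these two members matter here).
public sealed record Site(int Id, string SiteIdentifier);

// Minimal Result type for the sketch; the module's real Result<T> may differ.
public sealed class Result<T>
{
    private Result(bool isSuccess, T? value, string? error)
    {
        IsSuccess = isSuccess;
        Value = value;
        Error = error;
    }

    public bool IsSuccess { get; }
    public T? Value { get; }
    public string? Error { get; }

    public static Result<T> Success(T value) => new(true, value, null);
    public static Result<T> Failure(string error) => new(false, default, error);
}

public static class SiteResolution
{
    // Resolves the site identifier, or fails explicitly when the row is missing
    // instead of falling back to the numeric id rendered as a string.
    public static async Task<Result<string>> ResolveSiteIdentifierAsync(
        int siteId,
        Func<int, CancellationToken, Task<Site?>> getSiteById,
        CancellationToken cancellationToken)
    {
        var site = await getSiteById(siteId, cancellationToken);
        return site is null
            ? Result<string>.Failure($"Site with ID {siteId} not found")
            : Result<string>.Success(site.SiteIdentifier);
    }
}
```

Callers would check `IsSuccess` and return the failure before contacting the communication layer, so the diagnostic names the missing site row directly.
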
### DeploymentManager-022 — `Pending` and `InProgress` are written back-to-back with no intervening work

| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:178-194` |

**Description**

`DeployInstanceAsync` does:

```
record.Status = Pending;
AddDeploymentRecordAsync(record); SaveChangesAsync(); NotifyStatusChange(record);
record.Status = InProgress;
UpdateDeploymentRecordAsync(record); SaveChangesAsync(); NotifyStatusChange(record);
```

There is no work between the two writes — flattening, validation, and
reconciliation have already completed by line 174. The deploy command is sent
immediately after the `InProgress` write. The `Pending` write therefore costs:
an extra `SaveChangesAsync` round-trip, an extra `IDeploymentStatusNotifier`
invocation (which the CentralUI-006 page renders, so the user briefly sees a
`Pending` flicker before `InProgress`), and an extra row-version bump if EF
optimistic concurrency is enabled on the table.

The design uses `Pending` to mean "queued, not yet sent" and `InProgress` to
mean "sent to site, awaiting response". The code's `Pending` slot has no
queuing — it is set and immediately overwritten — so the state buys nothing
operationally.

**Recommendation**

Either:
- Drop the `Pending` write entirely and create the record directly in
  `InProgress` (one row insert, one notification, simpler UI); or
- Move the `Pending`→`InProgress` transition to bracket actual queueing/work
  (e.g. set `Pending` *before* flattening + reconciliation, set `InProgress`
  immediately before `DeployInstanceAsync` on the comm service) so the two
  states carry distinguishable semantics worth a separate write.

### DeploymentManager-023 — `BuildDeployArtifactsCommandAsync` re-queries system-wide artifacts once per site

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.DeploymentManager/ArtifactDeploymentService.cs:82-144,169-173` |

**Description**

`DeployToAllSitesAsync` loops over sites and calls
`BuildDeployArtifactsCommandAsync(site.Id, ...)` for each one. Of the six
artifact sets the method gathers, **only** `dataConnections` is per-site:

- `_templateRepo.GetAllSharedScriptsAsync` — global.
- `_externalSystemRepo.GetAllExternalSystemsAsync` — global, plus
  `GetMethodsByExternalSystemIdAsync` per external system per site.
- `_externalSystemRepo.GetAllDatabaseConnectionsAsync` — global.
- `_notificationRepo.GetAllNotificationListsAsync` — global.
- `_notificationRepo.GetAllSmtpConfigurationsAsync` — global.
- `_siteRepo.GetDataConnectionsBySiteIdAsync(siteId, ...)` — **per-site**.

With N sites this issues 5·N queries on the global sets where 5 would suffice
(plus M·N method queries instead of M, where M is the external-system count).
On a hub-and-spoke
deployment with many sites the artifact-deploy path is noticeably slower than
necessary and pins DbContext usage longer than needed. Per CLAUDE.md, the
DbContext is not thread-safe and the per-site commands are already built
sequentially (good); the redundant queries are sequential too, but the
network/round-trip cost is real.

**Recommendation**

Hoist the global queries (shared scripts, external systems + their methods,
DB connections, notification lists, SMTP configurations) out of
`BuildDeployArtifactsCommandAsync`, fetch them once in `DeployToAllSitesAsync`,
and pass them in alongside the site id (or expose a
`BuildDeployArtifactsCommandAsync(siteId, prefetchedGlobals)` overload).
`RetryForSiteAsync` (the single-site path) can keep the convenience-overload
behaviour. Add a test using NSubstitute's `.Received()` to assert
`_templateRepo.GetAllSharedScriptsAsync` is called exactly once for an
N-site deployment.

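The hoisting recommended for DeploymentManager-023 might look roughly like the sketch below. All type and member names here (`GlobalArtifacts`, `DeployArtifactsCommand`, `ArtifactCommandBuilder.BuildForAllSitesAsync`) are hypothetical, the artifact sets are reduced to string lists, and the repositories are replaced by delegates so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical container for the artifact sets that are identical for every site.
public sealed record GlobalArtifacts(
    IReadOnlyList<string> SharedScripts,
    IReadOnlyList<string> ExternalSystems);

// Hypothetical per-site command: shared globals plus the site's own data connections.
public sealed record DeployArtifactsCommand(
    int SiteId,
    GlobalArtifacts Globals,
    IReadOnlyList<string> DataConnections);

public static class ArtifactCommandBuilder
{
    public static async Task<List<DeployArtifactsCommand>> BuildForAllSitesAsync(
        IReadOnlyList<int> siteIds,
        Func<Task<GlobalArtifacts>> loadGlobals,
        Func<int, Task<IReadOnlyList<string>>> loadDataConnections)
    {
        // Global sets are fetched once per deployment, not once per site.
        var globals = await loadGlobals();

        var commands = new List<DeployArtifactsCommand>(siteIds.Count);

        // Commands are still built sequentially, matching the existing
        // approach for a DbContext that is not thread-safe.
        foreach (var siteId in siteIds)
        {
            var connections = await loadDataConnections(siteId);
            commands.Add(new DeployArtifactsCommand(siteId, globals, connections));
        }

        return commands;
    }
}
```

Because the loaders are injected, a test can count invocations directly; with a mocked repository the same check would use NSubstitute's `Received(1)` on the global query.
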
### DeploymentManager-024 — Test probe actors hold mutable static state across tests

| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.DeploymentManager.Tests/DeploymentServiceTests.cs:966-1075`, `tests/ScadaLink.DeploymentManager.Tests/ArtifactDeploymentServiceTests.cs:196-217` |

**Description**

`ReconcileProbeActor.QueryCount` / `DeployCount`, `SerializationProbeActor.MaxConcurrent`
/ `_current`, and `ArtifactProbeActor.Received` are all `static` fields.
Each test's actor constructor resets them, but reset-on-construction only
works as long as no two tests that touch the same probe type run concurrently.
xUnit runs the tests within a single class sequentially by default, so today's
tests pass. That guarantee disappears as soon as a probe is reused from a
second test class (separate classes run in parallel by default) or a custom
runner parallelizes within a class, and then the counters race: a deploy in
test A could increment `DeployCount` while test B is asserting on it.

Static state shared across tests is also why a flaky-test investigation here
will be unusually painful: the offending interaction is invisible from any
single test file.

**Recommendation**

Replace the static counters with instance state, hand the actor a probe
recipient (an `IActorRef` to a TestKit probe), and assert via `ExpectMsg`
in each test. Where the simpler counter shape is preferred, pass a
shared-state object into the actor's constructor so each test owns its own
instance — never reach for `static` mutable test state.

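For the simpler counter shape mentioned in the DeploymentManager-024 recommendation, a sketch of per-test state could look like this. `ProbeCounters` and `ReconcileProbe` are hypothetical names; `ReconcileProbe` is a plain class standing in for the actor so the example has no Akka.NET dependency, and its `OnQuery` / `OnDeploy` methods stand in for the actor's message handlers.

```csharp
using System.Threading;

// Per-test counters: each test creates its own instance, so nothing is shared.
public sealed class ProbeCounters
{
    private int _queryCount;
    private int _deployCount;

    public int QueryCount => Volatile.Read(ref _queryCount);
    public int DeployCount => Volatile.Read(ref _deployCount);

    // Interlocked keeps the counts correct when the probe runs on another thread.
    public void RecordQuery() => Interlocked.Increment(ref _queryCount);
    public void RecordDeploy() => Interlocked.Increment(ref _deployCount);
}

// Stand-in for a probe actor: state arrives through the constructor
// instead of living in static fields.
public sealed class ReconcileProbe
{
    private readonly ProbeCounters _counters;

    public ReconcileProbe(ProbeCounters counters) => _counters = counters;

    public void OnQuery() => _counters.RecordQuery();
    public void OnDeploy() => _counters.RecordDeploy();
}
```

In an Akka.NET test the counters object would be passed through the actor's constructor, for example with `Props.Create(() => new ReconcileProbeActor(counters))`, assuming the probe actor is given a matching constructor parameter.
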