fix(deployment-manager): resolve DeploymentManager-009,010,012,014 — shared deployment ID, lifecycle-timeout enforcement, doc/test cleanup; DeploymentManager-013 flagged
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-16 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `9c60592` |
|
||||
| Open findings | 5 |
|
||||
| Open findings | 1 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -423,7 +423,7 @@ configuration binding. Regression tests:
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:288` |
|
||||
|
||||
**Description**
|
||||
@@ -436,6 +436,9 @@ into the comment and not derived from any constant in this module. If
|
||||
`LifecycleTimeout` is reconfigured, the comment becomes wrong. It also wrongly
|
||||
implies the value lives in this module.
|
||||
|
||||
**Verification:** Confirmed against source. The `DeleteInstanceAsync` XML doc
|
||||
quoted a hard-coded "30s" value.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Reword to "Delete fails if the site is unreachable within
|
||||
@@ -443,7 +446,12 @@ Reword to "Delete fails if the site is unreachable within
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending): the `DeleteInstanceAsync` XML doc no
|
||||
longer quotes a hard-coded "30s" — it now states delete fails if the site is
|
||||
unreachable within `CommunicationOptions.LifecycleTimeout` (and notes the
|
||||
deadline is applied inside `CommunicationService.DeleteInstanceAsync`).
|
||||
Documentation-only change; no regression test (a test asserting comment text
|
||||
would be meaningless).
|
||||
|
||||
### DeploymentManager-010 — `SystemArtifactDeploymentRecord` does not persist the deployment ID
|
||||
|
||||
@@ -451,7 +459,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.DeploymentManager/ArtifactDeploymentService.cs:136,194-211` |
|
||||
|
||||
**Description**
|
||||
@@ -465,6 +473,9 @@ stored record. Additionally each per-site `DeployArtifactsCommand` carries its
|
||||
own separate GUID (`BuildDeployArtifactsCommandAsync` line 114), so there are in
|
||||
fact N+1 unrelated IDs for one logical artifact deployment.
|
||||
|
||||
**Verification:** Confirmed against source. Each per-site command minted its own
|
||||
GUID and the persisted record had no way to reference the logical id.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Add a `DeploymentId` column to `SystemArtifactDeploymentRecord` and store the
|
||||
@@ -473,7 +484,23 @@ per-site commands so the audit log, UI summary, and persisted record agree.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending): `BuildDeployArtifactsCommandAsync` now
|
||||
accepts an optional `deploymentId`, and `DeployToAllSitesAsync` passes the one
|
||||
logical `deploymentId` to every per-site command — so the per-site commands,
|
||||
the audit log, and the UI summary all reference a single id instead of N+1
|
||||
unrelated GUIDs (`RetryForSiteAsync`, an independent single-site retry, still
|
||||
mints its own id). Adding a dedicated `DeploymentId` *column* to
|
||||
`SystemArtifactDeploymentRecord` was deliberately **not** done: that entity
|
||||
lives in `ScadaLink.Commons` with its EF mapping in
|
||||
`ScadaLink.ConfigurationDatabase`, both outside this module's edit scope.
|
||||
Instead the logical `deploymentId` is embedded in the record's free-form
|
||||
`PerSiteStatus` JSON payload (`{ DeploymentId, Sites }`), which is fully within
|
||||
this module's control, so the persisted record is correlatable with the
|
||||
summary/audit. A follow-up to promote it to a first-class column should be
|
||||
filed against Commons/ConfigurationDatabase if a queryable index is needed.
|
||||
Regression tests: `DeployToAllSitesAsync_AllPerSiteCommandsShareTheSummaryDeploymentId`,
|
||||
`DeployToAllSitesAsync_PartialFailure_ReportsPerSiteMatrix`,
|
||||
`RetryForSiteAsync_SiteSucceeds_ReturnsSuccessAndAudits`.
|
||||
|
||||
### DeploymentManager-011 — Tests never exercise a successful deployment or lifecycle success path
|
||||
|
||||
@@ -536,7 +563,7 @@ which asserts on `IAuditService.LogAsync`. Regression tests:
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.DeploymentManager/DeploymentManagerOptions.cs:8-9` |
|
||||
|
||||
**Description**
|
||||
@@ -547,6 +574,9 @@ default and an XML doc, but it is never read anywhere in the codebase
|
||||
`CommunicationService`). The option misleads readers into thinking it controls
|
||||
disable/enable/delete timeouts, when setting it has no effect.
|
||||
|
||||
**Verification:** Confirmed against source. A repo-wide grep found exactly one
|
||||
occurrence of `LifecycleCommandTimeout` — the declaration itself.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Remove `LifecycleCommandTimeout`, or actually thread it through to the
|
||||
@@ -556,7 +586,21 @@ lifecycle command calls (e.g. by creating a linked CTS with this timeout in
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending): `LifecycleCommandTimeout` is now actually
|
||||
threaded through (the option exists for tuning, so it was wired up rather than
|
||||
deleted). `DisableInstanceAsync`/`EnableInstanceAsync`/`DeleteInstanceAsync`
|
||||
each create a linked `CancellationTokenSource` with `CancelAfter(
|
||||
_options.LifecycleCommandTimeout)` — the same pattern `ArtifactDeploymentService`
|
||||
uses for `ArtifactDeploymentTimeoutPerSite` — and pass its token to the
|
||||
`CommunicationService` call. Each method now catches the resulting
|
||||
`TimeoutException`/`OperationCanceledException`, logs a warning, and returns a
|
||||
`Result.Failure` (previously an `AskTimeoutException` from a hung site escaped
|
||||
uncaught). The option's XML doc was corrected to describe the real behaviour.
|
||||
Regression test:
|
||||
`DisableInstanceAsync_SiteUnresponsive_LifecycleCommandTimeoutBoundsTheWait`
|
||||
(asserts a 300 ms `LifecycleCommandTimeout` bounds the wait far below the 30 s
|
||||
`CommunicationOptions.LifecycleTimeout`; confirmed to fail before the fix —
|
||||
the call hung the full 30 s and threw `AskTimeoutException`).
|
||||
|
||||
### DeploymentManager-013 — SMTP credentials serialized and broadcast to all sites
|
||||
|
||||
@@ -585,9 +629,35 @@ Confirm inter-cluster transport encryption covers artifact commands, ensure
|
||||
SMTP credentials on site SQLite. Consider encrypting the credential field
|
||||
within the artifact payload.
|
||||
|
||||
**Verification (2026-05-16):** Re-triaged against source. The DeploymentManager
|
||||
side is **clean**: `ArtifactDeploymentService` maps `SmtpConfiguration.Credentials`
|
||||
into the artifact (which the design explicitly mandates — SMTP configuration is
|
||||
a deployable artifact) and **never logs it** — the three log statements in
|
||||
`DeployToAllSitesAsync` only reference `SiteId`, `SiteName`, `DeploymentId`, and
|
||||
`ex.Message`, never the credential. There is no defect to fix purely within
|
||||
`src/ScadaLink.DeploymentManager`. The finding's remaining recommendations are
|
||||
all cross-module and one needs a design decision:
|
||||
- inter-cluster transport TLS — `ScadaLink.Communication` /
|
||||
`ScadaLink.ClusterInfrastructure` (Akka remoting + ClusterClient config);
|
||||
- at-rest encryption of the credential on site SQLite — `ScadaLink.SiteRuntime`
|
||||
artifact store;
|
||||
- encrypting the credential field inside the artifact payload — needs the
|
||||
`SmtpConfigurationArtifact` shape in `ScadaLink.Commons` plus cooperating
|
||||
producer (DeploymentManager) and consumer (SiteRuntime) changes, and a
|
||||
**key-management design decision** (where the encryption key lives, how it
|
||||
is distributed to sites) that cannot be made unilaterally here.
|
||||
|
||||
**Status: Open — flagged.** No purely-DeploymentManager fix exists; the work
|
||||
crosses Communication / SiteRuntime / Commons and requires a key-management
|
||||
design decision. Severity confirmed Low: with TLS-protected inter-cluster
|
||||
transport (a separate, assumed-in-place control) and no logging leak, this is a
|
||||
hardening item, not an active leak.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
_Unresolved — see Verification above. Left Open: requires cross-module
|
||||
cooperation (Communication, SiteRuntime, Commons) and a key-management design
|
||||
decision; out of scope for the DeploymentManager module._
|
||||
|
||||
### DeploymentManager-014 — Dead `CreateCommand` helper in artifact tests
|
||||
|
||||
@@ -595,7 +665,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `tests/ScadaLink.DeploymentManager.Tests/ArtifactDeploymentServiceTests.cs:86-90` |
|
||||
|
||||
**Description**
|
||||
@@ -606,6 +676,10 @@ multi-site artifact deployment) was never written — coverage of
|
||||
`DeployToAllSitesAsync` is limited to the no-sites failure case, and
|
||||
`RetryForSiteAsync` and `BuildDeployArtifactsCommandAsync` have no tests at all.
|
||||
|
||||
**Verification:** Confirmed against source. The `CreateCommand()` helper had no
|
||||
callers, and `DeployToAllSitesAsync`/`RetryForSiteAsync` only had the no-sites
|
||||
failure case.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Either remove the unused helper or, preferably, write the missing tests for
|
||||
@@ -614,4 +688,13 @@ Either remove the unused helper or, preferably, write the missing tests for
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending): took the recommendation's preferred
|
||||
option — removed the dead `CreateCommand()` helper and wrote the missing
|
||||
coverage instead. `ArtifactDeploymentServiceTests` now extends `TestKit` and
|
||||
uses a stand-in `ArtifactProbeActor` (records the `DeployArtifactsCommand`s it
|
||||
receives, replies success or, for a configured failure set, failure) so
|
||||
`DeployToAllSitesAsync` and `RetryForSiteAsync` are exercised end-to-end past
|
||||
the communication boundary. New tests:
|
||||
`DeployToAllSitesAsync_AllPerSiteCommandsShareTheSummaryDeploymentId` (also
|
||||
covers DeploymentManager-010), `DeployToAllSitesAsync_PartialFailure_ReportsPerSiteMatrix`
|
||||
(per-site success/failure matrix), `RetryForSiteAsync_SiteSucceeds_ReturnsSuccessAndAudits`.
|
||||
|
||||
Reference in New Issue
Block a user