fix(deployment-manager): resolve DeploymentManager-001/002 — broaden failure catch, persist failure status with non-cancellable token

Joseph Doherty
2026-05-16 19:40:40 -04:00
parent fccd3274d3
commit ab098bf6c8
3 changed files with 149 additions and 14 deletions

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
-| Open findings | 14 |
+| Open findings | 12 |
## Summary
@@ -53,7 +53,7 @@ exercises a successful deployment or the lifecycle success paths.
|--|--|
| Severity | High |
| Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:141-199` |
**Description**
@@ -81,7 +81,13 @@ every exit path out of the `try`.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`): broadened the `catch` in
+`DeployInstanceAsync` to `catch (Exception ex)` so any exception (transport,
+serialization, DB, `InvalidOperationException` from an uninitialized
+`CommunicationService`) marks the deployment record `Failed` with the error
+message and audit-logs the failure, instead of escaping and leaving the record
+stuck in `InProgress`. Regression test:
+`DeployInstanceAsync_CommunicationThrowsUnexpectedException_RecordMarkedFailed`.
### DeploymentManager-002 — Failure-status write uses a possibly-cancelled cancellation token
@@ -89,7 +95,7 @@ _Unresolved._
|--|--|
| Severity | High |
| Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:186-196` |
**Description**
@@ -113,7 +119,14 @@ cancelled or timed out.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`): the broadened `catch` block now
+performs the failure-status write (`UpdateDeploymentRecordAsync`,
+`SaveChangesAsync`) and the audit `LogAsync` with `CancellationToken.None`
+instead of the operation's (possibly-cancelled) token, so the `Failed` status
+is durably recorded even after a timeout/cancellation. The cleanup writes are
+themselves wrapped in a `try`/`catch` that logs (without masking the original
+error) if persistence still fails. Regression test:
+`DeployInstanceAsync_FailureWrite_UsesNonCancellableToken`.
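
The pattern described in the DeploymentManager-001 and DeploymentManager-002 resolutions can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `DeploymentService` code, which this diff does not show: `DeploymentRecord`, `DeploymentStatus`, `DeploySketch`, and the `sendToSite` and `persist` delegates are hypothetical stand-ins for the real record type, the `CommunicationService` call, and the `UpdateDeploymentRecordAsync`/`SaveChangesAsync` pair. Returning the record rather than rethrowing is also an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins; the real ScadaLink types are not shown in this diff.
public enum DeploymentStatus { InProgress, Failed, Succeeded }

public sealed class DeploymentRecord
{
    public DeploymentStatus Status { get; set; } = DeploymentStatus.InProgress;
    public string ErrorMessage { get; set; } = "";
}

public static class DeploySketch
{
    // sendToSite stands in for the communication call to the site;
    // persist stands in for the status write plus save.
    public static async Task<DeploymentRecord> DeployAsync(
        Func<CancellationToken, Task> sendToSite,
        Func<DeploymentRecord, CancellationToken, Task> persist,
        CancellationToken ct)
    {
        var record = new DeploymentRecord();
        try
        {
            await sendToSite(ct);
            record.Status = DeploymentStatus.Succeeded;
            await persist(record, ct);
        }
        catch (Exception ex) // broad catch: no exception leaves the record InProgress
        {
            record.Status = DeploymentStatus.Failed;
            record.ErrorMessage = ex.Message;
            try
            {
                // ct may already be cancelled, so the failure write must not observe it.
                await persist(record, CancellationToken.None);
            }
            catch (Exception persistEx)
            {
                // Log only; the original error stays on the record and is not masked.
                Console.Error.WriteLine($"Could not persist failure status: {persistEx.Message}");
            }
        }
        return record;
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        cts.Cancel(); // simulate an operation token that has already been cancelled

        CancellationToken seen = cts.Token;
        var record = await DeployAsync(
            token => Task.FromCanceled(token),
            (r, token) => { seen = token; return Task.CompletedTask; },
            cts.Token);

        // prints "Failed, persisted with cancellable token: False"
        Console.WriteLine($"{record.Status}, persisted with cancellable token: {seen.CanBeCanceled}");
    }
}
```

In this sketch the cancellation surfaces as an `OperationCanceledException`, which the broad `catch` handles like any other failure, and the status write still runs because it receives `CancellationToken.None`.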
### DeploymentManager-003 — Successful-deployment cleanup is not atomic with the status write
@@ -248,7 +261,19 @@ stale-rejection.
**Resolution**
-_Unresolved._
+_Unresolved._ Finding confirmed valid against the source — `GetDeploymentStatusAsync`
+only reads the local `DeploymentRecord` via `GetDeploymentByDeploymentIdAsync`,
+and `DeployInstanceAsync` unconditionally generates a new deployment ID with no
+site reconciliation. Left Open: a proper fix is a cross-module new feature, not
+a bug fix scoped to `ScadaLink.DeploymentManager`. It requires (1) a new
+request/response message contract in `ScadaLink.Commons`, (2) a new
+`CommunicationService` query method in `ScadaLink.Communication`, and (3)
+site-side handling of the query — all outside the DeploymentManager module — plus
+a design decision on the query protocol. The reconciliation logic in
+`DeploymentService` cannot be implemented without those. Recommend tracking as a
+dedicated cross-module feature work item (or, alternatively, amending the design
+doc to delegate reconciliation entirely to site-side stale-rejection — also
+outside this module's editable scope).
### DeploymentManager-007 — "Diff View" reduced to a hash comparison with no diff detail