feat(deployment-manager): resolve DeploymentManager-006 — query site deployment state before redeploy and reconcile

Adds DeploymentStateQuery request/response contracts (Commons), a site-side
handler (SiteRuntime), a CommunicationService query method (Communication), and
reconciliation in DeploymentService: when a prior record is InProgress or
Failed-on-timeout, query the site; if it already holds the target revision hash
mark the record Success without re-sending; on query failure fall through to a
normal deploy (site-side stale-rejection is the safety net).
This commit is contained in:
Joseph Doherty
2026-05-16 20:12:24 -04:00
parent cac8aebe9f
commit bc548e1447
13 changed files with 662 additions and 19 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 12 |
| Open findings | 11 |
## Summary
@@ -231,7 +231,7 @@ _Unresolved._
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:84-200,363-368` |
**Description**
@@ -261,19 +261,32 @@ stale-rejection.
**Resolution**
_Unresolved._ Finding confirmed valid against the source — `GetDeploymentStatusAsync`
only reads the local `DeploymentRecord` via `GetDeploymentByDeploymentIdAsync`,
and `DeployInstanceAsync` unconditionally generates a new deployment ID with no
site reconciliation. Left Open: a proper fix is a cross-module new feature, not
a bug fix scoped to `ScadaLink.DeploymentManager`. It requires (1) a new
request/response message contract in `ScadaLink.Commons`, (2) a new
`CommunicationService` query method in `ScadaLink.Communication`, and (3)
site-side handling of the query — all outside the DeploymentManager module — plus
a design decision on the query protocol. The reconciliation logic in
`DeploymentService` cannot be implemented without those. Recommend tracking as a
dedicated cross-module feature work item (or, alternatively, amending the design
doc to delegate reconciliation entirely to site-side stale-rejection — also
outside this module's editable scope).
Resolved 2026-05-16 (commit `<pending>`): implemented the cross-module
query-the-site-before-redeploy idempotency feature across Commons, SiteRuntime,
Communication, and DeploymentManager — new `DeploymentStateQueryRequest` /
`DeploymentStateQueryResponse` contracts, a `DeploymentManagerActor` handler
answering from the site's deployed-config store, a
`CommunicationService.QueryDeploymentStateAsync` method routed over the
ClusterClient command/control transport, and reconciliation in
`DeployInstanceAsync` (`TryReconcileWithSiteAsync`) that queries the site only
when a prior record is `InProgress` or `Failed` due to a timeout, marks the
prior record `Success` without re-sending if the site already has the target
revision hash, and falls through to a normal deploy (relying on site-side
stale-rejection) when the query fails. Regression tests:
`RoundTrip_DeploymentStateQueryRequest_Succeeds`,
`RoundTrip_DeploymentStateQueryResponse_Deployed_Succeeds`,
`RoundTrip_DeploymentStateQueryResponse_NotDeployed_NullApplied`,
`DeploymentStateQuery_DeployedInstance_ReturnsAppliedIdentity`,
`DeploymentStateQuery_UnknownInstance_ReturnsNotDeployed`,
`DeploymentStateQuery_ForwardedToDeploymentManager`,
`QueryDeploymentStateAsync_BeforeInitialization_Throws`,
`QueryDeploymentStateAsync_SendsEnvelopeAndReturnsResponse`,
`DeployInstanceAsync_PriorInProgressRecord_SiteHasTargetHash_MarksSuccessWithoutRedeploy`,
`DeployInstanceAsync_PriorInProgressRecord_SiteHasDifferentHash_ProceedsWithDeploy`,
`DeployInstanceAsync_PriorFailedTimeoutRecord_QueriesSite`,
`DeployInstanceAsync_PriorSuccessRecord_SkipsSiteQuery`,
`DeployInstanceAsync_FreshFirstTimeDeploy_SkipsSiteQuery`,
`DeployInstanceAsync_PriorInProgressRecord_QueryFails_FallsThroughToDeploy`.
### DeploymentManager-007 — "Diff View" reduced to a hash comparison with no diff detail