fix(deployment-manager): resolve DeploymentManager-003..011 — atomic status commit, orphan-delete handling, semaphore reclamation, structured diff, options binding, lifecycle test coverage

This commit is contained in:
Joseph Doherty
2026-05-16 21:11:24 -04:00
parent c9b236e507
commit 8c67ffad2a
8 changed files with 760 additions and 44 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 | | Last reviewed | 2026-05-16 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `9c60592` | | Commit reviewed | `9c60592` |
| Open findings | 11 | | Open findings | 5 |
## Summary ## Summary
@@ -134,7 +134,7 @@ error) if persistence still fails. Regression test:
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Error handling & resilience | | Category | Error handling & resilience |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:155-170` | | Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:155-170` |
**Description** **Description**
@@ -150,6 +150,11 @@ deployment. Central and site are now divergent: the site is running the new
config but central still shows the old state and a non-`Success` deployment config but central still shows the old state and a non-`Success` deployment
record. record.
**Verification:** Confirmed against source. The DeploymentManager-001 fix made
this strictly worse, not better — after that fix a snapshot-store failure is
caught and the record is flipped from `Success` back to `Failed`, so central
reports a *failed* deployment while the site is running the new config.
**Recommendation** **Recommendation**
Wrap the post-success persistence so that, at minimum, the deployment record's Wrap the post-success persistence so that, at minimum, the deployment record's
@@ -160,7 +165,15 @@ apply.
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit pending): `DeployInstanceAsync` now commits the
deployment record's terminal status (`UpdateDeploymentRecordAsync` +
`SaveChangesAsync`) immediately after the site confirms the apply, *before*
touching instance state or the deployed-config snapshot. The post-success
instance-state update and `StoreDeployedSnapshotAsync` are wrapped in a
best-effort `try`/`catch` that logs loudly for operator reconciliation but no
longer flips the already-committed `Success` record back to `Failed`.
Regression test:
`DeployInstanceAsync_SiteSucceeds_SnapshotWriteFails_RecordStillCommittedSuccess`.
### DeploymentManager-004 — Site-success but central-delete-failure leaves orphaned site config ### DeploymentManager-004 — Site-success but central-delete-failure leaves orphaned site config
@@ -168,7 +181,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Error handling & resilience | | Category | Error handling & resilience |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:312-319` | | Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:312-319` |
**Description** **Description**
@@ -182,6 +195,10 @@ normal path (the site no longer has it, so a re-issued delete may fail) and is
permanently orphaned. The design states central must not mark the instance permanently orphaned. The design states central must not mark the instance
deleted until the site confirms — but it does not address the inverse failure. deleted until the site confirms — but it does not address the inverse failure.
**Verification:** Confirmed against source. `DeleteInstanceAsync` has no
`try`/`catch` around the post-success block, so any exception from
`DeleteInstanceAsync`/`SaveChangesAsync` escapes uncaught to the caller.
**Recommendation** **Recommendation**
Catch persistence failures in the post-success block and surface a distinct Catch persistence failures in the post-success block and surface a distinct
@@ -191,7 +208,13 @@ idempotent and retryable independently of the site command.
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit pending): the post-success removal in
`DeleteInstanceAsync` (`DeleteInstanceAsync` + `SaveChangesAsync`) is now
wrapped in a `try`/`catch`. A persistence failure no longer escapes uncaught —
it is logged, recorded with a `DeleteOrphaned` audit entry, and surfaced as a
distinct `Result` failure stating the site deleted the instance but the central
record is orphaned and must be reconciled. Regression test:
`DeleteInstanceAsync_SiteSucceeds_CentralDeleteFails_ReturnsDistinctFailure`.
### DeploymentManager-005 — `OperationLockManager` leaks a `SemaphoreSlim` per instance name ### DeploymentManager-005 — `OperationLockManager` leaks a `SemaphoreSlim` per instance name
@@ -199,7 +222,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Performance & resource management | | Category | Performance & resource management |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/OperationLockManager.cs:15-33` | | Location | `src/ScadaLink.DeploymentManager/OperationLockManager.cs:15-33` |
**Description** **Description**
@@ -213,6 +236,9 @@ with the bulk "deploy all out-of-date instances" workflow and instances that
are created and deleted over time — this is an unbounded leak of both managed are created and deleted over time — this is an unbounded leak of both managed
memory and OS handles. Deleted instances' semaphores are never reclaimed. memory and OS handles. Deleted instances' semaphores are never reclaimed.
**Verification:** Confirmed against source. `_locks` is a `ConcurrentDictionary`
with no removal path anywhere in the type.
**Recommendation** **Recommendation**
Either accept the leak explicitly and document the expected bounded cardinality Either accept the leak explicitly and document the expected bounded cardinality
@@ -223,7 +249,17 @@ At minimum, remove the semaphore entry when an instance is deleted
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit pending): `OperationLockManager` now ref-counts each
lock entry. A reference is reserved (creating the entry if needed) before the
`SemaphoreSlim.WaitAsync`, so concurrent waiters for the same instance share one
semaphore and the entry survives until every waiter/holder has released. When
the reference count reaches zero — on release, timeout, or cancellation — the
entry is removed from the dictionary and the semaphore is `Dispose()`d, so the
process no longer accumulates one kernel wait handle per distinct instance name.
A `TrackedLockCount` diagnostic property was added to make reclamation testable.
Regression tests: `AcquireAsync_ReleasedLock_RemovesSemaphoreEntry`,
`AcquireAsync_ManyDistinctInstances_DoesNotAccumulateSemaphores`,
`AcquireAsync_ContendedLock_KeepsSemaphoreUntilLastReleaseThenReclaims`.
### DeploymentManager-006 — Query-the-site-before-redeploy idempotency requirement not implemented ### DeploymentManager-006 — Query-the-site-before-redeploy idempotency requirement not implemented
@@ -294,7 +330,7 @@ stale-rejection) when the query fails. Regression tests:
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Design-document adherence | | Category | Design-document adherence |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:334-358,401-406` | | Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:334-358,401-406` |
**Description** **Description**
@@ -308,6 +344,12 @@ added/removed/changed detail is produced, and the Template Engine's diff
capability is not invoked. The UI cannot render a meaningful diff from this capability is not invoked. The UI cannot render a meaningful diff from this
result. result.
**Verification:** Confirmed against source. The Template Engine already provides
`DiffService` + `ConfigurationDiff` (structured Added/Removed/Changed entries
for attributes, alarms, and scripts, including data connection binding fields),
and `DiffService` is DI-registered — it was simply never wired into the
Deployment Manager's comparison path.
**Recommendation** **Recommendation**
Either implement a real diff (deserialize the stored Either implement a real diff (deserialize the stored
@@ -318,7 +360,15 @@ down to staleness detection only.
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit pending): `GetDeploymentComparisonAsync` now
deserializes the stored `DeployedConfigSnapshot.ConfigurationJson` and runs the
Template Engine `DiffService` against the freshly flattened current
configuration, attaching the resulting `ConfigurationDiff` (added/removed/changed
attributes, alarms, scripts) to a new optional `Diff` property on
`DeploymentComparisonResult`. `DiffService` is injected into `DeploymentService`.
A snapshot that cannot be deserialized (corrupt / older schema) still yields the
hash-based staleness result with a null diff, logged at warning level.
Regression test: `GetDeploymentComparisonAsync_ProducesStructuredDiff`.
### DeploymentManager-008 — `DeploymentManagerOptions` is never bound to configuration ### DeploymentManager-008 — `DeploymentManagerOptions` is never bound to configuration
@@ -326,7 +376,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/ServiceCollectionExtensions.cs:7-14` | | Location | `src/ScadaLink.DeploymentManager/ServiceCollectionExtensions.cs:7-14` |
**Description** **Description**
@@ -341,6 +391,9 @@ to options classes (Options pattern)." `Host/Program.cs` binds
`SecurityOptions` and `InboundApiOptions` from configuration sections but has `SecurityOptions` and `InboundApiOptions` from configuration sections but has
no equivalent for `DeploymentManagerOptions`. no equivalent for `DeploymentManagerOptions`.
**Verification:** Confirmed against source. Neither `AddDeploymentManager` nor
`Host/Program.cs` binds `DeploymentManagerOptions`.
**Recommendation** **Recommendation**
Add an `IConfiguration` parameter (or a configure callback) to Add an `IConfiguration` parameter (or a configure callback) to
@@ -349,7 +402,18 @@ Add an `IConfiguration` parameter (or a configure callback) to
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit pending): added an
`AddDeploymentManager(IServiceCollection, IConfiguration)` overload that binds
`DeploymentManagerOptions` to the `ScadaLink:DeploymentManager` configuration
section (exposed as `ServiceCollectionExtensions.OptionsSection`). The
`Microsoft.Extensions.Options.ConfigurationExtensions` package was added to the
project. The original parameterless overload is retained for callers/tests that
do not bind configuration. Regression tests:
`AddDeploymentManager_WithConfiguration_BindsDeploymentManagerOptions`,
`AddDeploymentManager_WithConfiguration_MissingSection_UsesDefaults`. Note: a
one-line follow-up in `Host/Program.cs` (call the new overload with
`builder.Configuration`) is required to take effect at runtime — that file is
outside this module's edit scope and is surfaced for the Host owner.
### DeploymentManager-009 — Misleading timeout comment on `DeleteInstanceAsync` ### DeploymentManager-009 — Misleading timeout comment on `DeleteInstanceAsync`
@@ -415,7 +479,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Testing coverage | | Category | Testing coverage |
| Status | Open | | Status | Resolved |
| Location | `tests/ScadaLink.DeploymentManager.Tests/DeploymentServiceTests.cs:100-151,155-199` | | Location | `tests/ScadaLink.DeploymentManager.Tests/DeploymentServiceTests.cs:100-151,155-199` |
**Description** **Description**
@@ -432,6 +496,13 @@ critical post-response branch (`DeploymentService.cs:154-184`) and the entire
delete/disable/enable success path are untested. The `AuditLogs` test delete/disable/enable success path are untested. The `AuditLogs` test
(lines 277-289) asserts nothing. (lines 277-289) asserts nothing.
**Verification:** Partially confirmed. By the time this finding was being
resolved, the DeploymentManager-006 fix had already introduced a TestKit-actor
seam (`CreateServiceWithCommActor` + `ReconcileProbeActor`) and successful-deploy
tests. The genuinely-still-missing coverage was: successful Disable/Enable/Delete
paths, per-instance lock serialization during deploy, and the assertionless
`AuditLogs` test — those gaps were addressed.
**Recommendation** **Recommendation**
Introduce a seam to inject a fake/substitute communication path (e.g. an Introduce a seam to inject a fake/substitute communication path (e.g. an
@@ -442,7 +513,20 @@ test assert on `IAuditService.LogAsync`.
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit pending): extended the TestKit-actor seam
(`ReconcileProbeActor` now also answers lifecycle commands) and added the
missing coverage — successful Disable/Enable/Delete (state transition + audit
assertions), a successful-deploy audit assertion, and per-instance lock
serialization via a new deferred-reply `SerializationProbeActor` that asserts a
single instance's concurrent deploys never overlap. The assertionless `AuditLogs`
test was replaced with `DeployInstanceAsync_FlatteningFails_DoesNotReachAudit`,
which asserts on `IAuditService.LogAsync`. Regression tests:
`DisableInstanceAsync_SiteSucceeds_SetsDisabledStateAndAudits`,
`EnableInstanceAsync_SiteSucceeds_SetsEnabledStateAndAudits`,
`DeleteInstanceAsync_SiteSucceeds_RemovesRecordAndAudits`,
`DeployInstanceAsync_SiteSucceeds_WritesDeployAuditEntry`,
`DeployInstanceAsync_FlatteningFails_DoesNotReachAudit`,
`DeployInstanceAsync_SameInstance_OperationLockSerializesConcurrentDeploys`.
### DeploymentManager-012 — `LifecycleCommandTimeout` option is dead code ### DeploymentManager-012 — `LifecycleCommandTimeout` option is dead code

View File

@@ -40,6 +40,7 @@ public class DeploymentService
private readonly CommunicationService _communicationService; private readonly CommunicationService _communicationService;
private readonly OperationLockManager _lockManager; private readonly OperationLockManager _lockManager;
private readonly IAuditService _auditService; private readonly IAuditService _auditService;
private readonly DiffService _diffService;
private readonly DeploymentManagerOptions _options; private readonly DeploymentManagerOptions _options;
private readonly ILogger<DeploymentService> _logger; private readonly ILogger<DeploymentService> _logger;
@@ -58,6 +59,7 @@ public class DeploymentService
CommunicationService communicationService, CommunicationService communicationService,
OperationLockManager lockManager, OperationLockManager lockManager,
IAuditService auditService, IAuditService auditService,
DiffService diffService,
IOptions<DeploymentManagerOptions> options, IOptions<DeploymentManagerOptions> options,
ILogger<DeploymentService> logger) ILogger<DeploymentService> logger)
{ {
@@ -67,6 +69,7 @@ public class DeploymentService
_communicationService = communicationService; _communicationService = communicationService;
_lockManager = lockManager; _lockManager = lockManager;
_auditService = auditService; _auditService = auditService;
_diffService = diffService;
_options = options.Value; _options = options.Value;
_logger = logger; _logger = logger;
} }
@@ -171,24 +174,47 @@ public class DeploymentService
var response = await _communicationService.DeployInstanceAsync(siteId, command, cancellationToken); var response = await _communicationService.DeployInstanceAsync(siteId, command, cancellationToken);
// WP-1: Update status based on site response // WP-1: Update status based on site response.
record.Status = response.Status; record.Status = response.Status;
record.ErrorMessage = response.ErrorMessage; record.ErrorMessage = response.ErrorMessage;
record.CompletedAt = DateTimeOffset.UtcNow; record.CompletedAt = DateTimeOffset.UtcNow;
// DeploymentManager-003: once the site has confirmed the apply,
// commit the deployment record's terminal status BEFORE touching
// instance state and the deployed-config snapshot. If a later write
// (instance update / snapshot store) fails, the recorded fact that
// the site succeeded must NOT be lost -- otherwise central reports a
// non-Success record while the site is running the new config.
await _repository.UpdateDeploymentRecordAsync(record, cancellationToken); await _repository.UpdateDeploymentRecordAsync(record, cancellationToken);
await _repository.SaveChangesAsync(cancellationToken);
if (response.Status == DeploymentStatus.Success) if (response.Status == DeploymentStatus.Success)
{ {
// WP-4: Update instance state to Enabled on successful deployment // The site has applied the deployment. The post-success
instance.State = InstanceState.Enabled; // persistence below is best-effort: a failure here must be
await _repository.UpdateInstanceAsync(instance, cancellationToken); // logged loudly for operator reconciliation but must not flip
// the already-committed Success record back to Failed.
try
{
// WP-4: Update instance state to Enabled on successful deployment
instance.State = InstanceState.Enabled;
await _repository.UpdateInstanceAsync(instance, cancellationToken);
// WP-8: Store deployed config snapshot // WP-8: Store deployed config snapshot
await StoreDeployedSnapshotAsync(instanceId, deploymentId, revisionHash, configJson, cancellationToken); await StoreDeployedSnapshotAsync(instanceId, deploymentId, revisionHash, configJson, cancellationToken);
await _repository.SaveChangesAsync(cancellationToken);
}
catch (Exception postEx)
{
_logger.LogError(postEx,
"Deployment {DeploymentId} for instance {Instance} was applied by the site and " +
"recorded Success, but post-success persistence (instance state / config snapshot) " +
"failed -- central and site state may diverge until reconciled",
deploymentId, instance.UniqueName);
}
} }
await _repository.SaveChangesAsync(cancellationToken);
// Audit log // Audit log
await _auditService.LogAsync(user, "Deploy", "Instance", instanceId.ToString(), await _auditService.LogAsync(user, "Deploy", "Instance", instanceId.ToString(),
instance.UniqueName, new { DeploymentId = deploymentId, Status = record.Status.ToString() }, instance.UniqueName, new { DeploymentId = deploymentId, Status = record.Status.ToString() },
@@ -368,8 +394,34 @@ public class DeploymentService
// Delete means delete: remove the instance record entirely. // Delete means delete: remove the instance record entirely.
// Deployment records, snapshot, overrides, and connection bindings // Deployment records, snapshot, overrides, and connection bindings
// are removed with it (see repository implementation). // are removed with it (see repository implementation).
await _repository.DeleteInstanceAsync(instanceId, cancellationToken); //
await _repository.SaveChangesAsync(cancellationToken); // DeploymentManager-004: the site has already destroyed the Instance
// Actor and removed its config. If the central record removal now
// fails (DB error / concurrency), the exception must NOT escape
// uncaught -- that would leave the central record orphaned and
// un-deletable through the normal path (a re-issued delete may fail
// because the site no longer has the instance). Surface a distinct
// failure so an operator can reconcile.
try
{
await _repository.DeleteInstanceAsync(instanceId, cancellationToken);
await _repository.SaveChangesAsync(cancellationToken);
}
catch (Exception ex)
{
_logger.LogError(ex,
"Instance {Instance} was deleted at the site, but the central record could not be " +
"removed -- the central record is now orphaned and must be reconciled manually",
instance.UniqueName);
await _auditService.LogAsync(user, "DeleteOrphaned", "Instance", instanceId.ToString(),
instance.UniqueName, new { CommandId = commandId, Error = ex.Message },
CancellationToken.None);
return Result<InstanceLifecycleResponse>.Failure(
$"The site deleted instance '{instance.UniqueName}', but the central record could not " +
$"be removed: {ex.Message}. The central record is orphaned and must be reconciled.");
}
} }
await _auditService.LogAsync(user, "Delete", "Instance", instanceId.ToString(), await _auditService.LogAsync(user, "Delete", "Instance", instanceId.ToString(),
@@ -383,7 +435,12 @@ public class DeploymentService
} }
/// <summary> /// <summary>
/// WP-8: Get the deployed config snapshot and compare with current template-derived state. /// WP-8: Get the deployed config snapshot and compare with current
/// template-derived state. Produces both a staleness flag and — per the
/// design's "Diff View" — a structured <see cref="ConfigurationDiff"/> of
/// added/removed/changed attributes, alarms, and scripts (including data
/// connection binding changes) computed by the TemplateEngine
/// <see cref="DiffService"/>.
/// </summary> /// </summary>
public async Task<Result<DeploymentComparisonResult>> GetDeploymentComparisonAsync( public async Task<Result<DeploymentComparisonResult>> GetDeploymentComparisonAsync(
int instanceId, int instanceId,
@@ -398,15 +455,47 @@ public class DeploymentService
if (currentResult.IsFailure) if (currentResult.IsFailure)
return Result<DeploymentComparisonResult>.Failure($"Cannot compute current config: {currentResult.Error}"); return Result<DeploymentComparisonResult>.Failure($"Cannot compute current config: {currentResult.Error}");
var currentConfig = currentResult.Value.Configuration;
var currentHash = currentResult.Value.RevisionHash; var currentHash = currentResult.Value.RevisionHash;
var isStale = snapshot.RevisionHash != currentHash; var isStale = snapshot.RevisionHash != currentHash;
// DeploymentManager-007: deserialize the deployed snapshot and run the
// TemplateEngine DiffService so the result carries real
// added/removed/changed detail, not just a hash comparison. A snapshot
// that cannot be deserialized (corrupt / older schema) still yields the
// hash-based staleness result, with a null diff.
ConfigurationDiff? diff = null;
try
{
var deployedConfig = JsonSerializer.Deserialize<FlattenedConfiguration>(snapshot.ConfigurationJson);
if (deployedConfig != null)
{
diff = _diffService.ComputeDiff(
deployedConfig, currentConfig, snapshot.RevisionHash, currentHash);
}
else
{
_logger.LogWarning(
"Deployed snapshot for instance {InstanceId} deserialized to null; " +
"returning hash-based comparison without a structured diff",
instanceId);
}
}
catch (JsonException ex)
{
_logger.LogWarning(ex,
"Could not deserialize deployed snapshot for instance {InstanceId}; " +
"returning hash-based comparison without a structured diff",
instanceId);
}
var result = new DeploymentComparisonResult( var result = new DeploymentComparisonResult(
instanceId, instanceId,
snapshot.RevisionHash, snapshot.RevisionHash,
currentHash, currentHash,
isStale, isStale,
snapshot.DeployedAt); snapshot.DeployedAt,
diff);
return Result<DeploymentComparisonResult>.Success(result); return Result<DeploymentComparisonResult>.Success(result);
} }
@@ -551,9 +640,16 @@ public class DeploymentService
/// <summary> /// <summary>
/// WP-8: Result of comparing deployed vs template-derived configuration. /// WP-8: Result of comparing deployed vs template-derived configuration.
/// </summary> /// </summary>
/// <param name="Diff">
/// DeploymentManager-007: structured added/removed/changed detail for
/// attributes, alarms, and scripts. Null only when the deployed snapshot could
/// not be deserialized (corrupt / older schema), in which case
/// <see cref="IsStale"/> still reflects the hash comparison.
/// </param>
public record DeploymentComparisonResult( public record DeploymentComparisonResult(
int InstanceId, int InstanceId,
string DeployedRevisionHash, string DeployedRevisionHash,
string CurrentRevisionHash, string CurrentRevisionHash,
bool IsStale, bool IsStale,
DateTimeOffset DeployedAt); DateTimeOffset DeployedAt,
ConfigurationDiff? Diff = null);

View File

@@ -6,13 +6,34 @@ namespace ScadaLink.DeploymentManager;
/// WP-3: Per-instance operation lock. Only one mutating operation (deploy, disable, enable, delete) /// WP-3: Per-instance operation lock. Only one mutating operation (deploy, disable, enable, delete)
/// may be in progress per instance at a time. Different instances can proceed in parallel. /// may be in progress per instance at a time. Different instances can proceed in parallel.
/// ///
/// Implementation: ConcurrentDictionary of SemaphoreSlim(1,1) keyed by instance unique name. /// Implementation: ConcurrentDictionary of ref-counted SemaphoreSlim(1,1) keyed by instance
/// Lock released on completion, timeout, or failure. /// unique name. The lock is released on completion, timeout, or failure.
/// Lost on central failover (acceptable per design -- in-progress treated as failed). /// Lost on central failover (acceptable per design -- in-progress treated as failed).
///
/// DeploymentManager-005: each entry is ref-counted. The semaphore is created on the
/// first acquire/wait, shared while there are waiters or a holder, and removed +
/// <see cref="IDisposable.Dispose"/>d when the last reference is released — so the dictionary
/// does not accumulate one kernel wait handle per distinct instance name forever.
/// </summary> /// </summary>
public class OperationLockManager public class OperationLockManager
{ {
private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new(StringComparer.Ordinal); private readonly object _gate = new();
private readonly Dictionary<string, LockEntry> _locks = new(StringComparer.Ordinal);
/// <summary>
/// Number of lock entries currently tracked. Used for diagnostics and to
/// verify that semaphores are reclaimed (DeploymentManager-005).
/// </summary>
public int TrackedLockCount
{
get
{
lock (_gate)
{
return _locks.Count;
}
}
}
/// <summary> /// <summary>
/// Acquires the operation lock for the given instance. Returns a disposable that releases the lock. /// Acquires the operation lock for the given instance. Returns a disposable that releases the lock.
@@ -20,16 +41,40 @@ public class OperationLockManager
/// </summary> /// </summary>
public async Task<IDisposable> AcquireAsync(string instanceUniqueName, TimeSpan timeout, CancellationToken cancellationToken = default) public async Task<IDisposable> AcquireAsync(string instanceUniqueName, TimeSpan timeout, CancellationToken cancellationToken = default)
{ {
var semaphore = _locks.GetOrAdd(instanceUniqueName, _ => new SemaphoreSlim(1, 1)); // Reserve a reference (creating the entry if needed) BEFORE waiting, so a
// concurrent waiter for the same instance shares the same semaphore and
if (!await semaphore.WaitAsync(timeout, cancellationToken)) // the entry survives until every waiter/holder has released it.
LockEntry entry;
lock (_gate)
{ {
throw new TimeoutException( if (!_locks.TryGetValue(instanceUniqueName, out entry!))
$"Could not acquire operation lock for instance '{instanceUniqueName}' within {timeout.TotalSeconds}s. " + {
"Another mutating operation is in progress."); entry = new LockEntry();
_locks[instanceUniqueName] = entry;
}
entry.RefCount++;
} }
return new LockRelease(semaphore); try
{
if (!await entry.Semaphore.WaitAsync(timeout, cancellationToken))
{
throw new TimeoutException(
$"Could not acquire operation lock for instance '{instanceUniqueName}' within {timeout.TotalSeconds}s. " +
"Another mutating operation is in progress.");
}
}
catch (Exception) when (DropReferenceOnFailure(instanceUniqueName, entry))
{
// DropReferenceOnFailure always returns false; the filter just runs
// the cleanup so the reservation is not leaked when WaitAsync throws
// or times out (TimeoutException / OperationCanceledException). The
// exception still propagates. The semaphore was NOT entered on any
// of these paths, so only the reference is dropped.
throw;
}
return new LockRelease(this, instanceUniqueName, entry);
} }
/// <summary> /// <summary>
@@ -37,21 +82,73 @@ public class OperationLockManager
/// </summary> /// </summary>
public bool IsLocked(string instanceUniqueName) public bool IsLocked(string instanceUniqueName)
{ {
return _locks.TryGetValue(instanceUniqueName, out var semaphore) && semaphore.CurrentCount == 0; lock (_gate)
{
return _locks.TryGetValue(instanceUniqueName, out var entry) && entry.Semaphore.CurrentCount == 0;
}
}
private bool DropReferenceOnFailure(string instanceUniqueName, LockEntry entry)
{
ReleaseReference(instanceUniqueName, entry, semaphoreWasEntered: false);
return false;
}
/// <summary>
/// Drops one reference to the entry. When <paramref name="semaphoreWasEntered"/>
/// is true the semaphore is released first. When the reference count reaches
/// zero the entry is removed from the dictionary and the semaphore disposed.
/// </summary>
private void ReleaseReference(string instanceUniqueName, LockEntry entry, bool semaphoreWasEntered)
{
lock (_gate)
{
// Release the semaphore (handing the lock to any waiter) and drop the
// reference under the same gate, so the dispose decision below cannot
// race with the Release on an entry that another caller is reclaiming.
if (semaphoreWasEntered)
{
entry.Semaphore.Release();
}
entry.RefCount--;
if (entry.RefCount <= 0 &&
_locks.TryGetValue(instanceUniqueName, out var current) &&
ReferenceEquals(current, entry))
{
_locks.Remove(instanceUniqueName);
entry.Semaphore.Dispose();
}
}
}
private sealed class LockEntry
{
public readonly SemaphoreSlim Semaphore = new(1, 1);
/// <summary>Number of in-flight acquires (waiters + the current holder). Guarded by <see cref="_gate"/>.</summary>
public int RefCount;
} }
private sealed class LockRelease : IDisposable private sealed class LockRelease : IDisposable
{ {
private readonly SemaphoreSlim _semaphore; private readonly OperationLockManager _owner;
private readonly string _instanceUniqueName;
private readonly LockEntry _entry;
private int _disposed; private int _disposed;
public LockRelease(SemaphoreSlim semaphore) => _semaphore = semaphore; public LockRelease(OperationLockManager owner, string instanceUniqueName, LockEntry entry)
{
_owner = owner;
_instanceUniqueName = instanceUniqueName;
_entry = entry;
}
public void Dispose() public void Dispose()
{ {
if (Interlocked.CompareExchange(ref _disposed, 1, 0) == 0) if (Interlocked.CompareExchange(ref _disposed, 1, 0) == 0)
{ {
_semaphore.Release(); _owner.ReleaseReference(_instanceUniqueName, _entry, semaphoreWasEntered: true);
} }
} }
} }

View File

@@ -11,6 +11,7 @@
<PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions" /> <PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" /> <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
<PackageReference Include="Microsoft.Extensions.Options" /> <PackageReference Include="Microsoft.Extensions.Options" />
<PackageReference Include="Microsoft.Extensions.Options.ConfigurationExtensions" />
</ItemGroup> </ItemGroup>
<ItemGroup> <ItemGroup>

View File

@@ -1,9 +1,41 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.DependencyInjection;
namespace ScadaLink.DeploymentManager; namespace ScadaLink.DeploymentManager;
public static class ServiceCollectionExtensions public static class ServiceCollectionExtensions
{ {
/// <summary>
/// Configuration section that <see cref="DeploymentManagerOptions"/> is bound to.
/// </summary>
public const string OptionsSection = "ScadaLink:DeploymentManager";
/// <summary>
/// Registers the Deployment Manager services and binds
/// <see cref="DeploymentManagerOptions"/> to the
/// <see cref="OptionsSection"/> configuration section, consistent with the
/// Options-pattern convention ("Per-component configuration via
/// appsettings.json sections bound to options classes").
/// </summary>
public static IServiceCollection AddDeploymentManager(
this IServiceCollection services,
IConfiguration configuration)
{
ArgumentNullException.ThrowIfNull(configuration);
// DeploymentManager-008: bind the options class so the operation-lock
// and artifact-deployment timeouts are tunable via appsettings.json.
services.Configure<DeploymentManagerOptions>(configuration.GetSection(OptionsSection));
return services.AddDeploymentManager();
}
/// <summary>
/// Registers the Deployment Manager services without binding options to
/// configuration. <see cref="DeploymentManagerOptions"/> falls back to its
/// declared defaults unless configured elsewhere. Prefer the
/// <see cref="AddDeploymentManager(IServiceCollection, IConfiguration)"/>
/// overload so the options are bound to <c>appsettings.json</c>.
/// </summary>
public static IServiceCollection AddDeploymentManager(this IServiceCollection services) public static IServiceCollection AddDeploymentManager(this IServiceCollection services)
{ {
services.AddSingleton<OperationLockManager>(); services.AddSingleton<OperationLockManager>();

View File

@@ -13,6 +13,7 @@ using ScadaLink.Commons.Types;
using ScadaLink.Commons.Types.Enums; using ScadaLink.Commons.Types.Enums;
using ScadaLink.Commons.Types.Flattening; using ScadaLink.Commons.Types.Flattening;
using ScadaLink.Communication; using ScadaLink.Communication;
using ScadaLink.TemplateEngine.Flattening;
namespace ScadaLink.DeploymentManager.Tests; namespace ScadaLink.DeploymentManager.Tests;
@@ -45,7 +46,8 @@ public class DeploymentServiceTests : TestKit
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = Substitute.For<ISiteRepository>();
_service = new DeploymentService( _service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit, options, _repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(), options,
NullLogger<DeploymentService>.Instance); NullLogger<DeploymentService>.Instance);
} }
@@ -276,6 +278,34 @@ public class DeploymentServiceTests : TestKit
Assert.Contains("not found", result.Error); Assert.Contains("not found", result.Error);
} }
// ── DeploymentManager-004: site-success but central-delete-failure must not escape uncaught ──
[Fact]
public async Task DeleteInstanceAsync_SiteSucceeds_CentralDeleteFails_ReturnsDistinctFailure()
{
// The site destroys the Instance Actor and removes its config (response
// Success), but the central record removal throws. The exception must
// NOT propagate uncaught -- it must be surfaced as a distinct failure so
// an operator can reconcile the orphaned central record.
var instance = new Instance("OrphanInst") { Id = 30, SiteId = 1, State = InstanceState.Enabled };
_repo.GetInstanceByIdAsync(30, Arg.Any<CancellationToken>()).Returns(instance);
_repo.DeleteInstanceAsync(30, Arg.Any<CancellationToken>())
.Returns<Task>(_ => throw new InvalidOperationException("db unavailable"));
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "sha256:x", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.DeleteInstanceAsync(30, "admin");
// The failure is surfaced (not thrown) and clearly says the site
// succeeded but the central record could not be removed.
Assert.True(result.IsFailure);
Assert.Contains("site", result.Error, StringComparison.OrdinalIgnoreCase);
Assert.Contains("central", result.Error, StringComparison.OrdinalIgnoreCase);
}
// ── WP-8: Deployment comparison ── // ── WP-8: Deployment comparison ──
[Fact] [Fact]
@@ -331,6 +361,51 @@ public class DeploymentServiceTests : TestKit
Assert.True(result.Value.IsStale); Assert.True(result.Value.IsStale);
} }
// ── DeploymentManager-007: comparison must produce a structured diff ──
[Fact]
public async Task GetDeploymentComparisonAsync_ProducesStructuredDiff()
{
// The deployed snapshot has one attribute; the current template-derived
// config has a different attribute. The comparison must surface a real
// Added/Removed diff via the TemplateEngine DiffService, not just a
// boolean staleness flag.
var deployedConfig = new FlattenedConfiguration
{
InstanceUniqueName = "DiffInst",
Attributes = [new ResolvedAttribute { CanonicalName = "OldAttr", DataType = "Int" }]
};
var snapshot = new DeployedConfigSnapshot(
"dep1", "sha256:old", System.Text.Json.JsonSerializer.Serialize(deployedConfig))
{
InstanceId = 40,
DeployedAt = DateTimeOffset.UtcNow
};
_repo.GetDeployedSnapshotByInstanceIdAsync(40, Arg.Any<CancellationToken>()).Returns(snapshot);
var currentConfig = new FlattenedConfiguration
{
InstanceUniqueName = "DiffInst",
Attributes = [new ResolvedAttribute { CanonicalName = "NewAttr", DataType = "Int" }]
};
_pipeline.FlattenAndValidateAsync(40, Arg.Any<CancellationToken>())
.Returns(Result<FlatteningPipelineResult>.Success(
new FlatteningPipelineResult(currentConfig, "sha256:new", ValidationResult.Success())));
var result = await _service.GetDeploymentComparisonAsync(40);
Assert.True(result.IsSuccess);
Assert.True(result.Value.IsStale);
// A structured diff is present with the added and removed attributes.
Assert.NotNull(result.Value.Diff);
Assert.True(result.Value.Diff!.HasChanges);
Assert.Contains(result.Value.Diff.AttributeChanges,
c => c.CanonicalName == "NewAttr" && c.ChangeType == DiffChangeType.Added);
Assert.Contains(result.Value.Diff.AttributeChanges,
c => c.CanonicalName == "OldAttr" && c.ChangeType == DiffChangeType.Removed);
}
// ── WP-2: GetDeploymentStatusAsync ── // ── WP-2: GetDeploymentStatusAsync ──
[Fact] [Fact]
@@ -352,8 +427,11 @@ public class DeploymentServiceTests : TestKit
// ── Audit logging ── // ── Audit logging ──
[Fact] [Fact]
public async Task DeployInstanceAsync_AuditLogs() public async Task DeployInstanceAsync_FlatteningFails_DoesNotReachAudit()
{ {
// DeploymentManager-011: this test previously asserted nothing. A
// flatten failure returns before any site communication, so no audit
// entry is written.
var instance = new Instance("TestInst") { Id = 1, SiteId = 1, State = InstanceState.NotDeployed }; var instance = new Instance("TestInst") { Id = 1, SiteId = 1, State = InstanceState.NotDeployed };
_repo.GetInstanceByIdAsync(1).Returns(instance); _repo.GetInstanceByIdAsync(1).Returns(instance);
@@ -362,8 +440,120 @@ public class DeploymentServiceTests : TestKit
await _service.DeployInstanceAsync(1, "admin"); await _service.DeployInstanceAsync(1, "admin");
// Failure case does not reach audit (returns before communication) await _audit.DidNotReceive().LogAsync(
// The audit is only logged after communication succeeds/fails Arg.Any<string>(), Arg.Any<string>(), Arg.Any<string>(), Arg.Any<string>(),
Arg.Any<string>(), Arg.Any<object>(), Arg.Any<CancellationToken>());
}
[Fact]
public async Task DeployInstanceAsync_SiteSucceeds_WritesDeployAuditEntry()
{
// DeploymentManager-011: a successful deployment must write a "Deploy"
// audit entry referencing the deployed instance.
var instance = new Instance("AuditInst") { Id = 50, SiteId = 1, State = InstanceState.NotDeployed };
_repo.GetInstanceByIdAsync(50, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(50, "AuditInst", "sha256:target");
_repo.GetCurrentDeploymentStatusAsync(50, Arg.Any<CancellationToken>())
.Returns((DeploymentRecord?)null);
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "sha256:target", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.DeployInstanceAsync(50, "admin");
Assert.True(result.IsSuccess);
await _audit.Received().LogAsync(
"admin", "Deploy", "Instance", "50", "AuditInst",
Arg.Any<object>(), Arg.Any<CancellationToken>());
}
// ── DeploymentManager-011: lifecycle success paths ──
[Fact]
public async Task DisableInstanceAsync_SiteSucceeds_SetsDisabledStateAndAudits()
{
var instance = new Instance("DisInst") { Id = 51, SiteId = 1, State = InstanceState.Enabled };
_repo.GetInstanceByIdAsync(51, Arg.Any<CancellationToken>()).Returns(instance);
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "x", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.DisableInstanceAsync(51, "admin");
Assert.True(result.IsSuccess);
Assert.Equal(InstanceState.Disabled, instance.State);
await _repo.Received().UpdateInstanceAsync(instance, Arg.Any<CancellationToken>());
await _audit.Received().LogAsync(
"admin", "Disable", "Instance", "51", "DisInst",
Arg.Any<object>(), Arg.Any<CancellationToken>());
}
[Fact]
public async Task EnableInstanceAsync_SiteSucceeds_SetsEnabledStateAndAudits()
{
var instance = new Instance("EnInst") { Id = 52, SiteId = 1, State = InstanceState.Disabled };
_repo.GetInstanceByIdAsync(52, Arg.Any<CancellationToken>()).Returns(instance);
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "x", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.EnableInstanceAsync(52, "admin");
Assert.True(result.IsSuccess);
Assert.Equal(InstanceState.Enabled, instance.State);
await _repo.Received().UpdateInstanceAsync(instance, Arg.Any<CancellationToken>());
await _audit.Received().LogAsync(
"admin", "Enable", "Instance", "52", "EnInst",
Arg.Any<object>(), Arg.Any<CancellationToken>());
}
[Fact]
public async Task DeleteInstanceAsync_SiteSucceeds_RemovesRecordAndAudits()
{
var instance = new Instance("DelInst") { Id = 53, SiteId = 1, State = InstanceState.Enabled };
_repo.GetInstanceByIdAsync(53, Arg.Any<CancellationToken>()).Returns(instance);
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "x", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.DeleteInstanceAsync(53, "admin");
Assert.True(result.IsSuccess);
await _repo.Received().DeleteInstanceAsync(53, Arg.Any<CancellationToken>());
await _audit.Received().LogAsync(
"admin", "Delete", "Instance", "53", "DelInst",
Arg.Any<object>(), Arg.Any<CancellationToken>());
}
[Fact]
public async Task DeployInstanceAsync_SameInstance_OperationLockSerializesConcurrentDeploys()
{
// DeploymentManager-011: two concurrent deploys of the SAME instance
// must be serialized by the per-instance operation lock — the site sees
// them one at a time, never overlapping.
var instance = new Instance("LockInst") { Id = 54, SiteId = 1, State = InstanceState.Enabled };
_repo.GetInstanceByIdAsync(54, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(54, "LockInst", "sha256:target");
_repo.GetCurrentDeploymentStatusAsync(54, Arg.Any<CancellationToken>())
.Returns((DeploymentRecord?)null);
var commActor = Sys.ActorOf(Props.Create(() =>
new SerializationProbeActor()));
var service = CreateServiceWithCommActor(commActor);
var deploy1 = service.DeployInstanceAsync(54, "admin");
var deploy2 = service.DeployInstanceAsync(54, "admin");
var results = await Task.WhenAll(deploy1, deploy2);
Assert.True(results[0].IsSuccess);
Assert.True(results[1].IsSuccess);
// The probe records the maximum concurrency observed; the lock must
// keep it at 1 for a single instance.
Assert.Equal(1, SerializationProbeActor.MaxConcurrent);
} }
// ── DeploymentManager-006: query-the-site-before-redeploy idempotency ── // ── DeploymentManager-006: query-the-site-before-redeploy idempotency ──
@@ -386,6 +576,7 @@ public class DeploymentServiceTests : TestKit
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = Substitute.For<ISiteRepository>();
return new DeploymentService( return new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit, _repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
Options.Create(new DeploymentManagerOptions { OperationLockTimeout = TimeSpan.FromSeconds(5) }), Options.Create(new DeploymentManagerOptions { OperationLockTimeout = TimeSpan.FromSeconds(5) }),
NullLogger<DeploymentService>.Instance); NullLogger<DeploymentService>.Instance);
} }
@@ -572,6 +763,109 @@ public class DeploymentServiceTests : TestKit
Assert.Equal(1, ReconcileProbeActor.DeployCount); Assert.Equal(1, ReconcileProbeActor.DeployCount);
} }
// ── DeploymentManager-003: post-success persistence must commit the Success status ──
[Fact]
public async Task DeployInstanceAsync_SiteSucceeds_SnapshotWriteFails_RecordStillCommittedSuccess()
{
// The site applies the deployment (response Success), but storing the
// deployed-config snapshot afterwards throws. The deployment record's
// Success status MUST still be durably committed -- otherwise central
// and site diverge: the site runs the new config while central shows a
// non-Success record forever.
var instance = new Instance("SnapFailInst") { Id = 20, SiteId = 1, State = InstanceState.NotDeployed };
_repo.GetInstanceByIdAsync(20, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(20, "SnapFailInst", "sha256:target");
_repo.GetCurrentDeploymentStatusAsync(20, Arg.Any<CancellationToken>())
.Returns((DeploymentRecord?)null);
DeploymentRecord? captured = null;
await _repo.AddDeploymentRecordAsync(
Arg.Do<DeploymentRecord>(r => captured = r), Arg.Any<CancellationToken>());
// The snapshot store throws.
_repo.GetDeployedSnapshotByInstanceIdAsync(20, Arg.Any<CancellationToken>())
.Returns((DeployedConfigSnapshot?)null);
_repo.AddDeployedSnapshotAsync(Arg.Any<DeployedConfigSnapshot>(), Arg.Any<CancellationToken>())
.Returns<Task>(_ => throw new InvalidOperationException("snapshot store unavailable"));
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "sha256:target", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.DeployInstanceAsync(20, "admin");
// The site succeeded -> the deployment is reported successful.
Assert.True(result.IsSuccess);
Assert.NotNull(captured);
Assert.Equal(DeploymentStatus.Success, captured!.Status);
// The Success status was committed (a SaveChanges happened with the
// record in Success state) BEFORE the snapshot write was attempted.
await _repo.Received().UpdateDeploymentRecordAsync(
Arg.Is<DeploymentRecord>(r => r.Status == DeploymentStatus.Success),
Arg.Any<CancellationToken>());
}
/// <summary>
/// Stand-in CentralCommunicationActor that measures deploy concurrency. It
/// defers each deploy reply via the scheduler, so if two deploys for the
/// same instance were NOT serialized by the operation lock their windows
/// would overlap and <see cref="MaxConcurrent"/> would exceed 1.
/// </summary>
private class SerializationProbeActor : ReceiveActor, IWithTimers
{
public static int MaxConcurrent;
private static int _current;
private static readonly object Gate = new();
public ITimerScheduler Timers { get; set; } = null!;
public SerializationProbeActor()
{
MaxConcurrent = 0;
_current = 0;
Receive<SiteEnvelope>(env =>
{
if (env.Message is DeployInstanceCommand d)
{
lock (Gate)
{
_current++;
if (_current > MaxConcurrent) MaxConcurrent = _current;
}
var replyTo = Sender;
// Defer the reply so the deploy "window" stays open long
// enough for a non-serialized second deploy to overlap.
Timers.StartSingleTimer(
d.DeploymentId,
new CompleteDeploy(d, replyTo),
TimeSpan.FromMilliseconds(150));
}
else if (env.Message is DeploymentStateQueryRequest q)
{
Sender.Tell(new DeploymentStateQueryResponse(
q.CorrelationId, q.InstanceUniqueName, false, null, null, DateTimeOffset.UtcNow));
}
});
Receive<CompleteDeploy>(c =>
{
lock (Gate)
{
_current--;
}
c.ReplyTo.Tell(new DeploymentStatusResponse(
c.Command.DeploymentId, c.Command.InstanceUniqueName,
DeploymentStatus.Success, null, DateTimeOffset.UtcNow));
});
}
private sealed record CompleteDeploy(DeployInstanceCommand Command, IActorRef ReplyTo);
}
/// <summary> /// <summary>
/// Stand-in CentralCommunicationActor for reconciliation tests. Counts the /// Stand-in CentralCommunicationActor for reconciliation tests. Counts the
/// site queries and deploy commands it receives, answers queries with a /// site queries and deploy commands it receives, answers queries with a
@@ -610,6 +904,21 @@ public class DeploymentServiceTests : TestKit
d.DeploymentId, d.InstanceUniqueName, d.DeploymentId, d.InstanceUniqueName,
DeploymentStatus.Success, null, DateTimeOffset.UtcNow)); DeploymentStatus.Success, null, DateTimeOffset.UtcNow));
break; break;
case DisableInstanceCommand dis:
Sender.Tell(new InstanceLifecycleResponse(
dis.CommandId, dis.InstanceUniqueName, true, null, DateTimeOffset.UtcNow));
break;
case EnableInstanceCommand en:
Sender.Tell(new InstanceLifecycleResponse(
en.CommandId, en.InstanceUniqueName, true, null, DateTimeOffset.UtcNow));
break;
case DeleteInstanceCommand del:
Sender.Tell(new InstanceLifecycleResponse(
del.CommandId, del.InstanceUniqueName, true, null, DateTimeOffset.UtcNow));
break;
} }
}); });
} }

View File

@@ -92,4 +92,52 @@ public class OperationLockManagerTests
await Task.WhenAll(tasks); await Task.WhenAll(tasks);
} }
// ── DeploymentManager-005: semaphore must not leak ──
[Fact]
public async Task AcquireAsync_ReleasedLock_RemovesSemaphoreEntry()
{
// A semaphore that is created, used, and fully released must not be
// retained — otherwise every distinct instance name leaks a
// SemaphoreSlim (a kernel handle) for the life of the process.
using (await _lockManager.AcquireAsync("transient-inst", TimeSpan.FromSeconds(5)))
{
Assert.Equal(1, _lockManager.TrackedLockCount);
}
// After release, the entry is reclaimed.
Assert.Equal(0, _lockManager.TrackedLockCount);
}
[Fact]
public async Task AcquireAsync_ManyDistinctInstances_DoesNotAccumulateSemaphores()
{
// Simulates the long-running central process: many instances are
// deployed/disabled over time. Their semaphores must be reclaimed.
for (var i = 0; i < 100; i++)
{
using var handle = await _lockManager.AcquireAsync($"churn-{i}", TimeSpan.FromSeconds(5));
}
Assert.Equal(0, _lockManager.TrackedLockCount);
}
[Fact]
public async Task AcquireAsync_ContendedLock_KeepsSemaphoreUntilLastReleaseThenReclaims()
{
// While a second caller is waiting, the semaphore must survive the
// first release; only when the last holder releases is it reclaimed.
var first = await _lockManager.AcquireAsync("contended", TimeSpan.FromSeconds(5));
var secondAcquire = _lockManager.AcquireAsync("contended", TimeSpan.FromSeconds(5));
first.Dispose(); // hands the lock to the waiter; entry must NOT be removed
var second = await secondAcquire;
Assert.Equal(1, _lockManager.TrackedLockCount);
second.Dispose();
Assert.Equal(0, _lockManager.TrackedLockCount);
}
} }

View File

@@ -0,0 +1,49 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;
namespace ScadaLink.DeploymentManager.Tests;
/// <summary>
/// DeploymentManager-008: DeploymentManagerOptions must be bound to the
/// "ScadaLink:DeploymentManager" configuration section, consistent with the
/// Options-pattern convention used by the other components.
/// </summary>
public class ServiceCollectionExtensionsTests
{
[Fact]
public void AddDeploymentManager_WithConfiguration_BindsDeploymentManagerOptions()
{
var configuration = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["ScadaLink:DeploymentManager:OperationLockTimeout"] = "00:00:09",
["ScadaLink:DeploymentManager:ArtifactDeploymentTimeoutPerSite"] = "00:03:00"
})
.Build();
var services = new ServiceCollection();
services.AddDeploymentManager(configuration);
using var provider = services.BuildServiceProvider();
var options = provider.GetRequiredService<IOptions<DeploymentManagerOptions>>().Value;
Assert.Equal(TimeSpan.FromSeconds(9), options.OperationLockTimeout);
Assert.Equal(TimeSpan.FromMinutes(3), options.ArtifactDeploymentTimeoutPerSite);
}
[Fact]
public void AddDeploymentManager_WithConfiguration_MissingSection_UsesDefaults()
{
var configuration = new ConfigurationBuilder().Build();
var services = new ServiceCollection();
services.AddDeploymentManager(configuration);
using var provider = services.BuildServiceProvider();
var options = provider.GetRequiredService<IOptions<DeploymentManagerOptions>>().Value;
// No section present -> the option-class defaults are retained.
Assert.Equal(TimeSpan.FromSeconds(5), options.OperationLockTimeout);
}
}