- DeploymentManager-008: revert IConfiguration overload (violated OptionsTests component-convention); Host now binds the ScadaLink:DeploymentManager section - SiteStreamGrpcServer: make test-only int ctor internal so DI sees one public ctor (resolves ambiguous-constructor failure in SiteCompositionRootTests) - Host site composition-root test config: supply Cluster:SeedNodes for the new ClusterOptionsValidator
29 KiB
Code Review — DeploymentManager
| Field | Value |
|---|---|
| Module | src/ScadaLink.DeploymentManager |
| Design doc | docs/requirements/Component-DeploymentManager.md |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | 9c60592 |
| Open findings | 5 |
Summary
The DeploymentManager module is small, well-structured, and clearly maps work
packages (WP-N) onto code. The happy paths for instance deployment, lifecycle
commands, artifact broadcast, and staleness comparison are implemented
sensibly, and the operation lock correctly serializes mutating operations per
instance while allowing cross-instance parallelism. However, the review found a
significant cluster of error-handling and resilience gaps: the deployment
record can be left permanently stuck in InProgress when an exception other
than timeout/cancellation is thrown, the catch block writes its failure status
using a cancellation token that may already be cancelled, and the
OperationLockManager leaks one SemaphoreSlim per instance name forever.
There are also two notable design-document adherence gaps: the
"query-the-site-before-redeploy" idempotency requirement is not implemented
(GetDeploymentStatusAsync only reads the local DB), and the "Diff View"
feature is reduced to a bare hash comparison with no added/removed/changed
detail. Configuration is not bound to appsettings.json, leaving one option
entirely dead. Test coverage stops at the communication boundary and never
exercises a successful deployment or the lifecycle success paths.
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ✓ | Stuck InProgress record on unexpected exception; cancelled-token failure write. |
| 2 | Akka.NET conventions | ✓ | Module is a plain service layer; it calls CommunicationService which wraps Ask. No actors here. No issues. |
| 3 | Concurrency & thread safety | ✓ | OperationLockManager is sound but leaks semaphores; DeployToAllSitesAsync correctly builds commands sequentially before parallel send. |
| 4 | Error handling & resilience | ✓ | Several gaps — see DeploymentManager-001/002/003/004. |
| 5 | Security | ✓ | SMTP credentials are serialized and broadcast to sites — see DeploymentManager-013. No injection vectors; no authz here (enforced upstream). |
| 6 | Performance & resource management | ✓ | Semaphore leak (DeploymentManager-005); artifact rebuild does N+1 method queries per external system. |
| 7 | Design-document adherence | ✓ | Missing query-before-redeploy (DeploymentManager-006); Diff View not implemented (DeploymentManager-007). |
| 8 | Code organization & conventions | ✓ | Options class not bound to configuration — DeploymentManager-008. POCO/repo placement correct. |
| 9 | Testing coverage | ✓ | No successful-deploy test, no lifecycle success test — DeploymentManager-011; dead CreateCommand helper — DeploymentManager-014. |
| 10 | Documentation & comments | ✓ | Misleading timeout comment — DeploymentManager-009; stale option XML doc — DeploymentManager-012. |
Findings
DeploymentManager-001 — Unexpected exceptions leave the deployment record stuck in InProgress
| Severity | High |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:141-199 |
Description
DeployInstanceAsync sets the record to InProgress (lines 137-139), then the
try block calls into CommunicationService and the repository. The only
catch filter is when (ex is TimeoutException or OperationCanceledException).
Any other exception — InvalidOperationException (thrown by
CommunicationService.GetCommunicationActor() when the actor is not set), a
JSON serialization error, a deserialization failure of the response, a DB
exception on UpdateDeploymentRecordAsync, or any transport error — escapes the
method. The deployment record remains in DeploymentStatus.InProgress
permanently. Because staleness and the UI both read current status, the
instance is then misreported as "deploying" forever and a re-deploy may be
blocked or misinterpreted. The design explicitly states an interrupted
deployment must be "treated as failed".
Recommendation
Broaden the catch to a general catch (Exception ex) that records
DeploymentStatus.Failed with the error message, audit-logs the failure, and
re-throws or returns a failed Result. Keep the timeout-specific branch only
if a distinct message is desired. Ensure the failure-status write happens for
every exit path out of the try.
Resolution
Resolved 2026-05-16 (commit <pending>): broadened the catch in
DeployInstanceAsync to catch (Exception ex) so any exception (transport,
serialization, DB, InvalidOperationException from an uninitialized
CommunicationService) marks the deployment record Failed with the error
message and audit-logs the failure, instead of escaping and leaving the record
stuck in InProgress. Regression test:
DeployInstanceAsync_CommunicationThrowsUnexpectedException_RecordMarkedFailed.
DeploymentManager-002 — Failure-status write uses a possibly-cancelled cancellation token
| Severity | High |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:186-196 |
Description
The catch (Exception ex) when (ex is TimeoutException or OperationCanceledException) block updates the record to Failed and calls
UpdateDeploymentRecordAsync/SaveChangesAsync/LogAsync passing the same
cancellationToken that was just cancelled (an OperationCanceledException
caught here means the token is already in the cancelled state). Those
repository and audit calls will themselves throw OperationCanceledException
before the failure status is persisted, so the record stays InProgress — the
exact bug DeploymentManager-001 describes, reached via the supposedly-handled
path.
Recommendation
Perform the cleanup writes with a fresh, non-cancellable token (e.g.
CancellationToken.None, optionally with an independent short timeout) so the
failure status is durably recorded even when the original operation was
cancelled or timed out.
Resolution
Resolved 2026-05-16 (commit <pending>): the broadened catch block now
performs the failure-status write (UpdateDeploymentRecordAsync,
SaveChangesAsync) and the audit LogAsync with CancellationToken.None
instead of the operation's (possibly-cancelled) token, so the Failed status
is durably recorded even after a timeout/cancellation. The cleanup writes are
themselves wrapped in a try/catch that logs (without masking the original
error) if persistence still fails. Regression test:
DeployInstanceAsync_FailureWrite_UsesNonCancellableToken.
DeploymentManager-003 — Successful-deployment cleanup is not atomic with the status write
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:155-170 |
Description
After a successful site response the code calls UpdateDeploymentRecordAsync
(no SaveChanges yet), then UpdateInstanceAsync, then
StoreDeployedSnapshotAsync (which itself issues Add/Update calls), then a
single SaveChangesAsync at line 170. If StoreDeployedSnapshotAsync throws,
the exception is not caught (see DeploymentManager-001) and the
SaveChangesAsync never runs — the instance state, deployment status, and
snapshot are all left unpersisted even though the site has actually applied the
deployment. Central and site are now divergent: the site is running the new
config but central still shows the old state and a non-Success deployment
record.
Verification: Confirmed against source. The DeploymentManager-001 fix made
this strictly worse, not better — after that fix a snapshot-store failure is
caught and the record is flipped from Success back to Failed, so central
reports a failed deployment while the site is running the new config.
Recommendation
Wrap the post-success persistence so that, at minimum, the deployment record's
Success status is committed. Consider committing the status first, then the
instance state and snapshot, so a later failure does not lose the fact that the
site succeeded. Log loudly if the snapshot write fails after a confirmed site
apply.
Resolution
Resolved 2026-05-16 (commit pending): DeployInstanceAsync now commits the
deployment record's terminal status (UpdateDeploymentRecordAsync +
SaveChangesAsync) immediately after the site confirms the apply, before
touching instance state or the deployed-config snapshot. The post-success
instance-state update and StoreDeployedSnapshotAsync are wrapped in a
best-effort try/catch that logs loudly for operator reconciliation but no
longer flips the already-committed Success record back to Failed.
Regression test:
DeployInstanceAsync_SiteSucceeds_SnapshotWriteFails_RecordStillCommittedSuccess.
DeploymentManager-004 — Site-success but central-delete-failure leaves orphaned site config
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:312-319 |
Description
In DeleteInstanceAsync, when the site responds Success the code calls
_repository.DeleteInstanceAsync then SaveChangesAsync. If SaveChangesAsync
throws (DB error, concurrency), the exception propagates uncaught: the site has
already destroyed the Instance Actor and removed its config, but the central
instance record still exists. The instance is now un-deletable through the
normal path (the site no longer has it, so a re-issued delete may fail) and is
permanently orphaned. The design states central must not mark the instance
deleted until the site confirms — but it does not address the inverse failure.
Verification: Confirmed against source. DeleteInstanceAsync has no
try/catch around the post-success block, so any exception from
DeleteInstanceAsync/SaveChangesAsync escapes uncaught to the caller.
Recommendation
Catch persistence failures in the post-success block and surface a distinct error indicating the site succeeded but the central record could not be removed, so an operator/retry can reconcile. Consider making the central delete idempotent and retryable independently of the site command.
Resolution
Resolved 2026-05-16 (commit pending): the post-success removal in
DeleteInstanceAsync (DeleteInstanceAsync + SaveChangesAsync) is now
wrapped in a try/catch. A persistence failure no longer escapes uncaught —
it is logged, recorded with a DeleteOrphaned audit entry, and surfaced as a
distinct Result failure stating the site deleted the instance but the central
record is orphaned and must be reconciled. Regression test:
DeleteInstanceAsync_SiteSucceeds_CentralDeleteFails_ReturnsDistinctFailure.
DeploymentManager-005 — OperationLockManager leaks a SemaphoreSlim per instance name
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/OperationLockManager.cs:15-33 |
Description
AcquireAsync does _locks.GetOrAdd(instanceUniqueName, _ => new SemaphoreSlim(1, 1)) and entries are never removed. Every distinct instance
unique name that is ever deployed/disabled/enabled/deleted permanently adds a
SemaphoreSlim (an IDisposable holding a kernel wait handle) to the
dictionary. Over the lifetime of a long-running central process — especially
with the bulk "deploy all out-of-date instances" workflow and instances that
are created and deleted over time — this is an unbounded leak of both managed
memory and OS handles. Deleted instances' semaphores are never reclaimed.
Verification: Confirmed against source. _locks is a ConcurrentDictionary
with no removal path anywhere in the type.
Recommendation
Either accept the leak explicitly and document the expected bounded cardinality of instance names, or implement reclamation: e.g. ref-count handles and remove
Dispose()the semaphore when the count reaches zero and the lock is free. At minimum, remove the semaphore entry when an instance is deleted (DeleteInstanceAsync).
Resolution
Resolved 2026-05-16 (commit pending): OperationLockManager now ref-counts each
lock entry. A reference is reserved (creating the entry if needed) before the
SemaphoreSlim.WaitAsync, so concurrent waiters for the same instance share one
semaphore and the entry survives until every waiter/holder has released. When
the reference count reaches zero — on release, timeout, or cancellation — the
entry is removed from the dictionary and the semaphore is Dispose()d, so the
process no longer accumulates one kernel wait handle per distinct instance name.
A TrackedLockCount diagnostic property was added to make reclamation testable.
Regression tests: AcquireAsync_ReleasedLock_RemovesSemaphoreEntry,
AcquireAsync_ManyDistinctInstances_DoesNotAccumulateSemaphores,
AcquireAsync_ContendedLock_KeepsSemaphoreUntilLastReleaseThenReclaims.
DeploymentManager-006 — Query-the-site-before-redeploy idempotency requirement not implemented
| Severity | High |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:84-200,363-368 |
Description
The design ("Deployment Identity & Idempotency") requires: "After a central
failover or timeout, the Deployment Manager queries the site for current
deployment state before allowing a re-deploy. This prevents duplicate
application and out-of-order config changes." The code never does this.
GetDeploymentStatusAsync only reads the local DeploymentRecord from the DB
(GetDeploymentByDeploymentIdAsync) — it does not contact the site.
DeployInstanceAsync unconditionally generates a new deployment ID and sends a
new DeployInstanceCommand regardless of any prior in-flight or timed-out
deployment. After a timeout where the site actually applied the config, a
re-deploy produces a second deployment with no reconciliation against the
site's current revision hash. Site-side stale-rejection is the only safety
net, and that is not verified here.
Recommendation
Add a site query (a new CommunicationService pattern returning the site's
currently-applied deployment ID / revision hash) and call it before re-deploy
when a prior record for the instance is in InProgress/Failed due to
timeout. Reconcile: if the site already has the target revision, mark the prior
record Success instead of re-sending. Either implement this or update the
design doc to reflect that reconciliation is delegated entirely to site-side
stale-rejection.
Resolution
Resolved 2026-05-16 (commit <pending>): implemented the cross-module
query-the-site-before-redeploy idempotency feature across Commons, SiteRuntime,
Communication, and DeploymentManager — new DeploymentStateQueryRequest /
DeploymentStateQueryResponse contracts, a DeploymentManagerActor handler
answering from the site's deployed-config store, a
CommunicationService.QueryDeploymentStateAsync method routed over the
ClusterClient command/control transport, and reconciliation in
DeployInstanceAsync (TryReconcileWithSiteAsync) that queries the site only
when a prior record is InProgress or Failed due to a timeout, marks the
prior record Success without re-sending if the site already has the target
revision hash, and falls through to a normal deploy (relying on site-side
stale-rejection) when the query fails. Regression tests:
RoundTrip_DeploymentStateQueryRequest_Succeeds,
RoundTrip_DeploymentStateQueryResponse_Deployed_Succeeds,
RoundTrip_DeploymentStateQueryResponse_NotDeployed_NullApplied,
DeploymentStateQuery_DeployedInstance_ReturnsAppliedIdentity,
DeploymentStateQuery_UnknownInstance_ReturnsNotDeployed,
DeploymentStateQuery_ForwardedToDeploymentManager,
QueryDeploymentStateAsync_BeforeInitialization_Throws,
QueryDeploymentStateAsync_SendsEnvelopeAndReturnsResponse,
DeployInstanceAsync_PriorInProgressRecord_SiteHasTargetHash_MarksSuccessWithoutRedeploy,
DeployInstanceAsync_PriorInProgressRecord_SiteHasDifferentHash_ProceedsWithDeploy,
DeployInstanceAsync_PriorFailedTimeoutRecord_QueriesSite,
DeployInstanceAsync_PriorSuccessRecord_SkipsSiteQuery,
DeployInstanceAsync_FreshFirstTimeDeploy_SkipsSiteQuery,
DeployInstanceAsync_PriorInProgressRecord_QueryFails_FallsThroughToDeploy.
DeploymentManager-007 — "Diff View" reduced to a hash comparison with no diff detail
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:334-358,401-406 |
Description
The design ("Diff View" and "Dependencies" sections) states the Deployment
Manager can request a diff from the Template Engine showing added/removed
members, changed values, and connection-binding changes.
GetDeploymentComparisonAsync and DeploymentComparisonResult only compare two
revision hashes and return a boolean IsStale plus the two hashes. No
added/removed/changed detail is produced, and the Template Engine's diff
capability is not invoked. The UI cannot render a meaningful diff from this
result.
Verification: Confirmed against source. The Template Engine already provides
DiffService + ConfigurationDiff (structured Added/Removed/Changed entries
for attributes, alarms, and scripts, including data connection binding fields),
and DiffService is DI-registered — it was simply never wired into the
Deployment Manager's comparison path.
Recommendation
Either implement a real diff (deserialize the stored
DeployedConfigSnapshot.ConfigurationJson and the freshly flattened config and
invoke the Template Engine's diff service, surfacing structured
added/removed/changed entries), or revise the design doc to scope the feature
down to staleness detection only.
Resolution
Resolved 2026-05-16 (commit pending): GetDeploymentComparisonAsync now
deserializes the stored DeployedConfigSnapshot.ConfigurationJson and runs the
Template Engine DiffService against the freshly flattened current
configuration, attaching the resulting ConfigurationDiff (added/removed/changed
attributes, alarms, scripts) to a new optional Diff property on
DeploymentComparisonResult. DiffService is injected into DeploymentService.
A snapshot that cannot be deserialized (corrupt / older schema) still yields the
hash-based staleness result with a null diff, logged at warning level.
Regression test: GetDeploymentComparisonAsync_ProducesStructuredDiff.
DeploymentManager-008 — DeploymentManagerOptions is never bound to configuration
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.DeploymentManager/ServiceCollectionExtensions.cs:7-14 |
Description
AddDeploymentManager registers the services but never calls
services.Configure<DeploymentManagerOptions>(configuration.GetSection(...)).
IOptions<DeploymentManagerOptions> therefore always resolves to a
default-constructed instance — the operation-lock and artifact-deployment
timeouts cannot be tuned via appsettings.json, contrary to the CLAUDE.md
convention "Per-component configuration via appsettings.json sections bound
to options classes (Options pattern)." Host/Program.cs binds
SecurityOptions and InboundApiOptions from configuration sections but has
no equivalent for DeploymentManagerOptions.
Verification: Confirmed against source. Neither AddDeploymentManager nor
Host/Program.cs binds DeploymentManagerOptions.
Recommendation
Add an IConfiguration parameter (or a configure callback) to
AddDeploymentManager and bind DeploymentManagerOptions to a section such as
ScadaLink:DeploymentManager, consistent with the other components.
Resolution
Resolved 2026-05-16 (commit pending): AddDeploymentManager() now calls
services.AddOptions<DeploymentManagerOptions>() so IOptions<DeploymentManagerOptions>
is always resolvable, and Host/Program.cs binds the
ScadaLink:DeploymentManager section (exposed as
ServiceCollectionExtensions.OptionsSection) via
services.Configure<DeploymentManagerOptions>(...) — the same pattern the Host
uses for SecurityOptions/InboundApiOptions. An earlier attempt added an
AddDeploymentManager(IConfiguration) overload; that was reverted because the
project convention (enforced by Host.Tests.OptionsTests) forbids component
Add* methods from depending on IConfiguration — the Host owns
configuration binding. Regression tests:
AddDeploymentManager_RegistersResolvableOptions_WithDefaults,
AddDeploymentManager_OptionsBindToConfigurationSection_AsTheHostWires,
OptionsSection_MatchesTheConventionalComponentSectionPath.
DeploymentManager-009 — Misleading timeout comment on DeleteInstanceAsync
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | src/ScadaLink.DeploymentManager/DeploymentService.cs:288 |
Description
The XML doc says "Delete fails if site unreachable (30s timeout via
CommunicationOptions)." The actual delete timeout is whatever
CommunicationOptions.LifecycleTimeout is configured to (passed inside
CommunicationService.DeleteInstanceAsync); the "30s" figure is hard-coded
into the comment and not derived from any constant in this module. If
LifecycleTimeout is reconfigured, the comment becomes wrong. It also wrongly
implies the value lives in this module.
Recommendation
Reword to "Delete fails if the site is unreachable within
CommunicationOptions.LifecycleTimeout" without quoting a specific number.
Resolution
Unresolved.
DeploymentManager-010 — SystemArtifactDeploymentRecord does not persist the deployment ID
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | src/ScadaLink.DeploymentManager/ArtifactDeploymentService.cs:136,194-211 |
Description
DeployToAllSitesAsync generates a deploymentId (line 136) and returns it in
the ArtifactDeploymentSummary and audit log, but the persisted
SystemArtifactDeploymentRecord has no field for it (the entity only has Id,
ArtifactType, DeployedBy, DeployedAt, PerSiteStatus). The deployment ID
that appears in the UI summary and audit log cannot be correlated back to the
stored record. Additionally each per-site DeployArtifactsCommand carries its
own separate GUID (BuildDeployArtifactsCommandAsync line 114), so there are in
fact N+1 unrelated IDs for one logical artifact deployment.
Recommendation
Add a DeploymentId column to SystemArtifactDeploymentRecord and store the
single logical deploymentId; reuse that ID (or a derived per-site ID) for the
per-site commands so the audit log, UI summary, and persisted record agree.
Resolution
Unresolved.
DeploymentManager-011 — Tests never exercise a successful deployment or lifecycle success path
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | tests/ScadaLink.DeploymentManager.Tests/DeploymentServiceTests.cs:100-151,155-199 |
Description
DeploymentServiceTests never sets the CommunicationService actor, so every
deploy/lifecycle test deliberately stops at the InvalidOperationException
thrown by GetCommunicationActor() (see lines 118-125, 147). As a result there
is no test covering: a successful deployment (DeploymentStatus.Success
response → instance state set to Enabled, snapshot stored, audit logged); a
failed-but-handled site response; the InProgress-stuck bug
(DeploymentManager-001); successful Disable/Enable/Delete; or the operation
lock actually serializing two concurrent deploys of the same instance. The
critical post-response branch (DeploymentService.cs:154-184) and the entire
delete/disable/enable success path are untested. The AuditLogs test
(lines 277-289) asserts nothing.
Verification: Partially confirmed. By the time this finding was being
resolved, the DeploymentManager-006 fix had already introduced a TestKit-actor
seam (CreateServiceWithCommActor + ReconcileProbeActor) and successful-deploy
tests. The genuinely-still-missing coverage was: successful Disable/Enable/Delete
paths, per-instance lock serialization during deploy, and the assertionless
AuditLogs test — those gaps were addressed.
Recommendation
Introduce a seam to inject a fake/substitute communication path (e.g. an
interface over CommunicationService, or wire a TestKit actor) so success and
handled-failure paths can be unit tested. Add tests for the stuck-InProgress
scenario and for per-instance lock contention during deploy. Make the audit
test assert on IAuditService.LogAsync.
Resolution
Resolved 2026-05-16 (commit pending): extended the TestKit-actor seam
(ReconcileProbeActor now also answers lifecycle commands) and added the
missing coverage — successful Disable/Enable/Delete (state transition + audit
assertions), a successful-deploy audit assertion, and per-instance lock
serialization via a new deferred-reply SerializationProbeActor that asserts a
single instance's concurrent deploys never overlap. The assertionless AuditLogs
test was replaced with DeployInstanceAsync_FlatteningFails_DoesNotReachAudit,
which asserts on IAuditService.LogAsync. Regression tests:
DisableInstanceAsync_SiteSucceeds_SetsDisabledStateAndAudits,
EnableInstanceAsync_SiteSucceeds_SetsEnabledStateAndAudits,
DeleteInstanceAsync_SiteSucceeds_RemovesRecordAndAudits,
DeployInstanceAsync_SiteSucceeds_WritesDeployAuditEntry,
DeployInstanceAsync_FlatteningFails_DoesNotReachAudit,
DeployInstanceAsync_SameInstance_OperationLockSerializesConcurrentDeploys.
DeploymentManager-012 — LifecycleCommandTimeout option is dead code
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | src/ScadaLink.DeploymentManager/DeploymentManagerOptions.cs:8-9 |
Description
DeploymentManagerOptions.LifecycleCommandTimeout is declared with a 30s
default and an XML doc, but it is never read anywhere in the codebase
(lifecycle commands rely on CommunicationOptions.LifecycleTimeout inside
CommunicationService). The option misleads readers into thinking it controls
disable/enable/delete timeouts, when setting it has no effect.
Recommendation
Remove LifecycleCommandTimeout, or actually thread it through to the
lifecycle command calls (e.g. by creating a linked CTS with this timeout in
DisableInstanceAsync/EnableInstanceAsync/DeleteInstanceAsync, the way
ArtifactDeploymentTimeoutPerSite is used).
Resolution
Unresolved.
DeploymentManager-013 — SMTP credentials serialized and broadcast to all sites
| Severity | Low |
| Category | Security |
| Status | Open |
| Location | src/ScadaLink.DeploymentManager/ArtifactDeploymentService.cs:108-111 |
Description
BuildDeployArtifactsCommandAsync maps smtp.Credentials directly into
SmtpConfigurationArtifact and that command is sent to every site. Distributing
SMTP credentials to sites is consistent with the design (SMTP configuration is
a deployable artifact), but the credentials travel inside a serialized command
across the inter-cluster transport and are stored on each site's SQLite. There
is no indication the value is encrypted at rest on the site or scrubbed from
logs. Worth confirming the transport is TLS-protected and the site stores the
credential securely; at minimum this should be a conscious, documented decision.
Recommendation
Confirm inter-cluster transport encryption covers artifact commands, ensure
Credentials is never written to logs, and document the at-rest protection of
SMTP credentials on site SQLite. Consider encrypting the credential field
within the artifact payload.
Resolution
Unresolved.
DeploymentManager-014 — Dead CreateCommand helper in artifact tests
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | tests/ScadaLink.DeploymentManager.Tests/ArtifactDeploymentServiceTests.cs:86-90 |
Description
The private static CreateCommand() helper is never referenced by any test in
the file. It is dead code that suggests an intended test (e.g. a successful
multi-site artifact deployment) was never written — coverage of
DeployToAllSitesAsync is limited to the no-sites failure case, and
RetryForSiteAsync and BuildDeployArtifactsCommandAsync have no tests at all.
Recommendation
Either remove the unused helper or, preferably, write the missing tests for
DeployToAllSitesAsync (per-site success/failure matrix, partial failure) and
RetryForSiteAsync using it.
Resolution
Unresolved.