docs(code-reviews): re-review batch 4 at 39d737e — SiteEventLogging, SiteRuntime, StoreAndForward, TemplateEngine

11 new findings: SiteEventLogging-012..014, SiteRuntime-017..019, StoreAndForward-015..017, TemplateEngine-015..016.
2026-05-17 00:51:58 -04:00
parent 3b3760f026
commit 0ba4e49e11
5 changed files with 613 additions and 27 deletions
--- a/code-reviews/README.md
+++ b/code-reviews/README.md
@@ -41,9 +41,9 @@ module file and counted in **Total**.
 |----------|---------------|
 | Critical | 0 |
 | High | 8 |
-| Medium | 20 |
-| Low | 27 |
-| **Total** | **55** |
+| Medium | 26 |
+| Low | 32 |
+| **Total** | **66** |

 ## Module Status

@@ -64,10 +64,10 @@ module file and counted in **Total**.
 | [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/2 | 4 | 17 |
 | [NotificationService](NotificationService/findings.md) | 2026-05-16 | `9c60592` | 0/2/1/2 | 5 | 18 |
 | [Security](Security/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/2 | 4 | 15 |
-| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 11 |
-| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 16 |
-| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 14 |
-| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 14 |
+| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/2 | 3 | 14 |
+| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/2 | 3 | 19 |
+| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/1 | 3 | 17 |
+| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/0 | 2 | 16 |

 ## Pending Findings

@@ -93,7 +93,7 @@ _None open._
 | NotificationService-014 | [NotificationService](NotificationService/findings.md) | OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever |
 | NotificationService-015 | [NotificationService](NotificationService/findings.md) | Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script |

-### Medium (20)
+### Medium (26)

 | ID | Module | Title |
 |----|--------|-------|
@@ -117,8 +117,14 @@ _None open._
 | NotificationService-016 | [NotificationService](NotificationService/findings.md) | `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials |
 | Security-012 | [Security](Security/findings.md) | Partial LDAP failure during login yields a roleless authenticated session |
 | Security-014 | [Security](Security/findings.md) | `RefreshToken` re-issues a token without checking the idle timeout |
+| SiteEventLogging-012 | [SiteEventLogging](SiteEventLogging/findings.md) | Dropped events report success: `Task` is completed, not faulted, when the event cannot be persisted |
+| SiteRuntime-017 | [SiteRuntime](SiteRuntime/findings.md) | Instance Actor's live `_attributes` dictionary is shared by reference into child actor constructors |
+| StoreAndForward-015 | [StoreAndForward](StoreAndForward/findings.md) | `EnqueueAsync`'s public contract never documents that `maxRetries == 0` means "retry forever" |
+| StoreAndForward-016 | [StoreAndForward](StoreAndForward/findings.md) | Operator-initiated parked-message retry and discard are not replicated to the standby |
+| TemplateEngine-015 | [TemplateEngine](TemplateEngine/findings.md) | `RenameCompositionAsync` does not cascade-rename nested derived templates |
+| TemplateEngine-016 | [TemplateEngine](TemplateEngine/findings.md) | Composed-script `ScriptScope.ParentPath` is always empty, breaking `Parent.X` resolution for nested modules |

-### Low (27)
+### Low (32)

 | ID | Module | Title |
 |----|--------|-------|
@@ -149,3 +155,8 @@ _None open._
 | NotificationService-018 | [NotificationService](NotificationService/findings.md) | Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed |
 | Security-013 | [Security](Security/findings.md) | `ExtractFirstRdnValue` mis-parses group DNs containing escaped commas |
 | Security-015 | [Security](Security/findings.md) | Username is not trimmed before use in the LDAP filter, fallback DN, and JWT claims |
+| SiteEventLogging-013 | [SiteEventLogging](SiteEventLogging/findings.md) | Keyword search does not escape SQL `LIKE` wildcards in user input |
+| SiteEventLogging-014 | [SiteEventLogging](SiteEventLogging/findings.md) | Initial purge runs synchronously on the host startup thread |
+| SiteRuntime-018 | [SiteRuntime](SiteRuntime/findings.md) | `ScriptExecutionActor` XML doc still claims a "dedicated blocking I/O dispatcher" |
+| SiteRuntime-019 | [SiteRuntime](SiteRuntime/findings.md) | Dead `DisableInstanceCommand` / `EnableInstanceCommand` handlers in `InstanceActor` |
+| StoreAndForward-017 | [StoreAndForward](StoreAndForward/findings.md) | Retry/Discard activity-log entries hard-code the `ExternalSystem` category |
--- a/code-reviews/SiteEventLogging/findings.md
+++ b/code-reviews/SiteEventLogging/findings.md
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.SiteEventLogging` |
 | Design doc | `docs/requirements/Component-SiteEventLogging.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-16 |
+| Last reviewed | 2026-05-17 |
 | Reviewer | claude-agent |
-| Commit reviewed | `9c60592` |
-| Open findings | 0 |
+| Commit reviewed | `39d737e` |
+| Open findings | 3 |

 ## Summary

@@ -28,16 +28,33 @@ cluster-singleton placement of the handler actor (which can pin to the standby
 node), missing indexes for common query filters, retention/cap purge not enforcing
 the requirement strictly, and several documentation/maintainability issues.

+#### Re-review 2026-05-17 (commit `39d737e`)
+
+Re-reviewed the module at commit `39d737e`. All eleven prior findings remain closed
+(SiteEventLogging-001..003, 005..011 Resolved; 004 Won't Fix) and the resolutions
+hold up under inspection — the background writer, lock-guarded `WithConnection`,
+`auto_vacuum = INCREMENTAL` plus logical-size measurement, the severity index, and
+the concrete-recorder DI wiring are all present and correct at this commit. The
+module source is byte-identical between `39d737e` and current `HEAD`, so this review
+reflects the live code. Three new findings were recorded, all low-to-medium and none
+regressions of prior fixes. The most notable (SiteEventLogging-012) is a correctness
+gap left by the SiteEventLogging-005 background-writer rework: when an event cannot
+be persisted because the logger has been disposed, the returned `Task` is completed
+*successfully* rather than faulted, so an `await`-ing caller is told a dropped audit
+event was written. The other two are minor: unescaped SQL `LIKE` wildcards in the
+keyword-search filter (SiteEventLogging-013) and the initial purge running
+synchronously on the host startup thread (SiteEventLogging-014).
+
 ## Checklist coverage

 | # | Category | Examined | Notes |
 |---|----------|----------|-------|
-| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). |
+| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). Re-review: dropped events report success (-012); `LIKE` wildcards unescaped in keyword search (-013). |
 | 2 | Akka.NET conventions | ☑ | Handler actor has no supervision/correlation concerns of its own; singleton placement issue (-004). `Ask` boundary is appropriate. |
 | 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
 | 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
 | 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
-| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). |
+| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: initial purge blocks host startup thread (-014). |
 | 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
 | 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
 | 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
@@ -529,3 +546,122 @@ explanatory note added to `AddSiteEventLogging` pointing readers to where the ac
 is actually registered. Documentation/dead-code change only; no regression test was
 added — the change is a method removal verified by the compiler (no callers) and the
 full module suite still passing.
+
+### SiteEventLogging-012 — Dropped events report success: `Task` is completed, not faulted, when the event cannot be persisted
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:160-166,193-197` |
+
+**Description**
+
+`LogEventAsync` returns a `Task` that, per the interface XML doc (corrected under
+SiteEventLogging-009), "completes once the event is durably persisted and faults if
+the write fails, so callers that `await` it observe success or failure." Two paths
+break that contract by signalling **success** for an event that was never written:
+
+1. In `LogEventAsync`, if `_writeQueue.Writer.TryWrite(pending)` fails (the channel
+   has been completed because the logger was disposed), the code calls
+   `pending.Completion.TrySetResult()` — completing the `Task` successfully — even
+   though the comment immediately above acknowledges "there is nowhere to persist the
+   event."
+2. In `ProcessWriteQueueAsync`, `WithConnection` returns `false` when the logger has
+   been disposed mid-drain. The code does not inspect the returned `written` flag and
+   unconditionally calls `pending.Completion.TrySetResult()`, again reporting success
+   for an event the comment admits "simply cannot be persisted."
+
+The event log is the site's diagnostic audit trail. A caller that `await`s
+`LogEventAsync` to confirm a critical event (deployment applied, alarm activated) was
+recorded will observe a *successful* completion for an event that was silently
+dropped. This is the same class of defect SiteEventLogging-008 fixed for write
+*errors* — but the disposed-drop path was left reporting false success. The window
+is the disposal/shutdown interval, during which shutdown-related events (graceful
+singleton handover, instance disable) are exactly the events most likely to be
+enqueued and lost.
+
+**Recommendation**
+
+For both paths, fault the `Task` (or complete it with a sentinel failure) instead of
+`TrySetResult()` — e.g. `pending.Completion.TrySetException(new ObjectDisposedException(...))`
+— so an `await`-ing caller can distinguish a dropped event from a persisted one.
+Inspect the `written` flag returned by `WithConnection` in `ProcessWriteQueueAsync`
+and only call `TrySetResult()` when `written` is `true`. Update the XML doc if a
+deliberate "drop silently on shutdown" semantics is chosen instead.
+
+**Resolution**
+
+_Unresolved._
+
+### SiteEventLogging-013 — Keyword search does not escape SQL `LIKE` wildcards in user input
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:79-83` |
+
+**Description**
+
+The keyword-search filter builds the `LIKE` pattern as `$"%{request.KeywordFilter}%"`
+and binds it as a parameter. Parameterisation correctly prevents SQL injection, but
+it does **not** neutralise the `LIKE` metacharacters `%` and `_` inside the
+user-supplied keyword. A search for a literal `_` (common in event sources and
+identifiers such as `store_and_forward`, `PLC_1`, or instance IDs) is interpreted as
+"match any single character", and a `%` matches any run of characters. The design
+calls keyword search "free-text search on message and source fields ... Useful for
+finding events by script name, alarm name, or error message" — users will reasonably
+expect a literal substring match, so a query for `store_and_forward` silently returns
+events containing `storeXandYforward` and similar false positives. There is no way
+for the caller to search for a literal underscore or percent.
+
+**Recommendation**
+
+Escape `%`, `_`, and the escape character itself in `request.KeywordFilter` before
+wrapping it in `%...%`, and append an `ESCAPE` clause to the `LIKE` expression
+(e.g. `... LIKE $keyword ESCAPE '\'`). Alternatively document that the keyword field
+accepts `LIKE` wildcard syntax, but a literal-substring match is the behaviour the
+design implies.
+
+**Resolution**
+
+_Unresolved._
+
+### SiteEventLogging-014 — Initial purge runs synchronously on the host startup thread
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Performance & resource management |
+| Status | Open |
+| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:34-48` |
+
+**Description**
+
+`EventLogPurgeService.ExecuteAsync` calls `RunPurge()` (a fully synchronous method
+that runs `PurgeByRetention` and `PurgeByStorageCap`) *before* the first `await`
+(`await timer.WaitForNextTickAsync(...)`). A `BackgroundService`'s `ExecuteAsync` is
+invoked from `StartAsync`, and the host's startup pipeline does not proceed past a
+`BackgroundService` until its `ExecuteAsync` yields at the first real `await`. Because
+`RunPurge()` precedes any `await`, the entire initial purge — including a cap-purge
+that deletes rows in 1000-row batches and runs `PRAGMA incremental_vacuum` until a
+near-1 GB database is back under the cap — executes inline on the startup thread,
+blocking host startup (and therefore the `/health/ready` gate) for as long as the
+purge takes. On a site that has accumulated a large log this can be a multi-second
+stall during every node start/failover. The class doc states the service "runs on a
+background thread and does not block event recording" — the startup-thread block is
+inconsistent with that intent.
+
+**Recommendation**
+
+Yield before the initial purge so it runs on the background scheduler rather than the
+startup thread — e.g. `await Task.Yield();` as the first statement of `ExecuteAsync`,
+or move the initial `RunPurge()` to after the first `await timer.WaitForNextTickAsync`
+(accepting a one-interval delay), or offload it with `await Task.Run(RunPurge, stoppingToken)`.
+
+**Resolution**
+
+_Unresolved._
--- a/code-reviews/SiteRuntime/findings.md
+++ b/code-reviews/SiteRuntime/findings.md
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.SiteRuntime` |
 | Design doc | `docs/requirements/Component-SiteRuntime.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-16 |
+| Last reviewed | 2026-05-17 |
 | Reviewer | claude-agent |
-| Commit reviewed | `9c60592` |
-| Open findings | 0 |
+| Commit reviewed | `39d737e` |
+| Open findings | 3 |

 ## Summary

@@ -28,6 +28,24 @@ in a comment but ships it anyway). Test coverage exists for the coordinator acto
 persistence and scripting, but the short-lived execution actors, the replication
 actor, and the repositories are untested.

+#### Re-review 2026-05-17 (commit `39d737e`)
+
+The module was re-reviewed at commit `39d737e`. No source under
+`src/ScadaLink.SiteRuntime` has changed since the previous review at `9c60592`
+(the only intervening commits are code-review documentation updates), so all of
+SiteRuntime-001..013, 015, 016 remain Resolved and SiteRuntime-014 remains
+Deferred — its Deferred justification (a trigger-evaluation concurrency design
+decision is required before either recommended fix can land in-module) still
+holds verbatim against the unchanged `ScriptActor`/`AlarmActor` source. The
+re-review nonetheless worked through all 10 checklist categories afresh and
+surfaced three new findings that the prior pass did not record: a cross-thread
+`Dictionary` enumeration race when the Instance Actor's live `_attributes`
+dictionary is handed by reference into child `ScriptActor`/`AlarmActor`
+constructors (SiteRuntime-017, Medium); a stale `ScriptExecutionActor` XML doc
+that still claims a "dedicated blocking I/O dispatcher" (SiteRuntime-018, Low);
+and two dead lifecycle handlers in `InstanceActor` that the Deployment Manager
+never routes to (SiteRuntime-019, Low). Open findings: 3.
+
 ## Checklist coverage

 | # | Category | Examined | Notes |
@@ -733,3 +751,126 @@ harness is a larger test-infrastructure task tracked separately and out of scope
 Low-severity coverage finding; the highest-value untested paths the finding called out
 (script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
 green.
+
+### SiteRuntime-017 — Instance Actor's live `_attributes` dictionary is shared by reference into child actor constructors
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Status | Open |
+| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:625`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:675`, `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:83`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:93` |
+
+**Description**
+
+`InstanceActor.CreateChildActors` passes the Instance Actor's own mutable
+`_attributes` field (a plain `Dictionary<string, object?>`) by reference into the
+`Props.Create(...)` factory for every `ScriptActor` and `AlarmActor` (as the
+`initialAttributes` constructor argument). Each child constructor then iterates
+that dictionary to seed its `_attributeSnapshot`:
+
+```csharp
+if (initialAttributes != null)
+    foreach (var kvp in initialAttributes)
+        _attributeSnapshot[kvp.Key] = kvp.Value;
+```
+
+`Context.ActorOf` returns immediately; the child actor's constructor runs later on
+the *child's* mailbox thread. Meanwhile the Instance Actor's `PreStart` returns and
+the Instance Actor begins processing its mailbox — `HandleTagValueUpdate` and
+`HandleAttributeValueChanged` both mutate `_attributes` (`_attributes[...] = ...`).
+A DCL tag update that arrives before a child has finished its constructor copy
+therefore mutates the dictionary on the Instance Actor thread while the child
+thread is enumerating it. `Dictionary<,>` is explicitly not safe for concurrent
+read/write: the enumeration can throw `InvalidOperationException` ("collection was
+modified") — which surfaces as an `ActorInitializationException` and, under the
+Instance Actor's `SupervisorStrategy`, **stops** the child (the strategy returns
+`Stop` for `ActorInitializationException`). The script or alarm is then silently
+absent for the life of the instance. A torn read of an entry is also possible. The
+window is small but deterministically reachable on a busy site at startup/failover
+— exactly the staggered-startup scenario the design is most concerned about.
+
+**Recommendation**
+
+Do not share the live dictionary. Snapshot it on the Instance Actor thread before
+constructing the child — e.g. pass `new Dictionary<string, object?>(_attributes)`
+(or an immutable copy) into each `Props.Create`. The copy is made on the Instance
+Actor thread inside `CreateChildActors`, so it is race-free, and each child gets a
+private dictionary to seed from.
+
+**Resolution**
+
+_Unresolved._
+
+### SiteRuntime-018 — `ScriptExecutionActor` XML doc still claims a "dedicated blocking I/O dispatcher"
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Documentation & comments |
+| Status | Open |
+| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:17` |
+
+**Description**
+
+The class-level XML summary on `ScriptExecutionActor` states "Runs on a dedicated
+blocking I/O dispatcher." That is not what the code does. SiteRuntime-009 was
+resolved by introducing `ScriptExecutionScheduler` (a bounded dedicated
+`TaskScheduler`); the *actor itself and its mailbox* run on the **default** Akka
+dispatcher, and only the script body runs on the scheduler's threads via
+`Task.Factory.StartNew(..., scheduler)`. The resolution of SiteRuntime-009
+explicitly chose the `TaskScheduler` route *instead of* a HOCON dispatcher and
+even removed the "in production, configure a dedicated dispatcher" comments
+elsewhere — but this stale summary line was missed. A reader is told the actor is
+on a dedicated dispatcher when it is not, which is misleading when reasoning about
+mailbox throughput and thread-pool pressure. (`AlarmExecutionActor` does not carry
+the equivalent claim — its summary only says "Same pattern as ScriptExecutionActor.")
+
+**Recommendation**
+
+Correct the summary to describe the actual model: the actor runs on the default
+dispatcher and the script body is dispatched onto the dedicated
+`ScriptExecutionScheduler` (SiteRuntime-009). Align the wording with the accurate
+comment already present at `ScriptExecutionActor.cs:71-73`.
+
+**Resolution**
+
+_Unresolved._
+
+### SiteRuntime-019 — Dead `DisableInstanceCommand` / `EnableInstanceCommand` handlers in `InstanceActor`
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:113` |
+
+**Description**
+
+`InstanceActor`'s constructor registers `Receive<DisableInstanceCommand>` and
+`Receive<EnableInstanceCommand>` handlers that log and reply with a successful
+`InstanceLifecycleResponse`. These handlers are unreachable. The Deployment Manager
+is the only sender of those commands, and `DeploymentManagerActor.HandleDisable` /
+`HandleEnable` handle the lifecycle entirely themselves — they call
+`Context.Stop(actor)` (disable) or `CreateInstanceActor(...)` (enable) directly and
+reply to the original sender from the Deployment Manager. Neither command is ever
+`Forward`-ed or `Tell`-ed to the Instance Actor. The handlers are dead code, and
+they are actively misleading: a maintainer reading `InstanceActor` would reasonably
+believe disable/enable is partly an Instance-Actor responsibility, and the no-op
+"true" reply implies an instance-side acknowledgement contract that does not exist.
+If a future change *did* route these commands here, the disable handler would do
+nothing useful (it does not stop children or tear down state — Akka does that when
+the parent stops the actor).
+
+**Recommendation**
+
+Remove the two `Receive<...>` registrations and their handler bodies from
+`InstanceActor`, since the Deployment Manager owns the disable/enable lifecycle.
+If the intent is to keep them for a future instance-side hook, add an XML comment
+stating that the Deployment Manager currently handles these and the handlers are a
+reserved placeholder — but removal is preferred.
+
+**Resolution**
+
+_Unresolved._
--- a/code-reviews/StoreAndForward/findings.md
+++ b/code-reviews/StoreAndForward/findings.md
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.StoreAndForward` |
 | Design doc | `docs/requirements/Component-StoreAndForward.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-16 |
+| Last reviewed | 2026-05-17 |
 | Reviewer | claude-agent |
-| Commit reviewed | `9c60592` |
-| Open findings | 0 (3 Deferred: 002, 011, 012 — see notes) |
+| Commit reviewed | `39d737e` |
+| Open findings | 3 (3 Deferred: 002, 011, 012 — see notes) |

 ## Summary

@@ -30,20 +30,45 @@ status set, and untested critical paths (retry-due timing, replication-from-acti
 the actor bridge). None of the findings are blockers for compilation, but the
 replication and retry-count issues are functional defects against the design.

+#### Re-review 2026-05-17 (commit `39d737e`)
+
+Re-reviewed at commit `39d737e` after the batch-3 fixes. All of findings 001 and
+003–010, plus 013–014, are confirmed `Resolved` against the current source: the
+replication wiring (`BufferAsync`/`ReplicateRemove`/`ReplicatePark`), the corrected
+retry-count semantics, the conditional `UpdateMessageIfStatusAsync` writes, the
+transactioned parked-message reads, the `PipeTo` refactor, the `RaiseActivity`
+hardening, the `RetryParkedMessageAsync` `last_attempt_at` reset and the database
+directory creation are all present as described. Findings 002, 011 and 012 were
+re-verified and remain validly `Deferred` — their preconditions are unchanged (002's
+residual no-handler gap, 011's Commons-owned enum, 012's Commons-owned entity placement).
+
+This pass surfaced **three new findings**. StoreAndForward-015 records the
+StoreAndForward side of the cross-module `MaxRetries == 0` ambiguity flagged by
+ExternalSystemGateway-015: `EnqueueAsync`'s public contract documents `maxRetries` only
+as "parked once `MaxRetries` is reached" and never states the `0 = no limit / retry
+forever` special case that `RetryMessageAsync` actually enforces, so an ESG caller
+passing `0` to mean "never retry" gets the opposite behaviour with no warning from the
+S&F API surface. StoreAndForward-016 records that operator-initiated parked-message
+retry and discard are not replicated to the standby — only the add/remove/park sweep
+paths are — so a failover diverges the standby buffer from the active one.
+StoreAndForward-017 records that the Retry/Discard activity-log entries hard-code the
+`ExternalSystem` category, mislabelling notification and cached-DB-write messages in
+the site event log.
+
 ## Checklist coverage

 | # | Category | Examined | Notes |
 |---|----------|----------|-------|
-| 1 | Correctness & logic bugs | ☑ | Off-by-one in retry counting (003); parked-message retry timing (010). |
+| 1 | Correctness & logic bugs | ☑ | Off-by-one in retry counting (003); parked-message retry timing (010); Retry/Discard activity log hard-codes the category (017). |
 | 2 | Akka.NET conventions | ☑ | `ContinueWith` used instead of `PipeTo`-friendly continuations; default supervision; see 007. |
 | 3 | Concurrency & thread safety | ☑ | Sweep guarded by `Interlocked`, but no guard against retry-vs-manage races (005); `OnActivity` event not thread-safe (009). |
-| 4 | Error handling & resilience | ☑ | Replication never invoked from active path (001); no-handler messages buffered then stuck (002). |
+| 4 | Error handling & resilience | ☑ | Replication never invoked from active path (001); no-handler messages buffered then stuck (002); operator retry/discard not replicated to standby (016). |
 | 5 | Security | ☑ | No issues found — parameterised SQL throughout; no secrets handled directly; payload JSON treated opaquely. |
 | 6 | Performance & resource management | ☑ | New SQLite connection per call; multi-statement operations not wrapped in a transaction (006, 008). |
 | 7 | Design-document adherence | ☑ | Replication gap (001); `InFlight` status undocumented/unused (011); "retrying" status from design doc not modelled. |
 | 8 | Code organization & conventions | ☑ | `StoreAndForwardMessage` is an entity-like POCO living in the component, not Commons (012). |
 | 9 | Testing coverage | ☑ | Retry-due timing, replication-from-active, and `ParkedMessageHandlerActor` are untested (013). |
-| 10 | Documentation & comments | ☑ | XML doc on `RegisterDeliveryHandler` contract is inconsistent with code (004). |
+| 10 | Documentation & comments | ☑ | XML doc on `RegisterDeliveryHandler` contract is inconsistent with code (004); `EnqueueAsync` never documents the `maxRetries == 0` = "retry forever" special case (015). |

 ## Findings

@@ -703,3 +728,158 @@ and bare filenames are skipped). Regression test
 `InitializeAsync_FileInMissingDirectory_CreatesDirectory` fails against the pre-fix code;
 all six `SiteActorPathTests` now pass. Fixed by the commit whose message references
 `StoreAndForward-014`.
+
+### StoreAndForward-015 — `EnqueueAsync`'s public contract never documents that `maxRetries == 0` means "retry forever"
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Documentation & comments |
+| Status | Open |
+| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:114`–`:130`, `:285` |
+
+**Description**
+
+The re-review brief asks for the StoreAndForward side of the cross-module ambiguity
+recorded as `ExternalSystemGateway-015`. The semantics are split across this module and
+its callers, and the StoreAndForward side carries a genuine documentation/API-contract
+fault:
+
+- `RetryMessageAsync` parks a message only when `message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries`
+  (`StoreAndForwardService.cs:285`). When `MaxRetries == 0` the guard is false on every
+  sweep, so a `0` value means **"no limit — retry forever"**. The
+  `StoreAndForwardMessage.MaxRetries` XML doc (`StoreAndForwardMessage.cs:31`) does state
+  `"0 = no limit"`, so the persistence model is internally consistent.
+- But `EnqueueAsync` — the *only* public entry point into the engine — exposes a
+  `maxRetries` parameter (`StoreAndForwardService.cs:128`) with **no parameter
+  documentation at all**, and its method summary (lines 114–122) describes the lifecycle
+  only as "On max retries → park" / "parked once `MaxRetries` is reached" (see also the
+  `_deliveryHandlers` field doc, line 50–51). Nothing on the public surface tells a
+  caller that passing `0` flips the meaning from "park immediately / never retry" to
+  "retry forever". A caller reading only `EnqueueAsync` would reasonably assume `0`
+  retries means zero retries.
+- This is exactly the trap ESG fell into: `ExternalSystemClient.CachedCallAsync` /
+  `DatabaseGateway.CachedWriteAsync` pass the source entity's `MaxRetries` verbatim,
+  intending `0` to mean "never retry", and instead get unbounded retry — the
+  duplicate-delivery / unbounded-buffer-growth hazard the design doc's idempotency note
+  warns against. The fault is not solely ESG's: the S&F public API silently overloads
+  `0` with the opposite of its natural reading and does not document it.
+
+The defect is in this module's API contract and documentation, so it is recorded here
+in addition to `ExternalSystemGateway-015`. (Whether `0` *should* mean "no limit" or
+"no retry" is the cross-module design decision tracked by ESG-015; this finding is
+specifically that the StoreAndForward public surface fails to document whichever
+meaning is chosen.)
+
+**Recommendation**
+
+Document the `maxRetries` parameter on `EnqueueAsync` explicitly with a `<param>` tag
+that states the `0` special case in the same words as `StoreAndForwardMessage.MaxRetries`
+(`"0 = no limit — retried on every sweep until delivered, never parked"`), and add the
+`0` case to the method summary's lifecycle description. Better still — and consistent
+with the resolution of ESG-015 — make the engine reject the ambiguity at the API: accept
+a nullable/`enum` retry policy, or treat `0` as an explicit "no retry" (do not buffer, or
+park on the first sweep) so the natural reading and the behaviour agree. Either way the
+public `EnqueueAsync` contract must state the chosen meaning; today it states nothing.
+
+**Resolution**
+
+_Unresolved._
+
+### StoreAndForward-016 — Operator-initiated parked-message retry and discard are not replicated to the standby
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Status | Open |
+| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:339`–`:362`; `src/ScadaLink.StoreAndForward/ReplicationService.cs:131`–`:136` |
+
+**Description**
+
+`StoreAndForward-001`'s fix wired replication into the active *delivery* paths:
+`BufferAsync` replicates an `Add`, a successful retry replicates a `Remove`, and a park
+replicates a `Park`. But the two *operator* paths — `RetryParkedMessageAsync` (line 339)
+and `DiscardParkedMessageAsync` (line 353) — change buffer state and never touch
+`_replication`:
+
+- `RetryParkedMessageAsync` flips a row from `Parked` back to `Pending` (and clears
+  `retry_count` / `last_attempt_at`) in the local SQLite only. The standby's copy stays
+  `Parked`.
+- `DiscardParkedMessageAsync` `DELETE`s the row from the local SQLite only. The standby's
+  copy is left in place, still `Parked`.
+
+The Component design doc ("Persistence") requires the active node to forward "each
+buffer operation (add, remove, park)" so that on failover "the new active node has a
+near-complete copy of the buffer." An operator retrying a parked message is a buffer
+state change; an operator discarding one is a removal. After a failover that follows an
+operator action:
+
+1. A **discarded** message reappears on the new active node — it is still `Parked`
+   there, so it resurfaces in the central UI's parked-message list and an operator must
+   discard it a second time. For a message deliberately removed (e.g. a known-bad
+   payload) this is a correctness regression of the operator's intent.
+2. A **retried** message is still `Parked` on the new active node, so the operator's
+   "move it back to the queue" action is silently lost across the failover and the
+   message is not re-attempted.
+
+`ReplicationOperationType` only models `Add`/`Remove`/`Park` — there is no operation for
+"un-park / move back to pending", so even a minimal fix needs either a new operation
+type or a re-use of `Add` to overwrite the standby row. This is the same class of defect
+as the now-resolved `StoreAndForward-001`, for the operator paths rather than the sweep
+paths.
+
+**Recommendation**
+
+Replicate both operator actions. `DiscardParkedMessageAsync` should call
+`_replication?.ReplicateRemove(messageId)` after a successful local delete (the existing
+`Remove` op already deletes on the standby). For `RetryParkedMessageAsync`, add a
+`Requeue`/`Unpark` `ReplicationOperationType` whose `ApplyReplicatedOperationAsync` case
+resets the standby row to `Pending` with `retry_count = 0`, or have the method re-load
+the updated message and replicate it as an `Add`-style upsert. Add replication tests for
+both operator paths (the existing `StoreAndForwardReplicationTests` only cover the sweep
+paths).
+
+**Resolution**
+
+_Unresolved._
+
+### StoreAndForward-017 — Retry/Discard activity-log entries hard-code the `ExternalSystem` category
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:344`, `:358` |
+
+**Description**
+
+`RetryParkedMessageAsync` and `DiscardParkedMessageAsync` raise an S&F activity
+notification (consumed by Site Event Logging — WP-14) but pass a hard-coded
+`StoreAndForwardCategory.ExternalSystem` as the category argument:
+
+```csharp
+RaiseActivity("Retry",   StoreAndForwardCategory.ExternalSystem, $"Parked message {messageId} moved back to queue");
+RaiseActivity("Discard", StoreAndForwardCategory.ExternalSystem, $"Parked message {messageId} discarded");
+```
+
+Both methods take only a `messageId` and never load the message, so they have no access
+to its real category. When an operator retries or discards a parked **Notification** or
+**CachedDbWrite** message, the site event log records the activity under
+`ExternalSystem`. Every other `RaiseActivity` call in the service passes the message's
+true `Category` (`EnqueueAsync`, `RetryMessageAsync`), so the operator paths are
+inconsistent and produce mislabelled audit entries — misleading when an operator later
+filters or reviews S&F activity by category.
+
+**Recommendation**
+
+Load the message (or have `StoreAndForwardStorage.RetryParkedMessageAsync` /
+`DiscardParkedMessageAsync` return the affected row's category) and pass the real
+`Category` to `RaiseActivity`. If loading the row is considered too costly on these
+infrequent operator paths, change the `OnActivity` event / `RaiseActivity` signature to
+allow a nullable category for management actions rather than asserting a false one.
+
+**Resolution**
+
+_Unresolved._
--- a/code-reviews/TemplateEngine/findings.md
+++ b/code-reviews/TemplateEngine/findings.md
@@ -5,10 +5,10 @@
 | Module | `src/ScadaLink.TemplateEngine` |
 | Design doc | `docs/requirements/Component-TemplateEngine.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-16 |
+| Last reviewed | 2026-05-17 |
 | Reviewer | claude-agent |
-| Commit reviewed | `9c60592` |
-| Open findings | 0 |
+| Commit reviewed | `39d737e` |
+| Open findings | 2 |

 ## Summary

@@ -29,11 +29,30 @@ create, optimistic concurrency on instance state) are claimed but not implemente
 Themes: validation that is weaker than the design promises, and asymmetric handling
 of attributes vs. alarms vs. scripts throughout the resolve/flatten/derive paths.

+#### Re-review 2026-05-17 (commit `39d737e`)
+
+Re-reviewed the whole module against all ten checklist categories at commit
+`39d737e`. All fourteen prior findings remain closed — the batch-4 fixes
+(`bc88a36`/`804697f` and predecessors) hold up: the recursive composition walk,
+the per-slot alarm override mechanism, the code-region-aware delimiter scanner,
+and the single-source deletion-constraint logic are all correctly in place. Two
+new Medium findings surfaced, both in the composition-cascade path and both
+affecting **nested** (depth ≥ 2) compositions specifically — the same blind spot
+that produced TemplateEngine-001. **TemplateEngine-015**: `RenameCompositionAsync`
+renames only the directly slot-owned derived template, leaving cascaded inner
+derived templates with a stale dotted-path name. **TemplateEngine-016**:
+`FlatteningService` hard-codes `ScriptScope.ParentPath` to the empty string for
+every composed script regardless of nesting depth, so a script two or more
+levels deep cannot resolve `Parent.X` references to its real parent module.
+Both are limited-impact (nested compositions are the less common case and there
+is design-time visibility) but represent genuine drift from the recursive-nesting
+design promise.
+
 ## Checklist coverage

 | # | Category | Examined | Notes |
 |---|----------|----------|-------|
-| 1 | Correctness & logic bugs | ✓ | Multiple real bugs: deep composed-member loss, derived alarms omitted, granularity bypass, no-op create-time collision block. |
+| 1 | Correctness & logic bugs | ✓ | Prior bugs (001–005, 013) all resolved and verified. Re-review 2026-05-17 found two new nested-composition defects: rename does not cascade (TemplateEngine-015), composed-script `ParentPath` always empty (TemplateEngine-016). |
 | 2 | Akka.NET conventions | ✓ | No actors in this module (`AddTemplateEngineActors` is an empty placeholder). Nothing to assess. |
 | 3 | Concurrency & thread safety | ✓ | Services are stateless, scoped per request; static helpers hold no mutable state. Design says template editing is last-write-wins; that is honoured. See TemplateEngine-010 re: a doc claim of optimistic concurrency that is not implemented. |
 | 4 | Error handling & resilience | ✓ | `Result<T>` used consistently; repository nulls guarded. `FlatteningService` wraps in try/catch. No store-and-forward or failover surface in this module. |
@@ -648,3 +667,102 @@ reports all blocking reasons and uses `TemplateDeletionService`'s phrasing — t
 affected `TemplateServiceTests` delete tests were updated to the unified messages,
 and a regression test `DeleteTemplate_MultipleConstraints_ReportsAllNotJustFirst`
 verifies all three constraint categories are surfaced together.
+
+### TemplateEngine-015 — `RenameCompositionAsync` does not cascade-rename nested derived templates
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.TemplateEngine/TemplateService.cs:680` |
+
+**Description**
+
+`AddCompositionAsync` builds a cascade of derived templates whose names follow a
+dotted path: composing `$Sensor` (which itself composes `$Probe` as `Probe1`)
+into `$Pump` as `TempSensor` produces `$Pump.TempSensor` **and** the nested
+`$Pump.TempSensor.Probe1` (see `CreateCascadedCompositionAsync` and the
+`AddComposition_CascadesChildCompositions` test). `RenameCompositionAsync`,
+however, renames only the **directly** slot-owned derived template:
+
+```csharp
+var derived = await _repository.GetTemplateByIdAsync(composition.ComposedTemplateId, ...);
+if (derived != null && derived.IsDerived && derived.OwnerCompositionId == compositionId)
+{
+    var newDerivedName = $"{owner.Name}.{newInstanceName}";
+    ...
+    derived.Name = newDerivedName;
+    await _repository.UpdateTemplateAsync(derived, ...);
+}
+```
+
+There is no recursion into `derived.Compositions`. After renaming the `TempSensor`
+slot to `MainSensor`, the parent derived becomes `$Pump.MainSensor` but the
+cascaded child stays `$Pump.TempSensor.Probe1` — its name no longer reflects the
+slot path it lives under, breaking the dotted-path naming invariant the cascade
+otherwise maintains. `DeleteCompositionAsync` correctly recurses
+(`CascadeDeleteDerivedAsync`), so rename is the asymmetric outlier. The
+`RenameComposition_RenamesSlotAndDerivedTemplate` test only exercises a
+single-level derived, so the gap is untested. The stale name also breaks the
+`AddComposition_DerivedNameCollision_Fails` / cascade-name pre-check on any
+subsequent compose that walks the now-inconsistent name tree.
+
+**Recommendation**
+
+Recurse over `derived.Compositions` (mirroring `CascadeDeleteDerivedAsync`),
+re-deriving each cascaded child's name from the renamed parent
+(`$"{parentDerivedName}.{childComposition.InstanceName}"`), and run the
+existing same-name collision pre-check across every name the cascade will
+produce — not just the top-level one. Add a regression test covering a
+two-level cascade rename.
+
+**Resolution**
+
+_Unresolved._
+
+### TemplateEngine-016 — Composed-script `ScriptScope.ParentPath` is always empty, breaking `Parent.X` resolution for nested modules
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Status | Open |
+| Location | `src/ScadaLink.TemplateEngine/Flattening/FlatteningService.cs:750` |
+
+**Description**
+
+`ResolveComposedScriptsRecursive` assigns each composed script a `ScriptScope`:
+
+```csharp
+Scope = new Commons.Types.Scripts.ScriptScope(SelfPath: prefix, ParentPath: "")
+```
+
+`prefix` is the accumulated path-qualified module path (`Outer` at depth 1,
+`Outer.Inner` at depth 2, etc.), so `SelfPath` is correct. `ParentPath`, however,
+is hard-coded to the empty string at every depth. Per `ScriptScope`'s own XML
+doc, `ParentPath` is "computed at flattening time and seeded into the script's
+globals … so `Attributes["X"]` / `Parent.X` can prepend the right path-prefix."
+For a script directly composed at depth 1 the parent is the root and `""` is
+correct, but for a script in a nested module (`Outer.Inner.Foo`) the parent
+module is `Outer` — yet `ParentPath` is still `""`. A nested composed script
+that references `Parent.X` will therefore resolve the reference against the root
+flat namespace instead of its actual parent module, reading the wrong attribute
+(or failing to find one). This is the same depth-≥2 nesting blind spot as
+TemplateEngine-001; the recursive walk was added there but the `Scope`
+construction was not updated to carry the parent path. `ResolveComposedScripts`
+for direct (root-template) scripts leaves `Scope` at the default `ScriptScope.Root`,
+which is correct.
+
+**Recommendation**
+
+Thread the parent module path through `ResolveComposedScriptsRecursive` (the
+caller already knows it — it is the `prefix` of the enclosing recursion frame,
+or `""` for a depth-1 composition) and set
+`ParentPath` to that value, so `SelfPath = "Outer.Inner"` pairs with
+`ParentPath = "Outer"`. Add a flattening test asserting the `Scope` of a
+two-level composed script.
+
+**Resolution**
+
+_Unresolved._