docs(code-reviews): re-review batch 4 at 39d737e — SiteEventLogging, SiteRuntime, StoreAndForward, TemplateEngine
11 new findings: SiteEventLogging-012..014, SiteRuntime-017..019, StoreAndForward-015..017, TemplateEngine-015..016.
@@ -41,9 +41,9 @@ module file and counted in **Total**.
|
|||||||
|----------|---------------|
|
|----------|---------------|
|
||||||
| Critical | 0 |
|
| Critical | 0 |
|
||||||
| High | 8 |
|
| High | 8 |
|
||||||
| Medium | 20 |
|
| Medium | 26 |
|
||||||
| Low | 27 |
|
| Low | 32 |
|
||||||
| **Total** | **55** |
|
| **Total** | **66** |
|
||||||
|
|
||||||
## Module Status
|
## Module Status
|
||||||
|
|
||||||
@@ -64,10 +64,10 @@ module file and counted in **Total**.
|
|||||||
| [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/2 | 4 | 17 |
|
| [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/1/1/2 | 4 | 17 |
|
||||||
| [NotificationService](NotificationService/findings.md) | 2026-05-16 | `9c60592` | 0/2/1/2 | 5 | 18 |
|
| [NotificationService](NotificationService/findings.md) | 2026-05-16 | `9c60592` | 0/2/1/2 | 5 | 18 |
|
||||||
| [Security](Security/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/2 | 4 | 15 |
|
| [Security](Security/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/2 | 4 | 15 |
|
||||||
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 11 |
|
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/2 | 3 | 14 |
|
||||||
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 16 |
|
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-16 | `9c60592` | 0/0/1/2 | 3 | 19 |
|
||||||
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 14 |
|
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/1 | 3 | 17 |
|
||||||
| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-16 | `9c60592` | 0/0/0/0 | 0 | 14 |
|
| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-16 | `9c60592` | 0/0/2/0 | 2 | 16 |
|
||||||
|
|
||||||
## Pending Findings
|
## Pending Findings
|
||||||
|
|
||||||
@@ -93,7 +93,7 @@ _None open._
|
|||||||
| NotificationService-014 | [NotificationService](NotificationService/findings.md) | OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever |
|
| NotificationService-014 | [NotificationService](NotificationService/findings.md) | OAuth2 token-fetch failure escapes `DeliverBufferedAsync`; a permanently-broken config is retried forever |
|
||||||
| NotificationService-015 | [NotificationService](NotificationService/findings.md) | Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script |
|
| NotificationService-015 | [NotificationService](NotificationService/findings.md) | Unclassified exceptions (OAuth2 token fetch, non-cancellation OCE) escape `SendAsync` to the calling script |
|
||||||
|
|
||||||
### Medium (20)
|
### Medium (26)
|
||||||
|
|
||||||
| ID | Module | Title |
|
| ID | Module | Title |
|
||||||
|----|--------|-------|
|
|----|--------|-------|
|
||||||
@@ -117,8 +117,14 @@ _None open._
|
|||||||
| NotificationService-016 | [NotificationService](NotificationService/findings.md) | `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials |
|
| NotificationService-016 | [NotificationService](NotificationService/findings.md) | `AuthenticateAsync` silently sends unauthenticated for an unknown auth type or empty credentials |
|
||||||
| Security-012 | [Security](Security/findings.md) | Partial LDAP failure during login yields a roleless authenticated session |
|
| Security-012 | [Security](Security/findings.md) | Partial LDAP failure during login yields a roleless authenticated session |
|
||||||
| Security-014 | [Security](Security/findings.md) | `RefreshToken` re-issues a token without checking the idle timeout |
|
| Security-014 | [Security](Security/findings.md) | `RefreshToken` re-issues a token without checking the idle timeout |
|
||||||
|
| SiteEventLogging-012 | [SiteEventLogging](SiteEventLogging/findings.md) | Dropped events report success: `Task` is completed, not faulted, when the event cannot be persisted |
|
||||||
|
| SiteRuntime-017 | [SiteRuntime](SiteRuntime/findings.md) | Instance Actor's live `_attributes` dictionary is shared by reference into child actor constructors |
|
||||||
|
| StoreAndForward-015 | [StoreAndForward](StoreAndForward/findings.md) | `EnqueueAsync`'s public contract never documents that `maxRetries == 0` means "retry forever" |
|
||||||
|
| StoreAndForward-016 | [StoreAndForward](StoreAndForward/findings.md) | Operator-initiated parked-message retry and discard are not replicated to the standby |
|
||||||
|
| TemplateEngine-015 | [TemplateEngine](TemplateEngine/findings.md) | `RenameCompositionAsync` does not cascade-rename nested derived templates |
|
||||||
|
| TemplateEngine-016 | [TemplateEngine](TemplateEngine/findings.md) | Composed-script `ScriptScope.ParentPath` is always empty, breaking `Parent.X` resolution for nested modules |
|
||||||
|
|
||||||
### Low (27)
|
### Low (32)
|
||||||
|
|
||||||
| ID | Module | Title |
|
| ID | Module | Title |
|
||||||
|----|--------|-------|
|
|----|--------|-------|
|
||||||
@@ -149,3 +155,8 @@ _None open._
|
|||||||
| NotificationService-018 | [NotificationService](NotificationService/findings.md) | Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed |
|
| NotificationService-018 | [NotificationService](NotificationService/findings.md) | Concurrency limiter: lock-free read of a non-volatile field, never resized on redeployment, never disposed |
|
||||||
| Security-013 | [Security](Security/findings.md) | `ExtractFirstRdnValue` mis-parses group DNs containing escaped commas |
|
| Security-013 | [Security](Security/findings.md) | `ExtractFirstRdnValue` mis-parses group DNs containing escaped commas |
|
||||||
| Security-015 | [Security](Security/findings.md) | Username is not trimmed before use in the LDAP filter, fallback DN, and JWT claims |
|
| Security-015 | [Security](Security/findings.md) | Username is not trimmed before use in the LDAP filter, fallback DN, and JWT claims |
|
||||||
|
| SiteEventLogging-013 | [SiteEventLogging](SiteEventLogging/findings.md) | Keyword search does not escape SQL `LIKE` wildcards in user input |
|
||||||
|
| SiteEventLogging-014 | [SiteEventLogging](SiteEventLogging/findings.md) | Initial purge runs synchronously on the host startup thread |
|
||||||
|
| SiteRuntime-018 | [SiteRuntime](SiteRuntime/findings.md) | `ScriptExecutionActor` XML doc still claims a "dedicated blocking I/O dispatcher" |
|
||||||
|
| SiteRuntime-019 | [SiteRuntime](SiteRuntime/findings.md) | Dead `DisableInstanceCommand` / `EnableInstanceCommand` handlers in `InstanceActor` |
|
||||||
|
| StoreAndForward-017 | [StoreAndForward](StoreAndForward/findings.md) | Retry/Discard activity-log entries hard-code the `ExternalSystem` category |
|
||||||
|
|||||||
@@ -5,10 +5,10 @@
|
|||||||
| Module | `src/ScadaLink.SiteEventLogging` |
|
| Module | `src/ScadaLink.SiteEventLogging` |
|
||||||
| Design doc | `docs/requirements/Component-SiteEventLogging.md` |
|
| Design doc | `docs/requirements/Component-SiteEventLogging.md` |
|
||||||
| Status | Reviewed |
|
| Status | Reviewed |
|
||||||
| Last reviewed | 2026-05-16 |
|
| Last reviewed | 2026-05-17 |
|
||||||
| Reviewer | claude-agent |
|
| Reviewer | claude-agent |
|
||||||
| Commit reviewed | `9c60592` |
|
| Commit reviewed | `39d737e` |
|
||||||
| Open findings | 0 |
|
| Open findings | 3 |
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
@@ -28,16 +28,33 @@ cluster-singleton placement of the handler actor (which can pin to the standby
|
|||||||
node), missing indexes for common query filters, retention/cap purge not enforcing
|
node), missing indexes for common query filters, retention/cap purge not enforcing
|
||||||
the requirement strictly, and several documentation/maintainability issues.
|
the requirement strictly, and several documentation/maintainability issues.
|
||||||
|
|
||||||
|
#### Re-review 2026-05-17 (commit `39d737e`)
|
||||||
|
|
||||||
|
Re-reviewed the module at commit `39d737e`. All eleven prior findings remain closed
|
||||||
|
(SiteEventLogging-001..003, 005..011 Resolved; 004 Won't Fix) and the resolutions
|
||||||
|
hold up under inspection — the background writer, lock-guarded `WithConnection`,
|
||||||
|
`auto_vacuum = INCREMENTAL` plus logical-size measurement, the severity index, and
|
||||||
|
the concrete-recorder DI wiring are all present and correct at this commit. The
|
||||||
|
module source is byte-identical between `39d737e` and current `HEAD`, so this review
|
||||||
|
reflects the live code. Three new findings were recorded, all low-to-medium and none
|
||||||
|
regressions of prior fixes. The most notable (SiteEventLogging-012) is a correctness
|
||||||
|
gap left by the SiteEventLogging-005 background-writer rework: when an event cannot
|
||||||
|
be persisted because the logger has been disposed, the returned `Task` is completed
|
||||||
|
*successfully* rather than faulted, so an `await`-ing caller is told a dropped audit
|
||||||
|
event was written. The other two are minor: unescaped SQL `LIKE` wildcards in the
|
||||||
|
keyword-search filter (SiteEventLogging-013) and the initial purge running
|
||||||
|
synchronously on the host startup thread (SiteEventLogging-014).
|
||||||
|
|
||||||
## Checklist coverage
|
## Checklist coverage
|
||||||
|
|
||||||
| # | Category | Examined | Notes |
|
| # | Category | Examined | Notes |
|
||||||
|---|----------|----------|-------|
|
|---|----------|----------|-------|
|
||||||
| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). |
|
| 1 | Correctness & logic bugs | ☑ | `incremental_vacuum` no-op breaks cap purge (-001); over-delete on cap (-002). Re-review: dropped events report success (-012); `LIKE` wildcards unescaped in keyword search (-013). |
|
||||||
| 2 | Akka.NET conventions | ☑ | Handler actor has no supervision/correlation concerns of its own; singleton placement issue (-004). `Ask` boundary is appropriate. |
|
| 2 | Akka.NET conventions | ☑ | Handler actor has no supervision/correlation concerns of its own; singleton placement issue (-004). `Ask` boundary is appropriate. |
|
||||||
| 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
|
| 3 | Concurrency & thread safety | ☑ | Shared `SqliteConnection` used by purge/query without the write lock (-003). |
|
||||||
| 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
|
| 4 | Error handling & resilience | ☑ | `LogEventAsync` swallows write failures silently into a log line only (-008); purge catches broadly. |
|
||||||
| 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
|
| 5 | Security | ☑ | Queries fully parameterised. No authz in module (delegated to caller) — noted, not a finding. |
|
||||||
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). |
|
| 6 | Performance & resource management | ☑ | Synchronous I/O on actor threads (-005); missing indexes for severity/source/message (-006). Re-review: initial purge blocks host startup thread (-014). |
|
||||||
| 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
|
| 7 | Design-document adherence | ☑ | Singleton placement contradicts "active node" model (-004); cap purge does not honour "oldest first within budget" cleanly (-002). |
|
||||||
| 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
|
| 8 | Code organization & conventions | ☑ | Concrete-type downcast of `ISiteEventLogger` (-007); `internal Connection` leaks DB handle (-007). |
|
||||||
| 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
|
| 9 | Testing coverage | ☑ | No tests for purge interaction with live writes, vacuum effectiveness, the actor bridge, or query error path (-010). |
|
||||||
@@ -529,3 +546,122 @@ explanatory note added to `AddSiteEventLogging` pointing readers to where the ac
|
|||||||
is actually registered. Documentation/dead-code change only; no regression test was
|
is actually registered. Documentation/dead-code change only; no regression test was
|
||||||
added — the change is a method removal verified by the compiler (no callers) and the
|
added — the change is a method removal verified by the compiler (no callers) and the
|
||||||
full module suite still passing.
|
full module suite still passing.
|
||||||
|
|
||||||
|
### SiteEventLogging-012 — Dropped events report success: `Task` is completed, not faulted, when the event cannot be persisted
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Medium |
|
||||||
|
| Category | Correctness & logic bugs |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:160-166,193-197` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`LogEventAsync` returns a `Task` that, per the interface XML doc (corrected under
|
||||||
|
SiteEventLogging-009), "completes once the event is durably persisted and faults if
|
||||||
|
the write fails, so callers that `await` it observe success or failure." Two paths
|
||||||
|
break that contract by signalling **success** for an event that was never written:
|
||||||
|
|
||||||
|
1. In `LogEventAsync`, if `_writeQueue.Writer.TryWrite(pending)` fails (the channel
|
||||||
|
has been completed because the logger was disposed), the code calls
|
||||||
|
`pending.Completion.TrySetResult()` — completing the `Task` successfully — even
|
||||||
|
though the comment immediately above acknowledges "there is nowhere to persist the
|
||||||
|
event."
|
||||||
|
2. In `ProcessWriteQueueAsync`, `WithConnection` returns `false` when the logger has
|
||||||
|
been disposed mid-drain. The code does not inspect the returned `written` flag and
|
||||||
|
unconditionally calls `pending.Completion.TrySetResult()`, again reporting success
|
||||||
|
for an event the comment admits "simply cannot be persisted."
|
||||||
|
|
||||||
|
The event log is the site's diagnostic audit trail. A caller that `await`s
|
||||||
|
`LogEventAsync` to confirm a critical event (deployment applied, alarm activated) was
|
||||||
|
recorded will observe a *successful* completion for an event that was silently
|
||||||
|
dropped. This is the same class of defect SiteEventLogging-008 fixed for write
|
||||||
|
*errors* — but the disposed-drop path was left reporting false success. The window
|
||||||
|
is the disposal/shutdown interval, during which shutdown-related events (graceful
|
||||||
|
singleton handover, instance disable) are exactly the events most likely to be
|
||||||
|
enqueued and lost.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
For both paths, fault the `Task` (or complete it with a sentinel failure) instead of
|
||||||
|
`TrySetResult()` — e.g. `pending.Completion.TrySetException(new ObjectDisposedException(...))`
|
||||||
|
— so an `await`-ing caller can distinguish a dropped event from a persisted one.
|
||||||
|
Inspect the `written` flag returned by `WithConnection` in `ProcessWriteQueueAsync`
|
||||||
|
and only call `TrySetResult()` when `written` is `true`. Update the XML doc if a
|
||||||
|
deliberate "drop silently on shutdown" semantics is chosen instead.
|
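The recommendation above can be sketched as follows. This is a minimal illustration, not the module's code: `SketchLogger`, `PendingEvent`, and `Complete` are hypothetical stand-ins, and only the `Channel<T>` and `TaskCompletionSource` calls are real .NET APIs.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the logger's queued write; not the module's real type.
public sealed record PendingEvent(string Message, TaskCompletionSource Completion);

public sealed class SketchLogger : IDisposable
{
    private readonly Channel<PendingEvent> _writeQueue =
        Channel.CreateUnbounded<PendingEvent>();

    public Task LogEventAsync(string message)
    {
        var pending = new PendingEvent(
            message,
            new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously));

        // TryWrite returns false once the writer has been completed (logger disposed).
        if (!_writeQueue.Writer.TryWrite(pending))
        {
            // Fault instead of TrySetResult(): the event was dropped, not persisted.
            pending.Completion.TrySetException(
                new ObjectDisposedException(nameof(SketchLogger)));
        }

        return pending.Completion.Task;
    }

    // Drain-side counterpart: report success only when the write really happened.
    public static void Complete(PendingEvent pending, bool written)
    {
        if (written)
            pending.Completion.TrySetResult();
        else
            pending.Completion.TrySetException(
                new ObjectDisposedException(nameof(SketchLogger)));
    }

    // Completing the writer makes every later TryWrite fail.
    public void Dispose() => _writeQueue.Writer.TryComplete();
}
```

With this shape, a caller that `await`s `LogEventAsync` after disposal observes an `ObjectDisposedException` rather than a successful completion.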
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### SiteEventLogging-013 — Keyword search does not escape SQL `LIKE` wildcards in user input
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Low |
|
||||||
|
| Category | Correctness & logic bugs |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:79-83` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
The keyword-search filter builds the `LIKE` pattern as `$"%{request.KeywordFilter}%"`
|
||||||
|
and binds it as a parameter. Parameterisation correctly prevents SQL injection, but
|
||||||
|
it does **not** neutralise the `LIKE` metacharacters `%` and `_` inside the
|
||||||
|
user-supplied keyword. A search for a literal `_` (common in event sources and
|
||||||
|
identifiers such as `store_and_forward`, `PLC_1`, or instance IDs) is interpreted as
|
||||||
|
"match any single character", and a `%` matches any run of characters. The design
|
||||||
|
calls keyword search "free-text search on message and source fields ... Useful for
|
||||||
|
finding events by script name, alarm name, or error message" — users will reasonably
|
||||||
|
expect a literal substring match, so a query for `store_and_forward` silently returns
|
||||||
|
events containing `storeXandYforward` and similar false positives. There is no way
|
||||||
|
for the caller to search for a literal underscore or percent.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Escape `%`, `_`, and the escape character itself in `request.KeywordFilter` before
|
||||||
|
wrapping it in `%...%`, and append an `ESCAPE` clause to the `LIKE` expression
|
||||||
|
(e.g. `... LIKE $keyword ESCAPE '\'`). Alternatively document that the keyword field
|
||||||
|
accepts `LIKE` wildcard syntax, but a literal-substring match is the behaviour the
|
||||||
|
design implies.
|
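A minimal sketch of the escaping step, assuming backslash as the escape character. `LikePattern` and `EscapeForContains` are hypothetical names introduced for illustration; they do not exist in the module.

```csharp
using System.Text;

public static class LikePattern
{
    // Escapes the LIKE metacharacters (and the escape character itself) so the
    // keyword matches as a literal substring, then wraps the result in %...%.
    public static string EscapeForContains(string keyword, char escape = '\\')
    {
        var sb = new StringBuilder(keyword.Length + 2);
        sb.Append('%');
        foreach (var c in keyword)
        {
            if (c == '%' || c == '_' || c == escape)
                sb.Append(escape);
            sb.Append(c);
        }
        sb.Append('%');
        return sb.ToString();
    }
}
```

The escaped value would still be bound as a parameter, with the SQL text carrying the matching clause, for example `message LIKE $keyword ESCAPE '\'`. Without the `ESCAPE` clause SQLite treats the backslashes as ordinary characters, so both halves are needed.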
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### SiteEventLogging-014 — Initial purge runs synchronously on the host startup thread
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Low |
|
||||||
|
| Category | Performance & resource management |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:34-48` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`EventLogPurgeService.ExecuteAsync` calls `RunPurge()` (a fully synchronous method
|
||||||
|
that runs `PurgeByRetention` and `PurgeByStorageCap`) *before* the first `await`
|
||||||
|
(`await timer.WaitForNextTickAsync(...)`). A `BackgroundService`'s `ExecuteAsync` is
|
||||||
|
invoked from `StartAsync`, and the host's startup pipeline does not proceed past a
|
||||||
|
`BackgroundService` until its `ExecuteAsync` yields at the first real `await`. Because
|
||||||
|
`RunPurge()` precedes any `await`, the entire initial purge — including a cap-purge
|
||||||
|
that deletes rows in 1000-row batches and runs `PRAGMA incremental_vacuum` until a
|
||||||
|
near-1 GB database is back under the cap — executes inline on the startup thread,
|
||||||
|
blocking host startup (and therefore the `/health/ready` gate) for as long as the
|
||||||
|
purge takes. On a site that has accumulated a large log this can be a multi-second
|
||||||
|
stall during every node start/failover. The class doc states the service "runs on a
|
||||||
|
background thread and does not block event recording" — the startup-thread block is
|
||||||
|
inconsistent with that intent.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Yield before the initial purge so it runs on the background scheduler rather than the
|
||||||
|
startup thread — e.g. `await Task.Yield();` as the first statement of `ExecuteAsync`,
|
||||||
|
or move the initial `RunPurge()` to after the first `await timer.WaitForNextTickAsync`
|
||||||
|
(accepting a one-interval delay), or offload it with `await Task.Run(RunPurge, stoppingToken)`.
|
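The mechanism behind this finding can be shown without the hosting package. The sketch below is a simplified stand-in rather than the real `EventLogPurgeService`: `StartupYieldSketch` is a hypothetical name, and the returned thread id stands in for "the thread that would run `RunPurge()`". It assumes the caller has no `SynchronizationContext`, as is the case on a plain thread.

```csharp
using System;
using System.Threading.Tasks;

public static class StartupYieldSketch
{
    // Stand-in for ExecuteAsync: returns the id of the thread that would run the purge.
    public static async Task<int> ExecuteAsync(bool yieldFirst)
    {
        if (yieldFirst)
        {
            // Hands control back to the caller; the rest of the method resumes
            // on the thread pool when no SynchronizationContext is present.
            await Task.Yield();
        }

        // Everything before the first real await runs inline on the caller's thread.
        return Environment.CurrentManagedThreadId;
    }
}
```

Called from a dedicated thread, `ExecuteAsync(false)` reports the caller's own thread id, while `ExecuteAsync(true)` reports a thread-pool thread. In the service, the equivalent change would be `await Task.Yield();` as the first statement of `ExecuteAsync`.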
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|||||||
@@ -5,10 +5,10 @@
|
|||||||
| Module | `src/ScadaLink.SiteRuntime` |
|
| Module | `src/ScadaLink.SiteRuntime` |
|
||||||
| Design doc | `docs/requirements/Component-SiteRuntime.md` |
|
| Design doc | `docs/requirements/Component-SiteRuntime.md` |
|
||||||
| Status | Reviewed |
|
| Status | Reviewed |
|
||||||
| Last reviewed | 2026-05-16 |
|
| Last reviewed | 2026-05-17 |
|
||||||
| Reviewer | claude-agent |
|
| Reviewer | claude-agent |
|
||||||
| Commit reviewed | `9c60592` |
|
| Commit reviewed | `39d737e` |
|
||||||
| Open findings | 0 |
|
| Open findings | 3 |
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
@@ -28,6 +28,24 @@ in a comment but ships it anyway). Test coverage exists for the coordinator acto
|
|||||||
persistence and scripting, but the short-lived execution actors, the replication
|
persistence and scripting, but the short-lived execution actors, the replication
|
||||||
actor, and the repositories are untested.
|
actor, and the repositories are untested.
|
||||||
|
|
||||||
|
#### Re-review 2026-05-17 (commit `39d737e`)
|
||||||
|
|
||||||
|
The module was re-reviewed at commit `39d737e`. No source under
|
||||||
|
`src/ScadaLink.SiteRuntime` has changed since the previous review at `9c60592`
|
||||||
|
(the only intervening commits are code-review documentation updates), so all of
|
||||||
|
SiteRuntime-001..013, 015, 016 remain Resolved and SiteRuntime-014 remains
|
||||||
|
Deferred — its Deferred justification (a trigger-evaluation concurrency design
|
||||||
|
decision is required before either recommended fix can land in-module) still
|
||||||
|
holds verbatim against the unchanged `ScriptActor`/`AlarmActor` source. The
|
||||||
|
re-review nonetheless worked through all 10 checklist categories afresh and
|
||||||
|
surfaced three new findings that the prior pass did not record: a cross-thread
|
||||||
|
`Dictionary` enumeration race when the Instance Actor's live `_attributes`
|
||||||
|
dictionary is handed by reference into child `ScriptActor`/`AlarmActor`
|
||||||
|
constructors (SiteRuntime-017, Medium); a stale `ScriptExecutionActor` XML doc
|
||||||
|
that still claims a "dedicated blocking I/O dispatcher" (SiteRuntime-018, Low);
|
||||||
|
and two dead lifecycle handlers in `InstanceActor` that the Deployment Manager
|
||||||
|
never routes to (SiteRuntime-019, Low). Open findings: 3.
|
||||||
|
|
||||||
## Checklist coverage
|
## Checklist coverage
|
||||||
|
|
||||||
| # | Category | Examined | Notes |
|
| # | Category | Examined | Notes |
|
||||||
@@ -733,3 +751,126 @@ harness is a larger test-infrastructure task tracked separately and out of scope
|
|||||||
Low-severity coverage finding; the highest-value untested paths the finding called out
|
Low-severity coverage finding; the highest-value untested paths the finding called out
|
||||||
(script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
|
(script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
|
||||||
green.
|
green.
|
||||||
|
|
||||||
|
### SiteRuntime-017 — Instance Actor's live `_attributes` dictionary is shared by reference into child actor constructors
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Medium |
|
||||||
|
| Category | Concurrency & thread safety |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:625`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:675`, `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:83`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:93` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`InstanceActor.CreateChildActors` passes the Instance Actor's own mutable
|
||||||
|
`_attributes` field (a plain `Dictionary<string, object?>`) by reference into the
|
||||||
|
`Props.Create(...)` factory for every `ScriptActor` and `AlarmActor` (as the
|
||||||
|
`initialAttributes` constructor argument). Each child constructor then iterates
|
||||||
|
that dictionary to seed its `_attributeSnapshot`:
|
||||||
|
|
||||||
|
```csharp
|
||||||
|
if (initialAttributes != null)
|
||||||
|
foreach (var kvp in initialAttributes)
|
||||||
|
_attributeSnapshot[kvp.Key] = kvp.Value;
|
||||||
|
```
|
||||||
|
|
||||||
|
`Context.ActorOf` returns immediately; the child actor's constructor runs later on
|
||||||
|
the *child's* mailbox thread. Meanwhile the Instance Actor's `PreStart` returns and
|
||||||
|
the Instance Actor begins processing its mailbox — `HandleTagValueUpdate` and
|
||||||
|
`HandleAttributeValueChanged` both mutate `_attributes` (`_attributes[...] = ...`).
|
||||||
|
A DCL tag update that arrives before a child has finished its constructor copy
|
||||||
|
therefore mutates the dictionary on the Instance Actor thread while the child
|
||||||
|
thread is enumerating it. `Dictionary<,>` is explicitly not safe for concurrent
|
||||||
|
read/write: the enumeration can throw `InvalidOperationException` ("collection was
|
||||||
|
modified") — which surfaces as an `ActorInitializationException` and, under the
|
||||||
|
Instance Actor's `SupervisorStrategy`, **stops** the child (the strategy returns
|
||||||
|
`Stop` for `ActorInitializationException`). The script or alarm is then silently
|
||||||
|
absent for the life of the instance. A torn read of an entry is also possible. The
|
||||||
|
window is small but realistically reachable on a busy site at startup/failover
|
||||||
|
— exactly the staggered-startup scenario the design is most concerned about.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Do not share the live dictionary. Snapshot it on the Instance Actor thread before
|
||||||
|
constructing the child — e.g. pass `new Dictionary<string, object?>(_attributes)`
|
||||||
|
(or an immutable copy) into each `Props.Create`. The copy is made on the Instance
|
||||||
|
Actor thread inside `CreateChildActors`, so it is race-free, and each child gets a
|
||||||
|
private dictionary to seed from.
|
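A minimal sketch of the snapshot step. `AttributeSnapshot.Take` is a hypothetical helper introduced for illustration; the only API it relies on is the `Dictionary<TKey, TValue>` copy constructor.

```csharp
using System.Collections.Generic;

public static class AttributeSnapshot
{
    // Copies the live dictionary on the owning (Instance Actor) thread.
    // The child receives the copy and never touches the live instance.
    public static IReadOnlyDictionary<string, object?> Take(
        Dictionary<string, object?> live)
        => new Dictionary<string, object?>(live);
}
```

Later writes to the live dictionary do not affect the copy, so the child constructor's `foreach` enumerates a collection that no other thread mutates. The copy is shallow: attribute values that are themselves mutable objects would still be shared.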
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### SiteRuntime-018 — `ScriptExecutionActor` XML doc still claims a "dedicated blocking I/O dispatcher"
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Low |
|
||||||
|
| Category | Documentation & comments |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:17` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
The class-level XML summary on `ScriptExecutionActor` states "Runs on a dedicated
|
||||||
|
blocking I/O dispatcher." That is not what the code does. SiteRuntime-009 was
|
||||||
|
resolved by introducing `ScriptExecutionScheduler` (a bounded dedicated
|
||||||
|
`TaskScheduler`); the *actor itself and its mailbox* run on the **default** Akka
|
||||||
|
dispatcher, and only the script body runs on the scheduler's threads via
|
||||||
|
`Task.Factory.StartNew(..., scheduler)`. The resolution of SiteRuntime-009
|
||||||
|
explicitly chose the `TaskScheduler` route *instead of* a HOCON dispatcher and
|
||||||
|
even removed the "in production, configure a dedicated dispatcher" comments
|
||||||
|
elsewhere — but this stale summary line was missed. A reader is told the actor is
|
||||||
|
on a dedicated dispatcher when it is not, which is misleading when reasoning about
|
||||||
|
mailbox throughput and thread-pool pressure. (`AlarmExecutionActor` does not carry
|
||||||
|
the equivalent claim — its summary only says "Same pattern as ScriptExecutionActor.")
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Correct the summary to describe the actual model: the actor runs on the default
|
||||||
|
dispatcher and the script body is dispatched onto the dedicated
|
||||||
|
`ScriptExecutionScheduler` (SiteRuntime-009). Align the wording with the accurate
|
||||||
|
comment already present at `ScriptExecutionActor.cs:71-73`.
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### SiteRuntime-019 — Dead `DisableInstanceCommand` / `EnableInstanceCommand` handlers in `InstanceActor`
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Low |
|
||||||
|
| Category | Correctness & logic bugs |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:113` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`InstanceActor`'s constructor registers `Receive<DisableInstanceCommand>` and
|
||||||
|
`Receive<EnableInstanceCommand>` handlers that log and reply with a successful
|
||||||
|
`InstanceLifecycleResponse`. These handlers are unreachable. The Deployment Manager
|
||||||
|
is the only sender of those commands, and `DeploymentManagerActor.HandleDisable` /
|
||||||
|
`HandleEnable` handle the lifecycle entirely themselves — they call
|
||||||
|
`Context.Stop(actor)` (disable) or `CreateInstanceActor(...)` (enable) directly and
|
||||||
|
reply to the original sender from the Deployment Manager. Neither command is ever
|
||||||
|
`Forward`-ed or `Tell`-ed to the Instance Actor. The handlers are dead code, and
|
||||||
|
they are actively misleading: a maintainer reading `InstanceActor` would reasonably
|
||||||
|
believe disable/enable is partly an Instance-Actor responsibility, and the no-op
|
||||||
|
"true" reply implies an instance-side acknowledgement contract that does not exist.
|
||||||
|
If a future change *did* route these commands here, the disable handler would do
|
||||||
|
nothing useful (it does not stop children or tear down state — Akka does that when
|
||||||
|
the parent stops the actor).
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Remove the two `Receive<...>` registrations and their handler bodies from
|
||||||
|
`InstanceActor`, since the Deployment Manager owns the disable/enable lifecycle.
|
||||||
|
If the intent is to keep them for a future instance-side hook, add an XML comment
|
||||||
|
stating that the Deployment Manager currently handles these and the handlers are a
|
||||||
|
reserved placeholder — but removal is preferred.
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|||||||
@@ -5,10 +5,10 @@
|
|||||||
| Module | `src/ScadaLink.StoreAndForward` |
|
| Module | `src/ScadaLink.StoreAndForward` |
|
||||||
| Design doc | `docs/requirements/Component-StoreAndForward.md` |
|
| Design doc | `docs/requirements/Component-StoreAndForward.md` |
|
||||||
| Status | Reviewed |
|
| Status | Reviewed |
|
||||||
| Last reviewed | 2026-05-16 |
|
| Last reviewed | 2026-05-17 |
|
||||||
| Reviewer | claude-agent |
|
| Reviewer | claude-agent |
|
||||||
| Commit reviewed | `9c60592` |
|
| Commit reviewed | `39d737e` |
|
||||||
| Open findings | 0 (3 Deferred: 002, 011, 012 — see notes) |
|
| Open findings | 3 (3 Deferred: 002, 011, 012 — see notes) |
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
@@ -30,20 +30,45 @@ status set, and untested critical paths (retry-due timing, replication-from-acti
|
|||||||
the actor bridge). None of the findings are blockers for compilation, but the
|
the actor bridge). None of the findings are blockers for compilation, but the
|
||||||
replication and retry-count issues are functional defects against the design.
|
replication and retry-count issues are functional defects against the design.
|
||||||
|
|
||||||
|
#### Re-review 2026-05-17 (commit `39d737e`)
|
||||||
|
|
||||||
|
Re-reviewed at commit `39d737e` after the batch-3 fixes. Findings 001, 003–010,
|
||||||
|
and 013–014 are all confirmed `Resolved` against the current source: the
|
||||||
|
replication wiring (`BufferAsync`/`ReplicateRemove`/`ReplicatePark`), the corrected
|
||||||
|
retry-count semantics, the conditional `UpdateMessageIfStatusAsync` writes, the
|
||||||
|
transactional parked-message reads, the `PipeTo` refactor, the `RaiseActivity`
|
||||||
|
hardening, the `RetryParkedMessageAsync` `last_attempt_at` reset and the database
|
||||||
|
directory creation are all present as described. Findings 002, 011 and 012 were
|
||||||
|
re-verified and remain validly `Deferred` — their preconditions are unchanged (002's
|
||||||
|
residual no-handler gap, 011's Commons-owned enum, 012's Commons-owned entity placement).
|
||||||
|
|
||||||
|
This pass surfaced **three new findings**. StoreAndForward-015 records the
|
||||||
|
StoreAndForward side of the cross-module `MaxRetries == 0` ambiguity flagged by
|
||||||
|
ExternalSystemGateway-015: `EnqueueAsync`'s public contract documents `maxRetries` only
|
||||||
|
as "parked once `MaxRetries` is reached" and never states the `0 = no limit / retry
|
||||||
|
forever` special case that `RetryMessageAsync` actually enforces, so an ESG caller
|
||||||
|
passing `0` to mean "never retry" gets the opposite behaviour with no warning from the
|
||||||
|
S&F API surface. StoreAndForward-016 records that operator-initiated parked-message
|
||||||
|
retry and discard are not replicated to the standby — only the add/remove/park sweep
|
||||||
|
paths are — so a failover diverges the standby buffer from the active one.
|
||||||
|
StoreAndForward-017 records that the Retry/Discard activity-log entries hard-code the
|
||||||
|
`ExternalSystem` category, mislabelling notification and cached-DB-write messages in
|
||||||
|
the site event log.
|
||||||
|
|
||||||
## Checklist coverage
|
## Checklist coverage
|
||||||
|
|
||||||
| # | Category | Examined | Notes |
|
| # | Category | Examined | Notes |
|
||||||
|---|----------|----------|-------|
|
|---|----------|----------|-------|
|
||||||
| 1 | Correctness & logic bugs | ☑ | Off-by-one in retry counting (003); parked-message retry timing (010). |
|
| 1 | Correctness & logic bugs | ☑ | Off-by-one in retry counting (003); parked-message retry timing (010); Retry/Discard activity log hard-codes the category (017). |
|
||||||
| 2 | Akka.NET conventions | ☑ | `ContinueWith` used instead of `PipeTo`-friendly continuations; default supervision; see 007. |
|
| 2 | Akka.NET conventions | ☑ | `ContinueWith` used instead of `PipeTo`-friendly continuations; default supervision; see 007. |
|
||||||
| 3 | Concurrency & thread safety | ☑ | Sweep guarded by `Interlocked`, but no guard against retry-vs-manage races (005); `OnActivity` event not thread-safe (009). |
|
| 3 | Concurrency & thread safety | ☑ | Sweep guarded by `Interlocked`, but no guard against retry-vs-manage races (005); `OnActivity` event not thread-safe (009). |
|
||||||
| 4 | Error handling & resilience | ☑ | Replication never invoked from active path (001); no-handler messages buffered then stuck (002). |
|
| 4 | Error handling & resilience | ☑ | Replication never invoked from active path (001); no-handler messages buffered then stuck (002); operator retry/discard not replicated to standby (016). |
|
||||||
| 5 | Security | ☑ | No issues found — parameterised SQL throughout; no secrets handled directly; payload JSON treated opaquely. |
|
| 5 | Security | ☑ | No issues found — parameterised SQL throughout; no secrets handled directly; payload JSON treated opaquely. |
|
||||||
| 6 | Performance & resource management | ☑ | New SQLite connection per call; multi-statement operations not wrapped in a transaction (006, 008). |
|
| 6 | Performance & resource management | ☑ | New SQLite connection per call; multi-statement operations not wrapped in a transaction (006, 008). |
|
||||||
| 7 | Design-document adherence | ☑ | Replication gap (001); `InFlight` status undocumented/unused (011); "retrying" status from design doc not modelled. |
|
| 7 | Design-document adherence | ☑ | Replication gap (001); `InFlight` status undocumented/unused (011); "retrying" status from design doc not modelled. |
|
||||||
| 8 | Code organization & conventions | ☑ | `StoreAndForwardMessage` is an entity-like POCO living in the component, not Commons (012). |
|
| 8 | Code organization & conventions | ☑ | `StoreAndForwardMessage` is an entity-like POCO living in the component, not Commons (012). |
|
||||||
| 9 | Testing coverage | ☑ | Retry-due timing, replication-from-active, and `ParkedMessageHandlerActor` are untested (013). |
|
| 9 | Testing coverage | ☑ | Retry-due timing, replication-from-active, and `ParkedMessageHandlerActor` are untested (013). |
|
||||||
| 10 | Documentation & comments | ☑ | XML doc on `RegisterDeliveryHandler` contract is inconsistent with code (004). |
|
| 10 | Documentation & comments | ☑ | XML doc on `RegisterDeliveryHandler` contract is inconsistent with code (004); `EnqueueAsync` never documents the `maxRetries == 0` = "retry forever" special case (015). |
|
||||||
|
|
||||||
## Findings
|
## Findings
|
||||||
|
|
||||||
@@ -703,3 +728,158 @@ and bare filenames are skipped). Regression test
|
|||||||
`InitializeAsync_FileInMissingDirectory_CreatesDirectory` fails against the pre-fix code;
|
`InitializeAsync_FileInMissingDirectory_CreatesDirectory` fails against the pre-fix code;
|
||||||
all six `SiteActorPathTests` now pass. Fixed by the commit whose message references
|
all six `SiteActorPathTests` now pass. Fixed by the commit whose message references
|
||||||
`StoreAndForward-014`.
|
`StoreAndForward-014`.
|
||||||
|
|
||||||
|
### StoreAndForward-015 — `EnqueueAsync`'s public contract never documents that `maxRetries == 0` means "retry forever"
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Medium |
|
||||||
|
| Category | Documentation & comments |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:114`–`:130`, `:285` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
The re-review brief asks for the StoreAndForward side of the cross-module ambiguity
|
||||||
|
recorded as `ExternalSystemGateway-015`. The semantics are split across this module and
|
||||||
|
its callers, and the StoreAndForward side carries a genuine documentation/API-contract
|
||||||
|
fault:
|
||||||
|
|
||||||
|
- `RetryMessageAsync` parks a message only when `message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries`
|
||||||
|
(`StoreAndForwardService.cs:285`). When `MaxRetries == 0` the guard is false on every
|
||||||
|
sweep, so a `0` value means **"no limit — retry forever"**. The
|
||||||
|
`StoreAndForwardMessage.MaxRetries` XML doc (`StoreAndForwardMessage.cs:31`) does state
|
||||||
|
`"0 = no limit"`, so the persistence model is internally consistent.
|
||||||
|
- But `EnqueueAsync` — the *only* public entry point into the engine — exposes a
|
||||||
|
`maxRetries` parameter (`StoreAndForwardService.cs:128`) with **no parameter
|
||||||
|
documentation at all**, and its method summary (lines 114–122) describes the lifecycle
|
||||||
|
only as "On max retries → park" / "parked once `MaxRetries` is reached" (see also the
|
||||||
|
`_deliveryHandlers` field doc, lines 50–51). Nothing on the public surface tells a
|
||||||
|
caller that passing `0` flips the meaning from "park immediately / never retry" to
|
||||||
|
"retry forever". A caller reading only `EnqueueAsync` would reasonably assume `0`
|
||||||
|
retries means zero retries.
|
||||||
|
- This is exactly the trap ESG fell into: `ExternalSystemClient.CachedCallAsync` /
|
||||||
|
`DatabaseGateway.CachedWriteAsync` pass the source entity's `MaxRetries` verbatim,
|
||||||
|
intending `0` to mean "never retry", and instead get unbounded retry — the
|
||||||
|
duplicate-delivery / unbounded-buffer-growth hazard the design doc's idempotency note
|
||||||
|
warns against. The fault is not solely ESG's: the S&F public API silently overloads
|
||||||
|
`0` with the opposite of its natural reading and does not document it.
|
||||||
|
|
||||||
|
The defect is in this module's API contract and documentation, so it is recorded here
|
||||||
|
in addition to `ExternalSystemGateway-015`. (Whether `0` *should* mean "no limit" or
|
||||||
|
"no retry" is the cross-module design decision tracked by ESG-015; this finding is
|
||||||
|
specifically that the StoreAndForward public surface fails to document whichever
|
||||||
|
meaning is chosen.)
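The park guard quoted above can be isolated as a small predicate to make the `0` case concrete. This is a minimal sketch, not the module's code: `RetryState` and `ParkGuardSketch` are hypothetical stand-ins, and only the boolean condition is taken from the finding.

```csharp
using System;

// Hypothetical stand-in for the two fields the guard reads;
// this is not the real StoreAndForwardMessage type.
public sealed record RetryState(int RetryCount, int MaxRetries);

public static class ParkGuardSketch
{
    // Same condition as the one quoted from RetryMessageAsync:
    // park only when a positive limit exists and has been reached.
    public static bool ShouldPark(RetryState m) =>
        m.MaxRetries > 0 && m.RetryCount >= m.MaxRetries;
}
```

With this predicate, `ShouldPark(new RetryState(3, 3))` is true, while `ShouldPark(new RetryState(1000, 0))` is false for any retry count, which is the "retry forever" behaviour a caller passing `0` does not expect.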
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Document the `maxRetries` parameter on `EnqueueAsync` explicitly with a `<param>` tag
|
||||||
|
that states the `0` special case in the same words as `StoreAndForwardMessage.MaxRetries`
|
||||||
|
(`"0 = no limit — retried on every sweep until delivered, never parked"`), and add the
|
||||||
|
`0` case to the method summary's lifecycle description. Better still — and consistent
|
||||||
|
with the resolution of ESG-015 — make the engine reject the ambiguity at the API: accept
|
||||||
|
a nullable/`enum` retry policy, or treat `0` as an explicit "no retry" (do not buffer, or
|
||||||
|
park on the first sweep) so the natural reading and the behaviour agree. Either way the
|
||||||
|
public `EnqueueAsync` contract must state the chosen meaning; today it states nothing.
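One way to remove the overloaded `0` is an explicit retry policy, sketched below under stated assumptions. `RetryLimitKind` and `RetryPolicySketch` are hypothetical names invented for illustration; nothing here is the existing `EnqueueAsync` signature, and the "no retry" case is modelled as "park on the first sweep", which is only one of the two options named above.

```csharp
using System;

// Hypothetical policy kinds; the real module currently takes a bare int.
public enum RetryLimitKind { NoRetry, Bounded, Unlimited }

public sealed record RetryPolicySketch(RetryLimitKind Kind, int MaxRetries)
{
    /// <summary>Never retried: parked on the first sweep.</summary>
    public static RetryPolicySketch NoRetry() =>
        new RetryPolicySketch(RetryLimitKind.NoRetry, 0);

    /// <summary>No limit: retried on every sweep until delivered, never parked.</summary>
    public static RetryPolicySketch Unlimited() =>
        new RetryPolicySketch(RetryLimitKind.Unlimited, 0);

    /// <param name="maxRetries">Positive retry limit; parked once it is reached.</param>
    public static RetryPolicySketch Bounded(int maxRetries)
    {
        if (maxRetries <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxRetries));
        return new RetryPolicySketch(RetryLimitKind.Bounded, maxRetries);
    }

    // The park decision is now driven by the named kind, not by a magic zero.
    public bool ShouldPark(int retryCount) => Kind switch
    {
        RetryLimitKind.NoRetry => true,
        RetryLimitKind.Bounded => retryCount >= MaxRetries,
        RetryLimitKind.Unlimited => false,
        _ => throw new InvalidOperationException("Unknown retry policy kind.")
    };
}
```

The point of the design is that a caller has to choose `NoRetry()` or `Unlimited()` by name, so the natural reading and the behaviour cannot diverge.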
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### StoreAndForward-016 — Operator-initiated parked-message retry and discard are not replicated to the standby
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Medium |
|
||||||
|
| Category | Error handling & resilience |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:339`–`:362`; `src/ScadaLink.StoreAndForward/ReplicationService.cs:131`–`:136` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`StoreAndForward-001`'s fix wired replication into the active *delivery* paths:
|
||||||
|
`BufferAsync` replicates an `Add`, a successful retry replicates a `Remove`, and a park
|
||||||
|
replicates a `Park`. But the two *operator* paths — `RetryParkedMessageAsync` (line 339)
|
||||||
|
and `DiscardParkedMessageAsync` (line 353) — change buffer state and never touch
|
||||||
|
`_replication`:
|
||||||
|
|
||||||
|
- `RetryParkedMessageAsync` flips a row from `Parked` back to `Pending` (and clears
|
||||||
|
`retry_count` / `last_attempt_at`) in the local SQLite only. The standby's copy stays
|
||||||
|
`Parked`.
|
||||||
|
- `DiscardParkedMessageAsync` `DELETE`s the row from the local SQLite only. The standby's
|
||||||
|
copy is left in place, still `Parked`.
|
||||||
|
|
||||||
|
The Component design doc ("Persistence") requires the active node to forward "each
|
||||||
|
buffer operation (add, remove, park)" so that on failover "the new active node has a
|
||||||
|
near-complete copy of the buffer." An operator retrying a parked message is a buffer
|
||||||
|
state change; an operator discarding one is a removal. After a failover that follows an
|
||||||
|
operator action:
|
||||||
|
|
||||||
|
1. A **discarded** message reappears on the new active node — it is still `Parked`
|
||||||
|
there, so it resurfaces in the central UI's parked-message list and an operator must
|
||||||
|
discard it a second time. For a message deliberately removed (e.g. a known-bad
|
||||||
|
payload) this silently reverses the operator's decision.
|
||||||
|
2. A **retried** message is still `Parked` on the new active node, so the operator's
|
||||||
|
"move it back to the queue" action is silently lost across the failover and the
|
||||||
|
message is not re-attempted.
|
||||||
|
|
||||||
|
`ReplicationOperationType` only models `Add`/`Remove`/`Park` — there is no operation for
|
||||||
|
"un-park / move back to pending", so even a minimal fix needs either a new operation
|
||||||
|
type or a re-use of `Add` to overwrite the standby row. This is the same class of defect
|
||||||
|
as the now-resolved `StoreAndForward-001`, for the operator paths rather than the sweep
|
||||||
|
paths.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Replicate both operator actions. `DiscardParkedMessageAsync` should call
|
||||||
|
`_replication?.ReplicateRemove(messageId)` after a successful local delete (the existing
|
||||||
|
`Remove` op already deletes on the standby). For `RetryParkedMessageAsync`, add a
|
||||||
|
`Requeue`/`Unpark` `ReplicationOperationType` whose `ApplyReplicatedOperationAsync` case
|
||||||
|
resets the standby row to `Pending` with `retry_count = 0`, or have the method re-load
|
||||||
|
the updated message and replicate it as an `Add`-style upsert. Add replication tests for
|
||||||
|
both operator paths (the existing `StoreAndForwardReplicationTests` only cover the sweep
|
||||||
|
paths).
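The divergence and the proposed fix can be modelled with two in-memory buffers. This is a sketch under assumptions: `BufferSketch` and `OperatorActionsSketch` are hypothetical, the real store is SQLite, and the direct calls on the standby object stand in for `ReplicateRemove` and for a not-yet-existing requeue operation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum BufferStatus { Pending, Parked }

// Hypothetical in-memory model of one node's buffer.
public sealed class BufferSketch
{
    private readonly Dictionary<string, (BufferStatus Status, int RetryCount)> _rows = new();

    public void Park(string id, int retryCount) =>
        _rows[id] = (BufferStatus.Parked, retryCount);

    public bool Discard(string id) => _rows.Remove(id);

    // Moves a parked row back to Pending and resets its retry count.
    public bool Requeue(string id)
    {
        if (!_rows.TryGetValue(id, out var row) || row.Status != BufferStatus.Parked)
            return false;
        _rows[id] = (BufferStatus.Pending, 0);
        return true;
    }

    public BufferStatus? StatusOf(string id) =>
        _rows.TryGetValue(id, out var row) ? (BufferStatus?)row.Status : null;
}

// Operator paths that forward each successful local change to the standby.
public sealed class OperatorActionsSketch
{
    private readonly BufferSketch _active;
    private readonly BufferSketch? _standby; // null when no standby is configured

    public OperatorActionsSketch(BufferSketch active, BufferSketch? standby)
    {
        _active = active;
        _standby = standby;
    }

    public bool DiscardParked(string id)
    {
        if (!_active.Discard(id)) return false;
        _standby?.Discard(id); // stands in for ReplicateRemove(messageId)
        return true;
    }

    public bool RetryParked(string id)
    {
        if (!_active.Requeue(id)) return false;
        _standby?.Requeue(id); // stands in for a hypothetical Requeue/Unpark operation
        return true;
    }
}
```

If both buffers start with `m1` and `m2` parked, then after `DiscardParked("m1")` and `RetryParked("m2")` the standby no longer holds `m1` and reports `m2` as `Pending`, matching the active node. A replication test for the operator paths could assert exactly that.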
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### StoreAndForward-017 — Retry/Discard activity-log entries hard-code the `ExternalSystem` category
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Low |
|
||||||
|
| Category | Correctness & logic bugs |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:344`, `:358` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`RetryParkedMessageAsync` and `DiscardParkedMessageAsync` raise an S&F activity
|
||||||
|
notification (consumed by Site Event Logging — WP-14) but pass a hard-coded
|
||||||
|
`StoreAndForwardCategory.ExternalSystem` as the category argument:
|
||||||
|
|
||||||
|
```csharp
|
||||||
|
RaiseActivity("Retry", StoreAndForwardCategory.ExternalSystem, $"Parked message {messageId} moved back to queue");
|
||||||
|
RaiseActivity("Discard", StoreAndForwardCategory.ExternalSystem, $"Parked message {messageId} discarded");
|
||||||
|
```
|
||||||
|
|
||||||
|
Both methods take only a `messageId` and never load the message, so they have no access
|
||||||
|
to its real category. When an operator retries or discards a parked **Notification** or
|
||||||
|
**CachedDbWrite** message, the site event log records the activity under
|
||||||
|
`ExternalSystem`. Every other `RaiseActivity` call in the service passes the message's
|
||||||
|
true `Category` (`EnqueueAsync`, `RetryMessageAsync`), so the operator paths are
|
||||||
|
inconsistent and produce mislabelled audit entries — misleading when an operator later
|
||||||
|
filters or reviews S&F activity by category.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Load the message (or have `StoreAndForwardStorage.RetryParkedMessageAsync` /
|
||||||
|
`DiscardParkedMessageAsync` return the affected row's category) and pass the real
|
||||||
|
`Category` to `RaiseActivity`. If loading the row is considered too costly on these
|
||||||
|
infrequent operator paths, change the `OnActivity` event / `RaiseActivity` signature to
|
||||||
|
allow a nullable category for management actions rather than asserting a false one.
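The first option, having storage hand back the affected row's category, might look roughly like the sketch below. All type and member names are hypothetical (`CategorySketch` mirrors the three categories this finding mentions, but the real enum members may differ), and the dictionary replaces the SQLite store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the categories named in this finding.
public enum CategorySketch { ExternalSystem, Notification, CachedDbWrite }

// Hypothetical parked-message store that reports the removed row's category.
public sealed class ParkedStoreSketch
{
    private readonly Dictionary<string, CategorySketch> _parked = new();

    public void Park(string id, CategorySketch category) => _parked[id] = category;

    // Removes the row and returns its category in one step,
    // so the caller does not need a separate load.
    public bool TryDiscard(string id, out CategorySketch category) =>
        _parked.Remove(id, out category);
}

public sealed class OperatorLogSketch
{
    private readonly ParkedStoreSketch _store;

    public OperatorLogSketch(ParkedStoreSketch store) => _store = store;

    // Stand-in for the activity notifications consumed by site event logging.
    public List<(string Action, CategorySketch Category, string Detail)> Entries { get; } = new();

    public bool DiscardParked(string messageId)
    {
        if (!_store.TryDiscard(messageId, out var category)) return false;
        // Log under the message's real category instead of a hard-coded one.
        Entries.Add(("Discard", category, $"Parked message {messageId} discarded"));
        return true;
    }
}
```

Discarding a parked `Notification` message through this sketch records the entry under `Notification`, so filtering activity by category stays accurate.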
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|||||||
@@ -5,10 +5,10 @@
|
|||||||
| Module | `src/ScadaLink.TemplateEngine` |
|
| Module | `src/ScadaLink.TemplateEngine` |
|
||||||
| Design doc | `docs/requirements/Component-TemplateEngine.md` |
|
| Design doc | `docs/requirements/Component-TemplateEngine.md` |
|
||||||
| Status | Reviewed |
|
| Status | Reviewed |
|
||||||
| Last reviewed | 2026-05-16 |
|
| Last reviewed | 2026-05-17 |
|
||||||
| Reviewer | claude-agent |
|
| Reviewer | claude-agent |
|
||||||
| Commit reviewed | `9c60592` |
|
| Commit reviewed | `39d737e` |
|
||||||
| Open findings | 0 |
|
| Open findings | 2 |
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
@@ -29,11 +29,30 @@ create, optimistic concurrency on instance state) are claimed but not implemente
|
|||||||
Themes: validation that is weaker than the design promises, and asymmetric handling
|
Themes: validation that is weaker than the design promises, and asymmetric handling
|
||||||
of attributes vs. alarms vs. scripts throughout the resolve/flatten/derive paths.
|
of attributes vs. alarms vs. scripts throughout the resolve/flatten/derive paths.
|
||||||
|
|
||||||
|
#### Re-review 2026-05-17 (commit `39d737e`)
|
||||||
|
|
||||||
|
Re-reviewed the whole module against all ten checklist categories at commit
|
||||||
|
`39d737e`. All fourteen prior findings remain closed — the batch-4 fixes
|
||||||
|
(`bc88a36`/`804697f` and predecessors) hold up: the recursive composition walk,
|
||||||
|
the per-slot alarm override mechanism, the code-region-aware delimiter scanner,
|
||||||
|
and the single-source deletion-constraint logic are all correctly in place. Two
|
||||||
|
new Medium findings surfaced, both in the composition-cascade path and both
|
||||||
|
affecting **nested** (depth ≥ 2) compositions specifically — the same blind spot
|
||||||
|
that produced TemplateEngine-001. **TemplateEngine-015**: `RenameCompositionAsync`
|
||||||
|
renames only the directly slot-owned derived template, leaving cascaded inner
|
||||||
|
derived templates with a stale dotted-path name. **TemplateEngine-016**:
|
||||||
|
`FlatteningService` hard-codes `ScriptScope.ParentPath` to the empty string for
|
||||||
|
every composed script regardless of nesting depth, so a script two or more
|
||||||
|
levels deep cannot resolve `Parent.X` references to its real parent module.
|
||||||
|
Both are limited-impact (nested compositions are the less common case and there
|
||||||
|
is design-time visibility) but represent genuine drift from the recursive-nesting
|
||||||
|
design promise.
|
||||||
|
|
||||||
## Checklist coverage
|
## Checklist coverage
|
||||||
|
|
||||||
| # | Category | Examined | Notes |
|
| # | Category | Examined | Notes |
|
||||||
|---|----------|----------|-------|
|
|---|----------|----------|-------|
|
||||||
| 1 | Correctness & logic bugs | ✓ | Multiple real bugs: deep composed-member loss, derived alarms omitted, granularity bypass, no-op create-time collision block. |
|
| 1 | Correctness & logic bugs | ✓ | Prior bugs (001–005, 013) all resolved and verified. Re-review 2026-05-17 found two new nested-composition defects: rename does not cascade (TemplateEngine-015), composed-script `ParentPath` always empty (TemplateEngine-016). |
|
||||||
| 2 | Akka.NET conventions | ✓ | No actors in this module (`AddTemplateEngineActors` is an empty placeholder). Nothing to assess. |
|
| 2 | Akka.NET conventions | ✓ | No actors in this module (`AddTemplateEngineActors` is an empty placeholder). Nothing to assess. |
|
||||||
| 3 | Concurrency & thread safety | ✓ | Services are stateless, scoped per request; static helpers hold no mutable state. Design says template editing is last-write-wins; that is honoured. See TemplateEngine-010 re: a doc claim of optimistic concurrency that is not implemented. |
|
| 3 | Concurrency & thread safety | ✓ | Services are stateless, scoped per request; static helpers hold no mutable state. Design says template editing is last-write-wins; that is honoured. See TemplateEngine-010 re: a doc claim of optimistic concurrency that is not implemented. |
|
||||||
| 4 | Error handling & resilience | ✓ | `Result<T>` used consistently; repository nulls guarded. `FlatteningService` wraps in try/catch. No store-and-forward or failover surface in this module. |
|
| 4 | Error handling & resilience | ✓ | `Result<T>` used consistently; repository nulls guarded. `FlatteningService` wraps in try/catch. No store-and-forward or failover surface in this module. |
|
||||||
@@ -648,3 +667,102 @@ reports all blocking reasons and uses `TemplateDeletionService`'s phrasing — t
|
|||||||
affected `TemplateServiceTests` delete tests were updated to the unified messages,
|
affected `TemplateServiceTests` delete tests were updated to the unified messages,
|
||||||
and a regression test `DeleteTemplate_MultipleConstraints_ReportsAllNotJustFirst`
|
and a regression test `DeleteTemplate_MultipleConstraints_ReportsAllNotJustFirst`
|
||||||
verifies all three constraint categories are surfaced together.
|
verifies all three constraint categories are surfaced together.
|
||||||
|
|
||||||
|
### TemplateEngine-015 — `RenameCompositionAsync` does not cascade-rename nested derived templates
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Medium |
|
||||||
|
| Category | Correctness & logic bugs |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.TemplateEngine/TemplateService.cs:680` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`AddCompositionAsync` builds a cascade of derived templates whose names follow a
|
||||||
|
dotted path: composing `$Sensor` (which itself composes `$Probe` as `Probe1`)
|
||||||
|
into `$Pump` as `TempSensor` produces `$Pump.TempSensor` **and** the nested
|
||||||
|
`$Pump.TempSensor.Probe1` (see `CreateCascadedCompositionAsync` and the
|
||||||
|
`AddComposition_CascadesChildCompositions` test). `RenameCompositionAsync`,
|
||||||
|
however, renames only the **directly** slot-owned derived template:
|
||||||
|
|
||||||
|
```csharp
|
||||||
|
var derived = await _repository.GetTemplateByIdAsync(composition.ComposedTemplateId, ...);
|
||||||
|
if (derived != null && derived.IsDerived && derived.OwnerCompositionId == compositionId)
|
||||||
|
{
|
||||||
|
var newDerivedName = $"{owner.Name}.{newInstanceName}";
|
||||||
|
...
|
||||||
|
derived.Name = newDerivedName;
|
||||||
|
await _repository.UpdateTemplateAsync(derived, ...);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
There is no recursion into `derived.Compositions`. After renaming the `TempSensor`
|
||||||
|
slot to `MainSensor`, the parent derived becomes `$Pump.MainSensor` but the
|
||||||
|
cascaded child stays `$Pump.TempSensor.Probe1` — its name no longer reflects the
|
||||||
|
slot path it lives under, breaking the dotted-path naming invariant the cascade
|
||||||
|
otherwise maintains. `DeleteCompositionAsync` correctly recurses
|
||||||
|
(`CascadeDeleteDerivedAsync`), so rename is the asymmetric outlier. The
|
||||||
|
`RenameComposition_RenamesSlotAndDerivedTemplate` test only exercises a
|
||||||
|
single-level derived, so the gap is untested. The stale name also breaks the
|
||||||
|
`AddComposition_DerivedNameCollision_Fails` / cascade-name pre-check on any
|
||||||
|
subsequent compose that walks the now-inconsistent name tree.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Recurse over `derived.Compositions` (mirroring `CascadeDeleteDerivedAsync`),
|
||||||
|
re-deriving each cascaded child's name from the renamed parent
|
||||||
|
(`$"{parentDerivedName}.{childComposition.InstanceName}"`), and run the
|
||||||
|
existing same-name collision pre-check across every name the cascade will
|
||||||
|
produce — not just the top-level one. Add a regression test covering a
|
||||||
|
two-level cascade rename.
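The recursive rename described above can be sketched on a simplified in-memory tree. `DerivedTemplateSketch` and `CompositionSketch` are hypothetical types, not the repository entities, and the collision pre-check and persistence calls are deliberately left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified shape of a derived template and its slots.
public sealed class DerivedTemplateSketch
{
    public string Name { get; set; } = "";
    public List<CompositionSketch> Compositions { get; } = new();
}

public sealed class CompositionSketch
{
    public string InstanceName { get; set; } = "";
    public DerivedTemplateSketch Derived { get; set; } = new();
}

public static class CascadeRenameSketch
{
    // Renames the derived template, then re-derives every cascaded child's
    // name from the new parent name plus the child's slot name.
    public static void Rename(DerivedTemplateSketch derived, string newName)
    {
        derived.Name = newName;
        foreach (var child in derived.Compositions)
        {
            Rename(child.Derived, $"{newName}.{child.InstanceName}");
        }
    }
}
```

For the example in the description, calling `Rename` on `$Pump.TempSensor` with `"$Pump.MainSensor"` also turns the nested `$Pump.TempSensor.Probe1` into `$Pump.MainSensor.Probe1`. A real fix would first collect every name the walk is about to produce and run the collision check on all of them before writing anything.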
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|
||||||
|
### TemplateEngine-016 — Composed-script `ScriptScope.ParentPath` is always empty, breaking `Parent.X` resolution for nested modules
|
||||||
|
|
||||||
|
| | |
|
||||||
|
|--|--|
|
||||||
|
| Severity | Medium |
|
||||||
|
| Category | Correctness & logic bugs |
|
||||||
|
| Status | Open |
|
||||||
|
| Location | `src/ScadaLink.TemplateEngine/Flattening/FlatteningService.cs:750` |
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
`ResolveComposedScriptsRecursive` assigns each composed script a `ScriptScope`:
|
||||||
|
|
||||||
|
```csharp
|
||||||
|
Scope = new Commons.Types.Scripts.ScriptScope(SelfPath: prefix, ParentPath: "")
|
||||||
|
```
|
||||||
|
|
||||||
|
`prefix` is the accumulated path-qualified module path (`Outer` at depth 1,
|
||||||
|
`Outer.Inner` at depth 2, etc.), so `SelfPath` is correct. `ParentPath`, however,
|
||||||
|
is hard-coded to the empty string at every depth. Per `ScriptScope`'s own XML
|
||||||
|
doc, `ParentPath` is "computed at flattening time and seeded into the script's
|
||||||
|
globals … so `Attributes["X"]` / `Parent.X` can prepend the right path-prefix."
|
||||||
|
For a script directly composed at depth 1 the parent is the root and `""` is
|
||||||
|
correct, but for a script in a nested module (`Outer.Inner.Foo`) the parent
|
||||||
|
module is `Outer` — yet `ParentPath` is still `""`. A nested composed script
|
||||||
|
that references `Parent.X` will therefore resolve the reference against the root
|
||||||
|
flat namespace instead of its actual parent module, reading the wrong attribute
|
||||||
|
(or failing to find one). This is the same depth-≥2 nesting blind spot as
|
||||||
|
TemplateEngine-001; the recursive walk was added there but the `Scope`
|
||||||
|
construction was not updated to carry the parent path. `ResolveComposedScripts`
|
||||||
|
for direct (root-template) scripts leaves `Scope` at the default `ScriptScope.Root`,
|
||||||
|
which is correct.
|
||||||
|
|
||||||
|
**Recommendation**
|
||||||
|
|
||||||
|
Thread the parent module path through `ResolveComposedScriptsRecursive` (the
|
||||||
|
caller already knows it — it is the `prefix` of the enclosing recursion frame,
|
||||||
|
or `""` for a depth-1 composition) and set
|
||||||
|
`ParentPath` to that value, so `SelfPath = "Outer.Inner"` pairs with
|
||||||
|
`ParentPath = "Outer"`. Add a flattening test asserting the `Scope` of a
|
||||||
|
two-level composed script.
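Threading the parent path through the recursion could look like the following sketch. `ScriptScopeSketch`, `ModuleSketch` and `ScopeWalkSketch` are hypothetical simplifications; the real `ScriptScope` and `ResolveComposedScriptsRecursive` carry more state than is shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ScriptScope with the two paths discussed above.
public sealed record ScriptScopeSketch(string SelfPath, string ParentPath);

// Hypothetical composed module: a slot name plus nested modules.
public sealed class ModuleSketch
{
    public string Name { get; set; } = "";
    public List<ModuleSketch> Children { get; } = new();
}

public static class ScopeWalkSketch
{
    // parentPath is the enclosing frame's path ("" for a depth-1 composition).
    public static void Collect(ModuleSketch module, string parentPath, List<ScriptScopeSketch> output)
    {
        var selfPath = parentPath.Length == 0
            ? module.Name
            : $"{parentPath}.{module.Name}";

        // ParentPath is the caller's path rather than a hard-coded empty string.
        output.Add(new ScriptScopeSketch(selfPath, parentPath));

        foreach (var child in module.Children)
        {
            Collect(child, selfPath, output);
        }
    }
}
```

Walking `Outer` containing `Inner` from the root yields `("Outer", "")` and `("Outer.Inner", "Outer")`, which is the pairing the recommendation asks a flattening test to assert.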
|
||||||
|
|
||||||
|
**Resolution**
|
||||||
|
|
||||||
|
_Unresolved._
|
||||||
|
|||||||