fix(communication): resolve Communication-001 — early stream termination handling

DebugStreamService.StartStreamAsync awaited the initial debug snapshot inside
a try whose only handler was catch (OperationCanceledException). When the
stream terminated before the snapshot arrived, onTerminatedWrapper completed
the await with an InvalidOperationException that escaped the catch — the
caller got a raw, untranslated exception and the service did no teardown of
its own on that path.

Replaced with catch (Exception): it removes the session entry, sends
StopDebugStream to the bridge actor via the local reference (deterministic
teardown, idempotent), and throws a descriptive exception — TimeoutException
for the 30s timeout, otherwise an InvalidOperationException naming the
instance/site and wrapping the cause.

Re-triaged Critical -> Medium: the originally-claimed multi-minute site-side
resource leak does not occur (the bridge actor self-terminates on every
onTerminated path). Adds the first DebugStreamService test, which fails
against the pre-fix code.
This commit is contained in:
Joseph Doherty
2026-05-16 18:32:52 -04:00
parent 239bee3bc4
commit a9ceba00d0
4 changed files with 146 additions and 44 deletions

View File

@@ -30,15 +30,15 @@ code-reviews/
All 19 modules were reviewed at commit `9c60592` (241 findings: 6 Critical, 46 High,
100 Medium, 89 Low). The tables below track what remains **open** as findings are
resolved.
resolved and re-triaged.
| Severity | Open findings |
|----------|---------------|
| Critical | 5 |
| Critical | 4 |
| High | 46 |
| Medium | 100 |
| Low | 89 |
| **Total** | **240** |
| **Total** | **239** |
## Module Status
@@ -48,7 +48,7 @@ resolved.
| [CLI](CLI/findings.md) | 2026-05-16 | `9c60592` | 0/1/6/6 | 13 | 13 |
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-16 | `9c60592` | 0/1/4/3 | 8 | 8 |
| [Commons](Commons/findings.md) | 2026-05-16 | `9c60592` | 0/0/4/8 | 12 | 12 |
| [Communication](Communication/findings.md) | 2026-05-16 | `9c60592` | 1/2/5/3 | 11 | 11 |
| [Communication](Communication/findings.md) | 2026-05-16 | `9c60592` | 0/2/5/3 | 10 | 11 |
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-16 | `9c60592` | 0/1/4/6 | 11 | 11 |
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-16 | `9c60592` | 0/4/6/2 | 12 | 13 |
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-16 | `9c60592` | 0/3/6/5 | 14 | 14 |
@@ -71,12 +71,11 @@ Resolved findings drop off this list but remain recorded in their module's
`findings.md` (see [REVIEW-PROCESS.md](REVIEW-PROCESS.md) §4§5). Full detail —
description, location, recommendation — lives in the module's `findings.md`.
### Critical (5)
### Critical (4)
| ID | Module | Title |
|----|--------|-------|
| CentralUI-001 | [CentralUI](CentralUI/findings.md) | Test Run sandbox executes arbitrary C# with no trust-model enforcement |
| Communication-001 | [Communication](Communication/findings.md) | Snapshot timeout leaves orphaned bridge actor and site subscription |
| ExternalSystemGateway-001 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | No S&F delivery handler registered; cached calls and writes can never be delivered |
| NotificationService-001 | [NotificationService](NotificationService/findings.md) | Buffered notifications are never retried (no S&F delivery handler) |
| StoreAndForward-001 | [StoreAndForward](StoreAndForward/findings.md) | Replication to standby is never triggered by the active node |