docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked
Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28) plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381. 67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings (IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision. Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024), an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001), and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary is semantically sound (symbol-based) in the production cluster config. README regenerated; regen-readme.py --check passes (4 pending / 567 total).
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.Communication` |
| Design doc | `docs/requirements/Component-Communication.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 2 |
| Commit reviewed | `4307c381` |
| Open findings | 0 |

## Summary

@@ -68,6 +68,41 @@ keyed by caller-supplied `CorrelationId` could orphan a subscriber on ID reuse
(Communication-022). Seven new findings, all Open: one High, one Medium, five
Low.

#### Re-review 2026-06-20 (commit `4307c381`) — full review

All prior findings (Communication-001..022) remain closed (Resolved, or Resolved-by-Comm-016).
This module was renamed `ScadaLink → ZB.MOM.WW.ScadaBridge` since `1eb6e97`, so
`git diff 1eb6e97 HEAD` shows the whole module as added; the review re-examined the
full current state across all 10 categories, with focus on the surface added since the
last review — the four new gRPC RPCs on `SiteStreamService` (`IngestAuditEvents`,
`IngestCachedTelemetry`, `PullAuditEvents`, `PullSiteCalls`) and the M7 command/control
forwards (`SearchAddressSpaceCommand`, `WriteTagRequest`, `VerifyEndpointCommand`,
cert-trust commands). The module is healthy: actor state stays on the actor thread,
`PipeTo`/`Forward`/Sender-capture are clean, gRPC channel/stream/CTS lifecycles are
disciplined, the duplicate-stream and Subscribe-throws cleanup paths are correct, and
the gRPC ingest/pull handlers degrade best-effort (empty ack / empty response, rows stay
Pending) on fault. The only new findings are **design-doc drift**: the doc still claims
`SiteStreamService` has "a single RPC `SubscribeInstance`" and that audit/telemetry
reconciliation rides ClusterClient, when in fact five RPCs now exist (four of them new)
and the reconciliation *pull* runs over gRPC (Communication-023); and the AlarmStateUpdate
field table stops at field 21 while the proto/code carry fields 22–23 (Communication-024). Two
new findings, both Open: zero Critical/High, zero Medium-code, one Medium-doc, one Low.
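
The best-effort degradation noted above (empty ack on fault, rows stay Pending) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the module's C# code. `IngestAck`, `ingest_audit_events`, `apply_ack`, the `store` callback, and the `"Forwarded"` state name are all hypothetical, and the sketch assumes the ack carries the IDs the receiver accepted, which the review does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IngestAck:
    # Assumption: the ack lists the IDs the receiver durably accepted.
    accepted_ids: List[str] = field(default_factory=list)


def ingest_audit_events(rows: Dict[str, dict],
                        store: Callable[[Dict[str, dict]], None]) -> IngestAck:
    """Persist a batch; on any fault return an empty ack instead of raising."""
    try:
        store(rows)  # hypothetical persistence call; may raise
    except Exception:
        return IngestAck()  # nothing acknowledged, so the sender retries later
    return IngestAck(accepted_ids=list(rows))


def apply_ack(status: Dict[str, str], ack: IngestAck) -> None:
    """Sender side: only acknowledged rows leave the Pending state."""
    for row_id in ack.accepted_ids:
        status[row_id] = "Forwarded"  # hypothetical state name
```

Under these assumptions a failing store and a raised error look the same to the sender: no row is acknowledged, so every row is retried on the next pass.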

## Checklist coverage 2026-06-20

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | `SubscribeInstance` duplicate-removal + Subscribe-throws cleanup (Comm-021) correct; telemetry open/close gauge is balanced (counted only after Subscribe succeeds, decremented in finally). `DebugStreamBridgeActor` stream-first buffer/dedup sound. No new issues. |
| 2 | Akka.NET conventions | ✓ | Coordinator `Resume` strategies present on both actors; `Forward` preserves Sender on every site relay; `SiteAddressCacheLoaded` now read-only-typed (Comm-020). gRPC ingest handlers capture Sender before `PipeTo`. No new issues. |
| 3 | Concurrency & thread safety | ✓ | `_activeStreams`/`_subscriptions` use ownership-preserving `TryRemove(KeyValuePair)`; `GetOrCreate` uses `AddOrUpdate` atomically; bridge state mutated only on the actor thread via `Self.Tell`. No new issues. |
| 4 | Error handling & resilience | ✓ | gRPC `Ingest*`/`Pull*` degrade to empty ack/response on fault (rows stay Pending) — divergent from the ClusterClient path's `Status.Failure` pipe but behaviourally equivalent (both → Pending+retry), and explicitly documented. `LoadSiteAddressesFromDb` lifecycle-CT bounded (Comm-019). No new issues. |
| 5 | Security | ✓ | `correlation_id` validated against `ActorPath.IsValidPathElement` before actor naming (Comm-014). `Pull*` reject `batch_size <= 0` with `InvalidArgument`. Secrets ride encrypted blocks, not this layer. No new issues. |
| 6 | Performance & resource management | ✓ | gRPC channel disposed synchronously (Comm-007); per-stream CTS `CancelAfter(GrpcMaxStreamLifetime)`; bounded `Channel` (1000, DropOldest). `RemoveSiteAsync` still test-only but Comm-013 documents that endpoint change is handled by `GetOrCreate`. No new issues. |
| 7 | Design-document adherence | ✓ | Doc claims a *single* RPC and ClusterClient-only reconciliation; reality is 5 RPCs (4 of them new) + gRPC reconciliation pulls (Communication-023). AlarmStateUpdate field table stops at 21; proto/code carry 22–23 (Communication-024). |
| 8 | Code organization & conventions | ✓ | DTO mappers correctly hosted in Communication (owns generated DTOs); proto evolution is additive (field numbers 22–23 appended, never reused); Options pattern intact. No new issues. |
| 9 | Testing coverage | ✓ | `ProtoContractTests` guards oneof variants; `StreamRelayActorTests` + `ConvertToDomainEvent` cover enriched fields 8–23 end-to-end; dedicated proto tests cover the audit/telemetry/pull DTOs. `ProtoRoundtripTests.AlarmStateUpdate_RoundTrip` only asserts fields 1–5, but the gap is covered elsewhere — no separate finding. |
| 10 | Documentation & comments | ✓ | In-code XML docs are thorough and accurate (the prior-finding comments are precise). The drift is in the design doc only (Communication-023/024), not the code comments. |
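
Row 3 credits the ownership-preserving `TryRemove(KeyValuePair)` overload, which removes an entry only when both the key and the current value match. A minimal sketch of why that matters for duplicate-stream cleanup follows. It is a Python illustration, not the module's C# code; `StreamRegistry` and its methods are hypothetical names, and a lock stands in for the atomicity that `ConcurrentDictionary` provides.

```python
import threading
from typing import Dict, Optional


class StreamRegistry:
    """Maps a correlation ID to the stream entry that currently owns it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._streams: Dict[str, object] = {}

    def replace(self, key: str, entry: object) -> Optional[object]:
        """Install `entry` and return the entry it displaced, if any."""
        with self._lock:
            previous = self._streams.get(key)
            self._streams[key] = entry
            return previous

    def get(self, key: str) -> Optional[object]:
        with self._lock:
            return self._streams.get(key)

    def remove_if_owner(self, key: str, entry: object) -> bool:
        """Remove `key` only while it still maps to this exact `entry`."""
        with self._lock:
            if key in self._streams and self._streams[key] is entry:
                del self._streams[key]
                return True
            return False
```

With a key-only removal, the cleanup path of a displaced stream would delete its successor's registration. Comparing the value as well means a late cleanup from the old stream is a no-op.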

## Checklist coverage 2026-05-28

| # | Category | Examined | Notes |
@@ -1120,3 +1155,102 @@ The `_debugSubscriptions` dictionary, `TrackMessageForCleanup` helper, and the
`HandleConnectionStateChanged` handler that consumed them were all deleted as
part of Comm-016's dead-code purge. There is no longer any caller-supplied
correlation-id keyed map to overwrite — the orphan-on-reuse hazard is gone.

---

### Communication-023 — Design doc describes `SiteStreamService` as a single RPC and reconciliation as ClusterClient-only; both are now stale

| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `docs/requirements/Component-Communication.md:85`, `docs/requirements/Component-Communication.md:160-166`, `src/ZB.MOM.WW.ScadaBridge.Communication/Protos/sitestream.proto:8-14` |

**Description**

The design doc's "gRPC Proto Definition" section states: *"**Service**: `SiteStreamService`
with **a single RPC** `SubscribeInstance(InstanceStreamRequest) returns (stream
SiteStreamEvent)`."* (line 85). The proto and the generated service now expose **five**
RPCs — `SubscribeInstance`, `IngestAuditEvents`, `IngestCachedTelemetry`,
`PullAuditEvents`, and `PullSiteCalls` (`sitestream.proto:8-14`,
`SiteStreamGrpcServer.cs:211/330/386/436/519`). The four new RPCs are not mentioned
anywhere in the component doc — neither the proto section, the responsibilities, the
topology diagram, nor the Dependencies (the server now also depends on `ISiteAuditQueue`
and `IOperationTrackingStore`).

This drift also contradicts two transport statements elsewhere in the same doc. The
"Cached Call Telemetry" section (line 166) and the "Notification Submission" section
(line 158) both assert **"Transport: ClusterClient"**, and the reconciliation-pull
paragraph (line 164) describes `CachedCallReconcileRequest`/`CachedCallReconcileResponse`.
In the shipped code the reconciliation **pull** runs over **gRPC**, not ClusterClient:
`GrpcPullAuditEventsClient` / `GrpcPullSiteCallsClient`
(`src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/`) call the `PullAuditEvents` / `PullSiteCalls`
RPCs against `SiteStreamGrpcServer`. So the gRPC channel — documented as carrying
**real-time data only** ("`SiteCommunicationActor` and `CentralCommunicationActor` have no
role in streaming event delivery", "gRPC server-streaming for real-time data") — now also
carries site→central audit/telemetry **ingest** and central→site **reconciliation pulls**.
A reader relying on the doc to understand the two-transport split (the central CLAUDE.md
"Data & Communication" invariant) would be materially misled about what flows over which
transport.

**Recommendation**

Update `Component-Communication.md`: (1) replace "a single RPC" with the full RPC list
and document each new RPC's direction, payload, and idempotency; (2) reconcile the
"Transport: ClusterClient" / `CachedCallReconcile*` wording in §10 (and the Site Call
Audit / Audit Log interaction notes) with the gRPC `PullAuditEvents` / `PullSiteCalls`
reality; (3) add `ISiteAuditQueue` / `IOperationTrackingStore` to the SiteStreamGrpcServer
description and Dependencies. If the doc's "gRPC = real-time data only" framing is meant
to be a hard invariant, state explicitly that audit/telemetry ingest and reconciliation
are the documented exceptions.

**Resolution**

Resolved 2026-06-20 (commit `fd618cf1`): the design doc (Component-Communication.md) now
enumerates all five `SiteStreamService` RPCs (the one server-streaming `SubscribeInstance`
plus the unary `IngestAuditEvents`/`IngestCachedTelemetry`/`PullAuditEvents`/`PullSiteCalls`),
reconciles the §10 reconciliation wording with the gRPC-pull reality (push telemetry stays
ClusterClient; reconciliation pulls are gRPC request/response), and adds `ISiteAuditQueue`/`IOperationTrackingStore`
to Dependencies. Doc-only.
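
For quick reference, the RPC surface this finding describes can be written down as data. This is an illustrative Python sketch, not generated code or the proto itself. The RPC names and the streaming/unary split come from the finding. The `Rpc` type, the `rpcs_of_kind` helper, and the short `purpose` strings are paraphrases added here as assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rpc:
    name: str
    kind: str     # "server-streaming" or "unary"
    purpose: str  # paraphrased from the finding, not from the proto


SITE_STREAM_SERVICE: Tuple[Rpc, ...] = (
    Rpc("SubscribeInstance", "server-streaming", "real-time instance events"),
    Rpc("IngestAuditEvents", "unary", "audit ingest"),
    Rpc("IngestCachedTelemetry", "unary", "cached-telemetry ingest"),
    Rpc("PullAuditEvents", "unary", "reconciliation pull"),
    Rpc("PullSiteCalls", "unary", "reconciliation pull"),
)


def rpcs_of_kind(kind: str) -> List[str]:
    """Return the RPC names of one kind, in declaration order."""
    return [rpc.name for rpc in SITE_STREAM_SERVICE if rpc.kind == kind]
```

Here `rpcs_of_kind("server-streaming")` yields only `SubscribeInstance`, which is the single RPC the stale doc wording described.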

---

### Communication-024 — AlarmStateUpdate proto carries fields 22–23 but the design-doc field table stops at 21

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `docs/requirements/Component-Communication.md:92-114`, `src/ZB.MOM.WW.ScadaBridge.Communication/Protos/sitestream.proto:87-88` |

**Description**

The "Enriched AlarmStateUpdate" section states the message "was extended **additively**:
existing fields 1–7 are unchanged, and **fields 8–21** carry the enriched native-alarm
state" (line 92), and the field table enumerates fields 8–21 (lines 94-109). The proto
now also defines **field 22 `native_source_canonical_name`** and **field 23
`is_configured_placeholder`** (`sitestream.proto:87-88`), both populated on the server
(`StreamRelayActor.HandleAlarmStateChanged`, lines 97-98) and read back on the client
(`SiteStreamGrpcClient.ConvertToDomainEvent`, lines 257-258). The doc's field table and
its "fields 8–21" / "repopulated from fields 8–21" prose (lines 92, 112) are stale — they
omit fields 22–23. `is_configured_placeholder` in particular is load-bearing: it is the
discriminator `StreamRelayActor` uses to drop snapshot-only placeholder rows before
packing the live stream (lines 63-66), a behaviour the doc does not capture.

The proto evolution itself is correct (additive, field numbers appended and never reused),
so this is purely doc staleness, not a contract regression.
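
The relay-side behaviour described above (placeholder rows dropped before the live stream is packed) amounts to a filter on field 23. The sketch below is a Python illustration only, not `StreamRelayActor`'s code. `relay_live` is a hypothetical name, the field types (`str`, `bool`) are assumptions since the finding gives only names and numbers, and fields 1–21 are omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class AlarmStateUpdate:
    # Only the two appended proto fields are modelled here.
    native_source_canonical_name: str  # field 22 (type assumed)
    is_configured_placeholder: bool    # field 23 (type assumed)


def relay_live(updates: Iterable[AlarmStateUpdate]) -> Iterator[AlarmStateUpdate]:
    """Yield only real alarm rows; snapshot-only placeholders are dropped."""
    for update in updates:
        if update.is_configured_placeholder:
            continue  # never forwarded to the live stream
        yield update
```

Because the check happens before packing, a subscriber to the live stream never sees a row with the placeholder flag set.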

**Recommendation**

Extend the AlarmStateUpdate field table to fields 22–23, update the "fields 8–21" prose to
"8–23" in both the intro and the client-mapping bullet, and add a one-line note that
`is_configured_placeholder` rows are dropped at the relay (never relayed to the live gRPC
stream).

**Resolution**

Resolved 2026-06-20 (commit `fd618cf1`): the `AlarmStateUpdate` field table was extended to
fields 22–23 (`native_source_canonical_name`, `is_configured_placeholder`) with a note that
placeholder rows are dropped at the relay. Doc-only.