code-reviews: 2026-06-25 re-review of Galaxy-adoption delta at 3cd7776

Re-review Server + Tests modules over 88915c3..3cd7776 (the
ZB.MOM.WW.GalaxyRepository 0.2.0 adoption + array-write fixes).
Security-critical browse-scope wiring verified sound. 3 new findings:

- Server-059 (Medium): dashboard Galaxy summary memoized only on
  cache Sequence -> status/timestamps freeze during a Galaxy SQL outage.
- Server-060 (Low): no DI test asserts IGalaxyBrowseScopeProvider
  resolves to GatewayBrowseScopeProvider (registration-order invariant).
- Tests-041 (Medium): memoization invalidation path untested.
This commit is contained in:
Joseph Doherty
2026-06-25 13:21:23 -04:00
parent 3cd7776fe8
commit b062cc0440
3 changed files with 166 additions and 9 deletions
+81 -3
View File
@@ -4,13 +4,56 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Server` |
| Reviewer | Claude Code |
| Review date | 2026-06-18 |
| Commit reviewed | `88915c3` |
| Review date | 2026-06-25 |
| Commit reviewed | `3cd7776` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 2 |
## Checklist coverage
### 2026-06-25 re-review (commit 3cd7776)
Scoped re-review of the Galaxy-library adoption + array-write fix delta
(`git diff 88915c3..3cd7776 -- src/ZB.MOM.WW.MxGateway.Server`, commits
`8e196a7`/`80bf4ac`/`8678b6c` for the adoption and `2671639` for the array-write
finding fixes). The big change deletes all inline `Galaxy/*` and `Grpc/Galaxy*`
code (2961 LOC) and adopts `ZB.MOM.WW.GalaxyRepository` 0.2.0 via
`AddZbGalaxyRepository("MxGateway:Galaxy")` + `MapZbGalaxyRepository()`; the
mxaccessgw-only pieces reviewed hardest are `GatewayBrowseScopeProvider` (NEW),
the registration order in `GatewayApplication.cs`, the galaxy request-type rebind
in `GatewayGrpcScopeResolver`, and the host-side `DashboardGalaxySummaryProjector`
+ the `DashboardSnapshotService` summary memo.
Security wiring verified sound: the gateway registers
`AddSingleton<IGalaxyBrowseScopeProvider, GatewayBrowseScopeProvider>()` **before**
`AddZbGalaxyRepository`, and the lib's default is `TryAddSingleton<…,
NullGalaxyBrowseScopeProvider>` (confirmed in lib source), so the gateway provider
wins and per-key browse scoping is NOT silently disabled. All five lib galaxy RPC
request types (`TestConnectionRequest`, `GetLastDeployTimeRequest`,
`DiscoverHierarchyRequest`, `WatchDeployEventsRequest`, `BrowseChildrenRequest`)
map to `MetadataRead`; the switch default is `Admin` (fail-closed, not permissive).
The global authz interceptor authenticates/authorizes and pushes the ambient
identity before the lib service runs (it is a global gRPC interceptor that wraps
the continuation), so `GatewayBrowseScopeProvider.ResolveBrowseSubtrees` reads the
correct identity; a constrained key always carries non-empty `BrowseSubtrees`, so
it can never be widened. The array-write fixes (Server-057 resolution) are correct:
`AddBufferedItem`/`AddItemBulk` are now normalized at the outbound choke point and
re-normalized (or, for bulk, read from the worker-echoed `SubscribeResult.TagAddress`)
at the tracking site so registrations match the write-capable handle the worker bound.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Server-059 (`DashboardSnapshotService` memoizes the Galaxy summary by `entry.Sequence` only, but the lib mutates `Status`/`LastQueriedAt`/`LastSuccessAt`/`LastError` via `previous with {…}` and the `Current` age-based `ProjectStatus` **without** bumping `Sequence`, so the dashboard serves a stale summary). Array-write fixes verified correct. |
| 2 | mxaccessgw conventions | No issues found — file-scoped namespaces, `sealed`, `Async` suffixes, primary constructors, MXAccess-aligned names; gateway never touches COM; no UI component libraries; no secrets/tag-values logged; generated code untouched. |
| 3 | Concurrency & thread safety | No issues found — the summary memo is a `Volatile` read/write of an immutable record reference; concurrent recompute is benign (pure projection, identical content per sequence). Array-write normalization mutates deep-cloned command copies, never client state. |
| 4 | Error handling & resilience | No issues found — invalid sparse-array payloads still surface as `RpcException(InvalidArgument)`; the lib refresh degrades to Stale/Unavailable; the scope provider returns empty only when no identity is present (documented, unreachable under the global interceptor). |
| 5 | Security | No issues found — scope-provider registration order, the lib `TryAddSingleton` default, the five-RPC → `MetadataRead` mapping, the fail-closed `Admin` default, and the ambient-identity invariant all verified; `ConstraintEnforcer` rebind is namespace-only; the dashboard browse passes `browseSubtreeGlobs: null` intentionally (LDAP operator UI, not an API-key-scoped client). |
| 6 | Performance & resource management | No issues found — the summary memo restores O(1)-per-tick for the common case; the projector allocates only on a sequence change. |
| 7 | Design-document adherence | No issues found in Server source — implementation matches `A2-galaxyrepository-adoption-handoff.md` and the updated `CLAUDE.md`. (Observation, out of Server scope: `ZB.MOM.WW.MxGateway.Contracts` still carries the now-unused `galaxy_repository.proto` + generated code; CLAUDE.md line 14 still lists it as Contracts-owned — a Contracts-module cleanup.) |
| 8 | Code organization & conventions | No issues found — new types live under `Security/Authorization/` and `Dashboard/`; the registration is explicit and commented. (`DashboardGalaxyProjector` now trivially forwards to the public `DashboardGalaxySummaryProjector`; harmless, not filed.) |
| 9 | Testing coverage | Issues found: Server-060 (no test resolves `IGalaxyBrowseScopeProvider` from the gateway DI container to assert `GatewayBrowseScopeProvider` — not the lib's no-op default — wins; the registration-order invariant is the highest-blast-radius wiring in this delta yet is guarded only by manual construction in `GalaxyRepositoryHostWiringTests`). The Server-059 memo path is also untested. Otherwise strong: `GatewayBrowseScopeProviderTests`, `GalaxyRepositoryHostWiringTests`, `DashboardGalaxySummaryProjectorTests`. |
| 10 | Documentation & comments | No issues found beyond Server-059 — the `DashboardSnapshotService` memo comment ("an unchanged sequence means the entry … is unchanged") is inaccurate and is the root justification for that bug; captured under Server-059. |
### 2026-05-20 review (commit 1cd51bb)
This row summarizes the 2026-05-20 review pass at commit `1cd51bb`. Findings from
@@ -1131,3 +1174,38 @@ Additionally, `GatewayAlarmMonitor.ApplyProviderModeChangeAsync` increments the
**Recommendation:** Add a `CheckReadTagAsync` (and a `CheckWriteHandleAsync`) case where the bare/suffixed array attribute resolves but the configured `ReadTagGlobs`/`WriteSubtrees` exclude it, asserting a `read_scope`/`write_scope` `ConstraintFailure` is still returned; and a `CheckWriteHandleAsync` case asserting `MaxWriteClassification` is enforced against the array attribute's `SecurityClassification` via the suffixed registration address.
**Resolution:** 2026-06-18 — Added three `ConstraintEnforcerTests` cases (the test fixture gained a second array attribute `Pump_001.Setpoints[]` with `SecurityClassification = 2` to exercise the classification path): `CheckReadTagAsync_WithBareArrayName_OutOfScope_StillDeniedReadScope` (bare array resolves via the `[]` fallback but is denied `read_scope` when out of `ReadTagGlobs` — guards that the fallback widened resolution, not authorization), `CheckWriteHandleAsync_WithSuffixedArrayRegistration_OutOfScope_StillDeniedWriteScope` (an array handle whose registration `TagAddress` is the suffixed `Pump_001.Levels[]` resolves through `ResolveTarget` and is denied `write_scope`), and `CheckWriteHandleAsync_WithSuffixedArrayRegistration_ClassificationTooHigh_StillDenied` (in-scope suffixed array handle denied `max_write_classification` when the attribute's `SecurityClassification` exceeds `MaxWriteClassification`). Tests-only change; no production code touched.
### Server-059
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSnapshotService.cs:109-128` |
| Status | Open |
**Description:** `ResolveGalaxySummary` memoizes the projected dashboard Galaxy summary keyed **only** on `entry.Sequence` and returns the cached summary whenever the sequence is unchanged. The comment asserts "an unchanged sequence means the entry (and therefore its summary) is unchanged," but that is false for the adopted `ZB.MOM.WW.GalaxyRepository` 0.2.0 cache: `Sequence` is bumped **only on a heavy refresh that detects a deploy change** (`GalaxyHierarchyCache.RefreshCoreAsync`, `nextSequence = previous.Sequence + 1`). Three other paths replace `_current` via `previous with { … }` **preserving the same `Sequence`** while changing fields the dashboard summary surfaces:
- The no-deploy-change tick (the common steady-state path) updates `LastQueriedAt`/`LastSuccessAt` (and clears `LastError`) every interval at the same sequence.
- The refresh-failure path sets `Status = Stale`/`Unavailable` and `LastError = exception.Message` at the same sequence.
- The `Current` getter's age-based `ProjectStatus` returns `snapshot with { Status = Stale }` at the same sequence.
Because the memo keys on `Sequence` alone, the dashboard serves a stale summary: `LastQueriedAt`/`LastSuccessAt` appear frozen at the last heavy-refresh time even though refreshes keep succeeding, and — most importantly — when the Galaxy SQL database goes unreachable the panel keeps showing `Healthy` with no error for the entire outage (the sequence cannot advance because no heavy refresh can succeed). The Galaxy health indicator, whose purpose is to surface Stale/Unavailable, is defeated exactly when it is needed. Impact is limited to the operator dashboard display (the gRPC browse RPCs and `ConstraintEnforcer` read `cache.Current` directly and are unaffected), but it is a regression from the pre-adoption code, which projected `_galaxyHierarchyCache.Current` on every snapshot tick.
**Recommendation:** Do not memoize the cheap, per-tick-volatile fields. Memoize only the O(N)-to-compute parts that genuinely change with `Sequence` — the `TopTemplates` and `ObjectCategories` lists derived from `entry.Objects` — keyed on `Sequence`, and build the `DashboardGalaxySummary` fresh each call from the current entry's `Status`, `LastQueriedAt`, `LastSuccessAt`, `LastDeployTime`, `LastError`, and the precomputed counts (all cheap copies). Add a `DashboardSnapshotServiceTests` case that swaps the cache's `Current` to a new entry with the **same** `Sequence` but a changed `Status`/`LastError`/`LastQueriedAt` and asserts the next `GetSnapshot().Galaxy` reflects the change. Correct the misleading memo comment.
**Resolution:** _(open)_
### Server-060
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:95-99`, `src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs` |
| Status | Open |
**Description:** The per-key browse-subtree scoping depends entirely on the gateway's `AddSingleton<IGalaxyBrowseScopeProvider, GatewayBrowseScopeProvider>()` being registered **before** `AddZbGalaxyRepository(...)`, so that the library's `TryAddSingleton<IGalaxyBrowseScopeProvider, NullGalaxyBrowseScopeProvider>` (no scoping → full hierarchy) loses. This is the highest-blast-radius wiring in the adoption delta: if it ever regressed (the gateway line moved after `AddZbGalaxyRepository`, or the lib switched its default to a plain `AddSingleton`), **all per-API-key browse scoping would silently disable** with no error — a metadata-scoped key would see the entire Galaxy. The provider itself is well covered in isolation (`GatewayBrowseScopeProviderTests`) and end-to-end via manual construction (`GalaxyRepositoryHostWiringTests`), but **no test builds the gateway DI container and asserts `GetRequiredService<IGalaxyBrowseScopeProvider>()` resolves to `GatewayBrowseScopeProvider` rather than `NullGalaxyBrowseScopeProvider`** — the registration-order invariant is guarded only by the inline comment. (The Server-059 memo path is likewise untested; see that finding.)
**Recommendation:** Add a `GatewayApplicationTests` case that runs `GatewayApplication.CreateBuilder([])`, builds the service provider, and asserts `GetRequiredService<IGalaxyBrowseScopeProvider>()` is a `GatewayBrowseScopeProvider`. This pins the registration order and the lib's `TryAdd`-default contract so a future reorder or a lib-default change fails a fast unit test instead of silently widening data exposure.
**Resolution:** _(open)_