Lazy Browse for Galaxy Repository

Date: 2026-05-28
Status: approved, ready for implementation plan

Problem

GalaxyRepository.DiscoverHierarchy returns the whole hierarchy: replies are paged, but a client still has to walk every object. The dashboard BrowsePage then materializes the full tree into a Blazor component graph on every render. For large deployments this means oversized gRPC replies for external clients and a heavy DOM for the dashboard, even when an operator only wants to expand a single Area.

External integrations (OPC UA bridges in particular) need OPC UA-style "browse one level on demand" semantics instead of bulk download.

Scope

Lazy browse lands at three layers:

  1. gRPC: a new BrowseChildren RPC that returns the direct children of a parent.
  2. Dashboard: BrowsePage loads roots only and fetches each level on expand, in-process (no gRPC self-hop).
  3. Server projection: a parent→children index added to the existing IGalaxyHierarchyCache entry, reused by both the RPC and the dashboard service.

Out of scope: changing how the gateway loads Galaxy SQL. The hierarchy cache still pulls the full deployment on each deploy change; persistence and the dashboard summary are unchanged. The lazy story is wire-side and UI-side only.

Architecture

gRPC client                           Dashboard (Blazor circuit)
   |                                       |
   v                                       v
GalaxyRepositoryGrpcService           IDashboardBrowseService (in-process)
   .BrowseChildren                        .GetRoots / .GetChildren
        \                                /
         \                              /
          v                            v
        GalaxyBrowseProjector.ProjectChildren
                       |
                       v
        IGalaxyHierarchyCache.Current  (GalaxyHierarchyCacheEntry)
                       |
                       +-- ObjectViews         (existing)
                       +-- ChildrenByParent    (NEW, built once per refresh)

GalaxyBrowseProjector is the single source of truth for "direct children of a parent, filtered, ordered, paged." Both the gRPC handler and the dashboard service call it.

Proto contract

Added to src/ZB.MOM.WW.MxGateway.Contracts/Protos/galaxy_repository.proto, additive only per the existing wire-compatibility policy.

service GalaxyRepository {
  // ... existing RPCs ...
  rpc BrowseChildren(BrowseChildrenRequest) returns (BrowseChildrenReply);
}

message BrowseChildrenRequest {
  oneof parent {
    int32 parent_gobject_id = 1;
    string parent_tag_name = 2;
    string parent_contained_path = 3;
  }
  int32 page_size = 4;                  // default 500, max 5000
  string page_token = 5;                // opaque

  // Filter parity with DiscoverHierarchy. AND-combined.
  repeated int32 category_ids = 6;
  repeated string template_chain_contains = 7;
  string tag_name_glob = 8;
  optional bool include_attributes = 9; // default true
  bool alarm_bearing_only = 10;
  bool historized_only = 11;
}

message BrowseChildrenReply {
  repeated GalaxyObject children = 1;
  string next_page_token = 2;
  int32 total_child_count = 3;          // matching direct children (post-filter)
  repeated bool child_has_children = 4; // parallel-indexed with `children`
  uint64 cache_sequence = 5;
}

GalaxyObject and GalaxyAttribute are unchanged. A root browse leaves the parent oneof unset.
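
A client-side walk of one level might look like the sketch below. It is written against a hypothetical `browseChildren` delegate that stands in for the generated gRPC client, and `ReplySketch` mirrors only three BrowseChildrenReply fields. It also assumes that an empty next_page_token marks the last page, which the contract above does not state explicitly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape mirroring a subset of BrowseChildrenReply.
public sealed record ReplySketch(
    IReadOnlyList<string> Children,
    IReadOnlyList<bool> ChildHasChildren,
    string NextPageToken);

public static class LazyBrowseClientSketch
{
    // Drains every page of one parent's direct children.
    // browseChildren(parentTagName, pageToken) is a stand-in for the real RPC call.
    public static List<(string Name, bool Expandable)> ListLevel(
        Func<string, string, ReplySketch> browseChildren,
        string parentTagName)
    {
        var rows = new List<(string Name, bool Expandable)>();
        var token = "";
        do
        {
            var reply = browseChildren(parentTagName, token);
            // child_has_children is parallel-indexed with children.
            for (int i = 0; i < reply.Children.Count; i++)
            {
                rows.Add((reply.Children[i], reply.ChildHasChildren[i]));
            }
            token = reply.NextPageToken;
        } while (!string.IsNullOrEmpty(token)); // assumption: empty token means no more pages
        return rows;
    }
}
```

A UI would call `ListLevel` only when a node is expanded, and use the `Expandable` flag to decide whether to draw an expand triangle for each row.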

Error mapping

| Condition | Status |
| --- | --- |
| Unknown parent_gobject_id / parent_tag_name / parent_contained_path | NotFound |
| Stale page_token (cache deployed forward) | InvalidArgument; current cache_sequence in trailers |
| Filter set differs between pages of the same token | InvalidArgument |
| First load not complete within 5s | Unavailable |
| API key missing metadata:read scope | PermissionDenied |
| No API key | Unauthenticated |

Stale and filter-changed page tokens both surface as InvalidArgument — same contract as DiscoverHierarchy, since BrowseChildren reuses the same token encoding (sequence:filter-signature:offset).

browse_subtrees API-key constraints intersect with the request as today.
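
The stale-token and filter-changed checks can be sketched as below, using the colon-separated sequence:filter-signature:offset form for readability. `StatusSketch` is a local stand-in for the gRPC status codes, and all names are hypothetical. The sketch also assumes that the filter signature never contains a colon and that a malformed token is rejected as InvalidArgument, neither of which the text above specifies.

```csharp
using System;
using System.Globalization;

// Local stand-in for the two gRPC status codes this check can produce.
public enum StatusSketch { Ok, InvalidArgument }

public static class PageTokenSketch
{
    public static string Encode(ulong sequence, string filterSignature, int offset) =>
        string.Join(":",
            sequence.ToString(CultureInfo.InvariantCulture),
            filterSignature,
            offset.ToString(CultureInfo.InvariantCulture));

    // Validates a token against the current cache sequence and request filter.
    // On Ok, offset is the position to resume from (0 for an empty token).
    public static StatusSketch Validate(
        string token, ulong currentSequence, string currentFilterSignature, out int offset)
    {
        offset = 0;
        if (string.IsNullOrEmpty(token)) return StatusSketch.Ok; // first page

        var parts = token.Split(':');
        if (parts.Length != 3
            || !ulong.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out var sequence)
            || !int.TryParse(parts[2], NumberStyles.None, CultureInfo.InvariantCulture, out var parsedOffset))
        {
            return StatusSketch.InvalidArgument; // assumption: malformed tokens are rejected
        }

        // Stale token: the cache moved to a newer deploy since the token was issued.
        if (sequence != currentSequence) return StatusSketch.InvalidArgument;

        // Filter set changed between pages of the same token.
        if (!string.Equals(parts[1], currentFilterSignature, StringComparison.Ordinal))
            return StatusSketch.InvalidArgument;

        offset = parsedOffset;
        return StatusSketch.Ok;
    }
}
```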

Server projection

New index on GalaxyHierarchyCacheEntry

Built once per refresh, in GalaxyHierarchyIndex:

// gobject_id -> direct children, sorted areas-first then by display name.
// Roots (parent_gobject_id == 0 or absent) live under sentinel key 0.
IReadOnlyDictionary<int, IReadOnlyList<GalaxyObjectView>> ChildrenByParent;

Cost: one O(n) pass over ObjectViews during refresh. Memory: O(n) — one int plus a list reference per object, negligible next to the existing object list.
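
A minimal sketch of how such an index could be built follows. `ObjectViewSketch` is a hypothetical stand-in for GalaxyObjectView that carries only the fields the index needs, and the field names are assumptions. The bucketing is a single pass; the per-bucket sort is added here so that each list comes out areas-first, then by display name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GalaxyObjectView.
public sealed record ObjectViewSketch(int GobjectId, int ParentGobjectId, string DisplayName, bool IsArea);

public static class ChildrenIndexSketch
{
    public const int RootSentinel = 0; // roots are stored under key 0

    public static IReadOnlyDictionary<int, IReadOnlyList<ObjectViewSketch>> Build(
        IEnumerable<ObjectViewSketch> views)
    {
        // Single pass: bucket every object under its parent's id.
        var buckets = new Dictionary<int, List<ObjectViewSketch>>();
        foreach (var view in views)
        {
            if (!buckets.TryGetValue(view.ParentGobjectId, out var list))
            {
                list = new List<ObjectViewSketch>();
                buckets[view.ParentGobjectId] = list;
            }
            list.Add(view);
        }

        // Sort each bucket once: areas first, then case-insensitive display name.
        var index = new Dictionary<int, IReadOnlyList<ObjectViewSketch>>();
        foreach (var (parentId, children) in buckets)
        {
            index[parentId] = children
                .OrderByDescending(c => c.IsArea)
                .ThenBy(c => c.DisplayName, StringComparer.OrdinalIgnoreCase)
                .ToList();
        }
        return index;
    }
}
```

Leaf objects never appear as keys, so a missing key means "no children" and callers would use TryGetValue rather than the indexer.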

GalaxyBrowseProjector.ProjectChildren

  1. Resolve parent selector to parent_gobject_id (root sentinel for empty). Unknown parent → NotFound.
  2. Look up direct children from ChildrenByParent.
  3. Apply the same filter pipeline as DiscoverHierarchy. Extract the filter logic from GalaxyHierarchyProjector.ApplyFilters into a shared helper used by both projectors so filter semantics cannot drift.
  4. Intersect with API-key browse_subtrees globs.
  5. Memoize the filtered, ordered list keyed by (parent_id, filter_signature) in a ConditionalWeakTable<GalaxyHierarchyCacheEntry, ...> — mirrors the existing GalaxyHierarchyProjector.FilteredViewCache pattern; GC'd when the entry is.
  6. Compute child_has_children for each returned child against the filtered descendant set (recurse down ChildrenByParent, short-circuit on first match; memoize the per-child boolean alongside the filtered list).
  7. Slice the page; encode next_page_token as a base64-encoded protobuf of {cache_sequence, parent_id, filter_signature, offset}.
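
Steps 2, 3, 6 and 7 can be sketched roughly as follows. The types and method names are hypothetical, the filter is reduced to a single predicate, and memoization, API-key globs and token encoding are left out. The cycle guard is a defensive assumption; how the real index treats self-parented objects is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record BrowseNodeSketch(int Id, string Name);

public sealed record ChildPageSketch(
    IReadOnlyList<BrowseNodeSketch> Children,
    IReadOnlyList<bool> ChildHasChildren,
    int TotalChildCount,
    int? NextOffset);

public static class ProjectChildrenSketch
{
    // True when any descendant of `id` passes the filter; stops at the first match.
    public static bool HasMatchingDescendant(
        int id,
        IReadOnlyDictionary<int, IReadOnlyList<BrowseNodeSketch>> childrenByParent,
        Func<BrowseNodeSketch, bool> filter) =>
        Search(id, childrenByParent, filter, new HashSet<int>());

    private static bool Search(
        int id,
        IReadOnlyDictionary<int, IReadOnlyList<BrowseNodeSketch>> childrenByParent,
        Func<BrowseNodeSketch, bool> filter,
        HashSet<int> visited)
    {
        if (!visited.Add(id)) return false; // cycle guard for malformed data
        if (!childrenByParent.TryGetValue(id, out var kids)) return false;
        foreach (var kid in kids)
        {
            if (filter(kid) || Search(kid.Id, childrenByParent, filter, visited)) return true;
        }
        return false;
    }

    public static ChildPageSketch Project(
        int parentId,
        IReadOnlyDictionary<int, IReadOnlyList<BrowseNodeSketch>> childrenByParent,
        Func<BrowseNodeSketch, bool> filter,
        int offset,
        int pageSize)
    {
        // Step 2: direct children (already ordered by the index).
        var direct = childrenByParent.TryGetValue(parentId, out var kids)
            ? kids
            : Array.Empty<BrowseNodeSketch>();
        // Step 3: filter. Step 7: slice one page.
        var filtered = direct.Where(filter).ToList();
        var page = filtered.Skip(offset).Take(pageSize).ToList();
        // Step 6: one flag per returned child, parallel-indexed with the page.
        var flags = page.Select(c => HasMatchingDescendant(c.Id, childrenByParent, filter)).ToList();
        int end = offset + page.Count;
        return new ChildPageSketch(page, flags, filtered.Count, end < filtered.Count ? end : (int?)null);
    }
}
```

`NextOffset` being null signals the last page; a real handler would fold that offset into the page token instead of returning it directly.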

Sort order: areas first, then OrdinalIgnoreCase by display name — identical to DashboardBrowseTreeBuilder.CompareNodes so the dashboard and external clients see the same ordering.
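
The per-entry memoization from step 5 could look roughly like the sketch below. `CacheEntrySketch` stands in for GalaxyHierarchyCacheEntry, and the inner ConcurrentDictionary, keyed by (parent id, filter signature), is an assumption of this sketch rather than a description of the existing FilteredViewCache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for GalaxyHierarchyCacheEntry.
public sealed class CacheEntrySketch { }

public static class FilteredChildrenMemoSketch
{
    // The table holds each entry weakly: once a cache entry is collected,
    // its memoized lists become collectable too.
    private static readonly ConditionalWeakTable<
        CacheEntrySketch,
        ConcurrentDictionary<(int, string), IReadOnlyList<int>>> Table = new();

    public static IReadOnlyList<int> GetOrCompute(
        CacheEntrySketch entry,
        int parentId,
        string filterSignature,
        Func<IReadOnlyList<int>> compute)
    {
        var perEntry = Table.GetValue(
            entry,
            _ => new ConcurrentDictionary<(int, string), IReadOnlyList<int>>());
        // Under contention the factory may run more than once; only one result is kept.
        return perEntry.GetOrAdd((parentId, filterSignature), _ => compute());
    }
}
```

Because a refresh produces a new cache entry object, lists memoized for the previous entry are never served again and need no explicit eviction.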

gRPC handler

GalaxyRepositoryGrpcService.BrowseChildren:

  1. Wait up to 5s for first cache load (same WaitForFirstLoadAsync pattern as DiscoverHierarchy).
  2. Read Current entry.
  3. Call GalaxyBrowseProjector.ProjectChildren.
  4. Map GalaxyObjectView → GalaxyObject proto via existing GalaxyProtoMapper.MapObject. Skeleton branch (include_attributes=false) omits the attribute list, which the mapper already supports for DiscoverHierarchy.

Authorization: GatewayGrpcScopeResolver.BrowseChildren requires metadata:read — same scope as DiscoverHierarchy.
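
The bounded wait in step 1 can be illustrated with the sketch below. `TryWaitAsync` is a hypothetical helper, not the existing WaitForFirstLoadAsync, and it relies on Task.WaitAsync, which requires .NET 6 or later. A handler would map a false result to Unavailable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FirstLoadWaitSketch
{
    // Returns true if firstLoad completed within the timeout, false if the timeout elapsed.
    // Cancellation and faults from firstLoad still propagate as exceptions.
    public static async Task<bool> TryWaitAsync(
        Task firstLoad, TimeSpan timeout, CancellationToken cancellationToken = default)
    {
        try
        {
            await firstLoad.WaitAsync(timeout, cancellationToken);
            return true;
        }
        catch (TimeoutException)
        {
            return false;
        }
    }
}
```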

Dashboard

New in-process service:

public interface IDashboardBrowseService
{
    BrowseLevelResult GetRoots(BrowseFilterArgs filter);
    BrowseLevelResult GetChildren(int parentGobjectId, BrowseFilterArgs filter);
    GalaxyObject? FindByContainedPath(string path); // for deep-link expansion
}

public sealed record BrowseLevelResult(
    IReadOnlyList<DashboardBrowseNode> Nodes,
    int TotalCount,
    ulong CacheSequence);

Backed by the same GalaxyBrowseProjector the gRPC service uses — dashboard and external clients render identical results.

BrowsePage.razor changes:

  • Initial render fetches roots only.
  • Each DashboardBrowseNode tracks LoadState (NotLoaded / Loading / Loaded / Error).
  • BrowseTreeNodeView.razor shows the expand triangle from HasChildrenHint (the projector's child_has_children) regardless of whether children have been loaded yet. First expand triggers an async load and re-render.
  • Loaded children are kept for the lifetime of the Blazor circuit OR until the cache sequence advances (subscribed via the existing IGalaxyDeployNotifier). On invalidation: collapse all expansions, show a one-line "Galaxy redeployed, click to re-expand" hint.
  • Attributes render inline under their object (unchanged) since include_attributes=true is the default.
  • Search box: when non-empty, fall back to the existing bulk DiscoverHierarchy path + DashboardBrowseTreeBuilder. Lazy browse is for unfiltered exploration; building a streaming search is out of scope.
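
The per-node load lifecycle described in the list above might be modeled as in the sketch below. `LazyNodeSketch` is a hypothetical simplification of DashboardBrowseNode with string children, and retrying a load after an error is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum LoadStateSketch { NotLoaded, Loading, Loaded, Error }

public sealed class LazyNodeSketch
{
    public LoadStateSketch State { get; private set; } = LoadStateSketch.NotLoaded;
    public bool Expanded { get; private set; }
    public IReadOnlyList<string> Children { get; private set; } = Array.Empty<string>();

    // First expand triggers the load; later expands reuse the loaded children.
    public async Task ExpandAsync(Func<Task<IReadOnlyList<string>>> loadChildren)
    {
        Expanded = true;
        if (State == LoadStateSketch.Loaded || State == LoadStateSketch.Loading) return;

        State = LoadStateSketch.Loading;
        try
        {
            Children = await loadChildren();
            State = LoadStateSketch.Loaded;
        }
        catch (Exception)
        {
            State = LoadStateSketch.Error; // a later expand will retry
        }
    }

    // Called when the cache sequence advances: collapse and drop loaded children.
    public void Invalidate()
    {
        Expanded = false;
        Children = Array.Empty<string>();
        State = LoadStateSketch.NotLoaded;
    }
}
```

In a Blazor component the caller would trigger a re-render after `ExpandAsync` completes and call `Invalidate` on every node when the deploy notification arrives.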

DashboardBrowseTreeBuilder.Build stays, used by the search fallback and the existing tests.

Testing

Unit tests (no live MXAccess / Galaxy required):

  • GalaxyHierarchyIndexTests: ChildrenByParent correctness (roots under sentinel, deep nesting, self-parented objects, duplicate ids).
  • GalaxyBrowseProjectorTests
    • Direct-children selection by gobject_id, tag_name, contained_path.
    • Filter parity with DiscoverHierarchy: same fixtures, same filter permutations, identical filtered-set output.
    • child_has_children correctness under filters that eliminate descendants.
    • Ordering matches DashboardBrowseTreeBuilder byte-for-byte.
    • Sibling pagination across multiple pages.
    • Page-token round trip (serialize → deserialize → same offset).
    • Stale page_token → InvalidArgument.
    • Unknown parent → NotFound.
    • Filter change between pages of the same token → InvalidArgument.
  • GalaxyRepositoryGrpcServiceTests — new BrowseChildren happy path, first-load wait, browse_subtrees intersection, metadata:read scope enforcement.
  • DashboardBrowseServiceTests — in-process service matches projector output, deploy-sequence invalidation, error states.
  • ProtobufContractRoundTripTests — extended for BrowseChildrenRequest / BrowseChildrenReply.

Manual verification on a live MXAccess host: expand a deep Area in the dashboard, force a redeploy mid-expand, observe invalidation banner.

No new opt-in live-only tests — the projector is pure over the cache, and the cache itself is already exercised by existing live tests.

Documentation updates (same change)

Per the repo rule that doc updates ride with the code change:

  • docs/GalaxyRepository.md — add BrowseChildren to the RPC Surface table; new subsection covering semantics, filter parity, page-token contract, and error mapping. Update the architecture diagram to show the shared projector.
  • gateway.md — one-line entry for BrowseChildren in the Galaxy RPC table.
  • clients/{dotnet,python,rust,go,java}/README.md — short "Browsing lazily" snippet showing a BrowseChildren walk with child_has_children driving expand-triangle UI.
  • docs/DesignDecisions.md — entry noting the explicit choice to keep the full hierarchy in the server cache; lazy is wire-only.

Non-goals

  • SQL-side incremental loading. The gateway still pulls the full Galaxy hierarchy on each deploy change.
  • Multi-snapshot retention. A deploy invalidates outstanding page tokens; clients re-walk.
  • Streaming search. Search keeps the bulk DiscoverHierarchy + tree-build path.
  • Reconciling a partially-expanded dashboard tree across a deploy — the invalidation banner forces a re-expand.