Fix runtime review findings

This commit is contained in:
Joseph Doherty
2026-04-29 10:39:49 -04:00
parent 133c83029b
commit d543679044
69 changed files with 2233 additions and 409 deletions
+34 -13
View File
@@ -32,13 +32,14 @@ The service is defined in
|-----|---------|
| `TestConnection` | Connectivity probe. Returns `{ ok: bool }` after a `SELECT 1`. Does not throw on SQL failure — returns `ok = false`. Always hits SQL directly so it remains a true health check. |
| `GetLastDeployTime` | Returns the cached `galaxy.time_of_last_deploy`. Served from the shared hierarchy cache; refreshed in the background. |
| `DiscoverHierarchy` | Returns the full deployed hierarchy plus every object's dynamic attributes. **Served from cache** — see [Hierarchy Cache](#hierarchy-cache). |
| `DiscoverHierarchy` | Returns one page of the deployed hierarchy plus each returned object's dynamic attributes. **Served from cache** — see [Hierarchy Cache](#hierarchy-cache). |
| `WatchDeployEvents` | **Server-streaming.** The server emits the current state immediately on subscribe (so clients can bootstrap without waiting), then emits one event per detected deploy change. See [Deploy Notifications](#deploy-notifications). |
`DiscoverHierarchy` is intentionally a single unary RPC rather than a stream:
the row set is small (thousands of objects, low tens-of-thousands of
attributes for typical Galaxies) and clients almost always want the whole tree
at once.
`DiscoverHierarchy` is a paged unary RPC. The raw request accepts `page_size`
and `page_token`; the server defaults omitted page size to 1000 objects and
caps every page at 5000 objects. Invalid page tokens and negative page sizes
return `InvalidArgument`. Official high-level clients preserve the older
"return the full hierarchy" behavior by looping pages internally.
## Hierarchy Cache
@@ -56,12 +57,14 @@ Refresh strategy is **deploy-time gated**:
3. If the deploy timestamp is unchanged, the heavy hierarchy + attributes
queries are **skipped**. The cache simply marks `LastSuccessAt`.
4. If the deploy timestamp changed (or no data has loaded yet), the cache
pulls hierarchy + attributes, materializes a `DiscoverHierarchyReply`
once, replaces the entry atomically, and publishes a deploy event.
pulls hierarchy + attributes, materializes a Galaxy object list plus a
dashboard summary once, replaces the entry atomically, and publishes a
deploy event.
Materializing the reply at refresh time means subsequent `DiscoverHierarchy`
calls return a pre-built proto message — no per-request projection, no
per-request allocations beyond the gRPC serializer's frame.
Materializing objects and dashboard summaries at refresh time means subsequent
`DiscoverHierarchy` calls page over an immutable object list. The dashboard
uses the precomputed summary and does not rescan raw SQL rowsets on each
snapshot.
When SQL is unreachable, the cache retains the previous data and flips
`Status` to `Stale` (or `Unavailable` if no data was ever loaded). A
@@ -139,6 +142,17 @@ message GalaxyAttribute {
bool is_historized = 10;
bool is_alarm = 11;
}
message DiscoverHierarchyRequest {
int32 page_size = 1; // omitted/0 uses the server default of 1000
string page_token = 2; // opaque offset token returned by the previous page
}
message DiscoverHierarchyReply {
repeated GalaxyObject objects = 1;
string next_page_token = 2;
int32 total_object_count = 3;
}
```
### Contained Name vs Tag Name
@@ -176,7 +190,8 @@ GalaxyHierarchyRefreshService (BackgroundService)
-> GalaxyRepository.GetLastDeployTimeAsync (cheap, every tick)
-> GalaxyRepository.GetHierarchyAsync (only on deploy change)
-> GalaxyRepository.GetAttributesAsync (only on deploy change)
-> GalaxyProtoMapper.MapObject (materialize DiscoverHierarchyReply once)
-> GalaxyProtoMapper.MapObject (materialize GalaxyObject list once)
-> DashboardGalaxySummary (precompute dashboard counts once)
-> IGalaxyDeployNotifier.Publish (only on deploy change)
```
@@ -189,8 +204,9 @@ Component breakdown:
recursive CTEs and pick the most-derived attribute override per object.
- `GalaxyHierarchyCache`
(`src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`) holds the most
recent immutable `GalaxyHierarchyCacheEntry` (rows + materialized proto
reply + counts + status). All gRPC clients share the same entry.
recent immutable `GalaxyHierarchyCacheEntry` (materialized objects +
precomputed dashboard summary + counts + status). All gRPC clients share the
same entry.
- `GalaxyHierarchyRefreshService`
(`src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs`) is a
hosted `BackgroundService` that drives `RefreshAsync` on the configured
@@ -220,6 +236,11 @@ Security`), but production deployments that use SQL authentication should set
the override via environment variable rather than committing credentials to
`appsettings.json`.
The dashboard parses this connection string and displays only non-secret
fields: server, database, integrated security, encrypt, and trust-server-
certificate. It never displays user id, password, access token, or arbitrary
unparsed connection string text.
## Authorization
All four Galaxy RPCs (including `WatchDeployEvents`) require the
+7 -1
View File
@@ -35,6 +35,8 @@ paths, timeouts, queue sizes, enum values, or protocol values are invalid.
"DefaultCommandTimeoutSeconds": 30,
"MaxSessions": 64,
"MaxPendingCommandsPerSession": 128,
"DefaultLeaseSeconds": 1800,
"LeaseSweepIntervalSeconds": 30,
"AllowMultipleEventSubscribers": false
},
"Events": {
@@ -52,7 +54,8 @@ paths, timeouts, queue sizes, enum values, or protocol values are invalid.
"ShowTagValues": false
},
"Protocol": {
"WorkerProtocolVersion": 1
"WorkerProtocolVersion": 1,
"MaxGrpcMessageBytes": 16777216
},
"Galaxy": {
"ConnectionString": "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;",
@@ -107,6 +110,8 @@ to avoid accidental large allocations from malformed or oversized frames.
| `MxGateway:Sessions:DefaultCommandTimeoutSeconds` | `30` | Default timeout used while the gateway waits for a worker command reply when an open-session request does not provide a positive command timeout. |
| `MxGateway:Sessions:MaxSessions` | `64` | Maximum number of concurrently open gateway sessions. Session opens reserve a slot atomically before worker creation. |
| `MxGateway:Sessions:MaxPendingCommandsPerSession` | `128` | Maximum number of pending worker commands for one session. Excess commands fail fast instead of queueing indefinitely. |
| `MxGateway:Sessions:DefaultLeaseSeconds` | `1800` | Initial session lease and refresh duration. Unary client activity extends the lease by this duration. |
| `MxGateway:Sessions:LeaseSweepIntervalSeconds` | `30` | Hosted monitor interval for closing expired leases. Active event-stream subscribers keep a session from expiring while the stream remains attached. |
| `MxGateway:Sessions:AllowMultipleEventSubscribers` | `false` | Controls whether multiple `StreamEvents` subscribers may attach to one session. `true` is rejected until event fan-out is implemented. |
All numeric session options must be greater than zero. The current event stream
@@ -146,6 +151,7 @@ and `RecentSessionLimit` must be greater than or equal to zero.
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Protocol:WorkerProtocolVersion` | `1` | Worker IPC protocol version expected by the gateway and worker. This must match `GatewayContractInfo.WorkerProtocolVersion`. |
| `MxGateway:Protocol:MaxGrpcMessageBytes` | `16777216` | Public gRPC max send and receive message size in bytes. The same default is used by official clients. The validator allows values from `1024` through `268435456`. |
The protocol option is exposed for diagnostics and explicit deployment
configuration, not for compatibility negotiation. A mismatch fails validation
+5
View File
@@ -31,6 +31,11 @@ A second gRPC service, `GalaxyRepositoryGrpcService`, is mapped alongside it. It
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
Public gRPC send and receive message sizes are configured from
`MxGateway:Protocol:MaxGrpcMessageBytes` (default 16 MiB). Official clients use
the same default so paged Galaxy browse replies and larger MXAccess payloads
fail consistently instead of depending on language-specific gRPC defaults.
### `OpenSession`
`OpenSession` validates the request, asks `ISessionManager` to open a session under the caller's identity, and returns a reply that advertises both protocol versions and the capabilities the gateway supports. Capability strings are static because the gateway has a fixed feature set per build; clients use them as a forward-compatibility hint rather than runtime negotiation.
+2 -2
View File
@@ -178,9 +178,9 @@ The order — fault, deregister, dispose, release slot, record metric, log, reth
While `Ready`, callers reach the worker through `SessionManager.InvokeAsync` or `ReadEventsAsync`. Both delegate to `GatewaySession`, which checks the state under lock and updates `LastClientActivityAt` on every invocation. `GatewaySession` also exposes typed bulk helpers (`AddItemBulkAsync`, `SubscribeBulkAsync`, etc.) that wrap `WorkerCommand` round-trips and translate non-`Ok` `ProtocolStatus` replies into `SessionManagerException` with `SessionNotReady`.
Event streaming uses `AttachEventSubscriber` which returns a disposable lease. When `allowMultipleSubscribers` is false the second attach throws `EventSubscriberAlreadyActive`; this prevents two gRPC streams from racing on the same worker event channel.
Event streaming uses `AttachEventSubscriber` which returns a disposable lease. When `allowMultipleSubscribers` is false the second attach throws `EventSubscriberAlreadyActive`; this prevents two gRPC streams from racing on the same worker event channel. Active event subscribers keep the session lease from expiring until the stream is disposed.
`ExtendLease` and `IsLeaseExpired` cooperate with `SessionManager.CloseExpiredLeasesAsync`, which iterates a registry snapshot and closes any session whose lease has expired with `LeaseExpiredReason`.
Sessions open with `MxGateway:Sessions:DefaultLeaseSeconds` (default 1800) added to the open timestamp. Unary client activity refreshes the lease by the same duration. `ExtendLease` and `IsLeaseExpired` cooperate with `SessionManager.CloseExpiredLeasesAsync`, which iterates a registry snapshot and closes any session whose lease has expired with `LeaseExpiredReason`. `SessionLeaseMonitorHostedService` runs that sweep every `MxGateway:Sessions:LeaseSweepIntervalSeconds` seconds (default 30).
### Close