fix: resolve code-review findings (locally verified)
Server-054/055/056, Contracts-020/021/022, Tests-036/038/039, IntegrationTests-030/031/032 (+033 deferred to live rig), Client.Dotnet-026/028/029 (+027 won't-fix), Client.Go-030..034, Client.Python-032..036, Client.Rust-033..038.

Key fix: SessionEventDistributor orphaned a subscriber that registered after the pump completed but before disposal (Server-056). The register paths now complete late registrants under _lifecycleLock, and a regression test was added. The racy dashboard-mirror gRPC test was made deterministic (Tests-039).

Verified green locally: gateway Tests targeted classes (GatewaySession, SessionEventDistributor, GatewayOptionsValidator, ProtobufContractRoundTrip, GatewaySessionDashboardMirror) plus the dotnet/go/python/rust client suites.
+60
-21
@@ -62,37 +62,67 @@ Implementation guidance:
 
 ## Session Reconnect
 
-Decision: no reconnectable sessions for v1.
+Reconnectable sessions with event replay are shipped and config-gated. The
+original "no reconnectable sessions" constraint is superseded.
 
 One `OpenSession` creates one gateway session and one worker process. The
 session ends on `CloseSession`, client disconnect policy, lease expiry, worker
-fault, or gateway shutdown.
+fault, gateway shutdown, or — when `DetachGraceSeconds > 0` — detach-grace
+expiry after the last external event subscriber drops.
 
-Rationale: reconnectable sessions require event replay, orphan ownership,
-security checks, and more complicated worker lifetime rules. They are not needed
-for the first parity slice.
+`MxGateway:Sessions:DetachGraceSeconds` (default `30`) controls the retention
+window. When positive, a session whose last external gRPC event-stream
+subscriber drops stays `Ready` for that many seconds so a client can reconnect
+to the same session instead of triggering a new `OpenSession` → worker spawn.
+Setting it to `0` reverts to closing only on normal lease expiry.
+
+A reconnecting client issues `StreamEvents` with `after_worker_sequence` set to
+the last sequence it observed; the gateway replays retained events newer than
+that watermark (capped by `MxGateway:Events:ReplayBufferCapacity` and
+`MxGateway:Events:ReplayRetentionSeconds`) then transitions seamlessly to live
+delivery. If the requested position precedes the oldest retained event, a
+`ReplayGap` sentinel signals the client to re-snapshot. The replay→live handoff
+is atomic (no gap, no duplicate). See [Sessions](./Sessions.md) for the full
+reconnect and replay protocol.
 
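Note on the replay paragraph above: the watermark rule can be illustrated with a small sketch. This is not the gateway's implementation. `ReplayBuffer`, `REPLAY_GAP`, and `events_after` are hypothetical names chosen for illustration, the buffer is assumed to hold at least one event, and age-based retention (`ReplayRetentionSeconds`) and the live handoff are left out.

```python
from collections import deque

REPLAY_GAP = "ReplayGap"  # hypothetical stand-in for the sentinel event


class ReplayBuffer:
    """Bounded buffer of (worker_sequence, payload) pairs (assumed shape)."""

    def __init__(self, capacity):
        # deque(maxlen=...) drops the oldest entry once full, standing in
        # for ReplayBufferCapacity. Capacity is assumed to be >= 1.
        self._events = deque(maxlen=capacity)
        self._last_evicted = 0  # highest sequence that is no longer retained

    def append(self, sequence, payload):
        # Record which sequence is about to fall out of the buffer.
        if len(self._events) == self._events.maxlen:
            self._last_evicted = self._events[0][0]
        self._events.append((sequence, payload))

    def events_after(self, after_worker_sequence):
        """Return retained events newer than the client's watermark.

        When events between the watermark and the oldest retained event
        were evicted, the result starts with REPLAY_GAP so the client
        knows to re-snapshot.
        """
        replay = [e for e in self._events if e[0] > after_worker_sequence]
        if after_worker_sequence < self._last_evicted:
            return [REPLAY_GAP] + replay
        return replay
```

With a capacity of 3 and sequences 1 through 5 appended, a client resuming from sequence 2 receives events 3 to 5 with no gap, while a client resuming from sequence 1 receives the sentinel first because event 2 has already been evicted.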
 ## Event Subscribers
 
-Decision: one active `StreamEvents` subscriber per session for v1.
+Multi-subscriber fan-out for data-side `StreamEvents` is shipped and
+config-gated. The original "one active subscriber per session" constraint is
+superseded for deployments that opt in.
 
-A second subscriber should be rejected with a clear session error. Multi-client
-fan-out may be added later with explicit backpressure semantics.
+`MxGateway:Sessions:AllowMultipleEventSubscribers` (default `false`) controls
+the mode. When `false` the session still rejects a second `StreamEvents`
+subscriber with `EventSubscriberAlreadyActive`, preserving the original
+single-subscriber behavior. When `true`, up to
+`MxGateway:Sessions:MaxEventSubscribersPerSession` (default `8`) concurrent
+external subscribers may attach; a new attach that would exceed the cap is
+rejected with `EventSubscriberLimitReached`. The count-check-and-increment is
+atomic under the session lock.
 
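Note on the attach rule above: a minimal sketch of an atomic count-check-and-increment, assuming a plain mutex plays the role of the session lock. `SessionSubscribers` and `AttachRejected` are hypothetical names. Only the two error names and the default values come from the documentation text.

```python
import threading


class AttachRejected(Exception):
    """Carries the error name the documentation associates with the rejection."""


class SessionSubscribers:
    def __init__(self, allow_multiple=False, max_subscribers=8):
        # Defaults mirror the documented defaults (false / 8).
        self._allow_multiple = allow_multiple
        self._limit = max_subscribers if allow_multiple else 1
        self._count = 0
        self._lock = threading.Lock()  # stands in for the session lock

    def attach(self):
        # The check and the increment share one lock acquisition, so two
        # concurrent attaches cannot both pass the limit check.
        with self._lock:
            if self._count >= self._limit:
                name = ("EventSubscriberLimitReached" if self._allow_multiple
                        else "EventSubscriberAlreadyActive")
                raise AttachRejected(name)
            self._count += 1

    def detach(self):
        with self._lock:
            if self._count > 0:
                self._count -= 1
```

Splitting the check and the increment into separate critical sections would let two racing attaches both observe a free slot, which is the condition the single lock acquisition rules out.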
-Rationale: one subscriber preserves simple event ordering and failure behavior
-while parity is being proven.
+Failure semantics differ by mode: in single-subscriber mode a slow consumer's
+channel overflow faults the whole session (`FailFast` backpressure); in
+multi-subscriber mode the same condition disconnects only that subscriber so one
+slow consumer never faults a session shared by others. The mode is fixed at
+session construction and is not changed by a live subscriber-count snapshot.
 
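Note on the failure semantics above: the two overflow outcomes can be sketched with bounded queues. This is an illustration under assumptions, not the distributor's code. `Session`, `add_subscriber`, and `publish` are hypothetical names, and a full `queue.Queue` stands in for a subscriber channel overflow.

```python
import queue


class Session:
    def __init__(self, allow_multiple):
        # The mode is captured once at construction and never re-derived
        # from the current subscriber count.
        self.multi_subscriber = allow_multiple
        self.faulted = False
        self.subscribers = []

    def add_subscriber(self, capacity):
        channel = queue.Queue(maxsize=capacity)
        self.subscribers.append(channel)
        return channel

    def publish(self, event):
        # Iterate over a copy so a slow subscriber can be removed mid-loop.
        for channel in list(self.subscribers):
            try:
                channel.put_nowait(event)
            except queue.Full:
                if self.multi_subscriber:
                    # Multi-subscriber mode: drop only the slow consumer.
                    self.subscribers.remove(channel)
                else:
                    # Single-subscriber mode: overflow faults the session.
                    self.faulted = True
                    return
```

In this sketch a multi-subscriber session keeps delivering to the remaining subscribers after removing the one whose queue filled, while a single-subscriber session stops at the first overflow and marks itself faulted.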
-### Alarms — superseded for the alarm subsystem
+The gateway-owned internal dashboard mirror subscribes directly on the
+distributor with `isInternal: true` and is not counted toward the cap or the
+detach-grace subscriber-count in either mode.
 
-The single-subscriber rule above no longer applies to alarms. The gateway runs
-an always-on central alarm monitor (`GatewayAlarmMonitor`) that owns one
+See [Sessions](./Sessions.md) for the full event-distributor and backpressure
+design.
+
+### Alarms — separate fan-out architecture
+
+The single-subscriber rule never applied to alarms. The gateway runs an
+always-on central alarm monitor (`GatewayAlarmMonitor`) that owns one
 gateway-managed worker session, caches the active-alarm set, and fans it out to
-any number of clients through the session-less `StreamAlarms` RPC. Per-session
-alarm auto-subscribe is removed; `AcknowledgeAlarm` is session-less and routes
-through the monitor. Data-side `StreamEvents` remains one subscriber per
-session. Rationale: alarm state is gateway-wide, not session-scoped — every
-client wants the same current set plus updates, and forcing each to own a
-worker would multiply AVEVA polling load for no benefit.
+any number of clients through the session-less `StreamAlarms` RPC.
+`AcknowledgeAlarm` is session-less and routes through the monitor. Rationale:
+alarm state is gateway-wide, not session-scoped — every client wants the same
+current set plus updates, and forcing each to own a worker would multiply AVEVA
+polling load for no benefit.
 
 ## Authentication
 
@@ -467,12 +497,21 @@ against the live MXAccess attribute set.
 
 These are explicit post-v1 revisit items, not open blockers:
 
-- reconnectable sessions,
-- multiple event subscribers per session,
 - restricted worker service account,
 - production coalescing by item handle,
 - command batching for high-volume tag setup.
 
+The following items were previously listed here and have since shipped:
+
+- **Reconnectable sessions with replay** — shipped, config-gated via
+`MxGateway:Sessions:DetachGraceSeconds` and
+`MxGateway:Events:ReplayBufferCapacity` / `ReplayRetentionSeconds`.
+See [Session Reconnect](#session-reconnect) above and [Sessions](./Sessions.md).
+- **Multiple event subscribers per session** — shipped, config-gated via
+`MxGateway:Sessions:AllowMultipleEventSubscribers` and
+`MxGateway:Sessions:MaxEventSubscribersPerSession`.
+See [Event Subscribers](#event-subscribers) above and [Sessions](./Sessions.md).
+
 ## Related Documentation
 
 - [Gateway Process Detailed Design](./GatewayProcessDesign.md)