feat(sessions): per-subscriber backpressure isolation in SessionEventDistributor

Joseph Doherty
2026-06-15 13:39:25 -04:00
parent 61627fc5b0
commit 039111ca05
9 changed files with 308 additions and 66 deletions
+11 -5
@@ -136,14 +136,20 @@ ordering and avoids competing consumers.
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Events:QueueCapacity` | `10000` | Capacity for bounded per-session event queues used by the gateway worker event channel and the public gRPC event stream queue. |
| `MxGateway:Events:BackpressurePolicy` | `FailFast` | Event backpressure behavior. `FailFast` faults the session on public stream queue overflow. `DisconnectSubscriber` disconnects only the slow stream. |
| `MxGateway:Events:BackpressurePolicy` | `FailFast` | Per-subscriber event backpressure behavior when a subscriber's bounded event channel overflows. Overflow is isolated to the offending subscriber: it is always disconnected with an `EventQueueOverflow` fault while the session pump and other subscribers keep running. `FailFast` additionally faults the whole session only in the legacy single-subscriber case (the current default mode); with multiple subscribers it degrades to a per-subscriber disconnect so one slow consumer never faults a shared session. `DisconnectSubscriber` disconnects only the slow subscriber in all cases. |
| `MxGateway:Events:ReplayBufferCapacity` | `1024` | Maximum number of events retained per session in the replay ring buffer, used to re-deliver events a returning subscriber missed (reconnect/reattach). The oldest retained event is evicted once this count is exceeded. `0` disables replay retention. |
| `MxGateway:Events:ReplayRetentionSeconds` | `300` | Maximum age, in seconds, of an event retained in the replay ring buffer. Entries older than this are evicted regardless of capacity. `0` disables age-based eviction. |
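
The two replay options describe a buffer bounded by both count and age. The following is a minimal Python sketch of that eviction behavior, not the gateway's implementation: `ReplayBuffer`, `append`, and `snapshot` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and evicting lazily on access is an assumption.

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical replay buffer bounded by entry count and entry age."""

    def __init__(self, capacity, retention_seconds):
        # Both dimensions must be >= 0; 0 disables that dimension.
        if capacity < 0 or retention_seconds < 0:
            raise ValueError("capacity and retention_seconds must be >= 0")
        self.capacity = capacity
        self.retention_seconds = retention_seconds
        self._entries = deque()  # (timestamp, event) pairs, oldest first

    def _evict_expired(self, now):
        # Age-based eviction is skipped entirely when retention is 0.
        if self.retention_seconds == 0:
            return
        while self._entries and now - self._entries[0][0] > self.retention_seconds:
            self._entries.popleft()

    def append(self, event, now):
        if self.capacity == 0:
            return  # replay retention disabled: nothing is kept
        self._entries.append((now, event))
        # Count-based eviction: drop the oldest entries beyond capacity.
        while len(self._entries) > self.capacity:
            self._entries.popleft()
        self._evict_expired(now)

    def snapshot(self, now):
        """Return the events still retained at time `now`, oldest first."""
        self._evict_expired(now)
        return [event for _, event in self._entries]


buf = ReplayBuffer(capacity=3, retention_seconds=300)
for timestamp, event in [(0, "a"), (100, "b"), (200, "c"), (250, "d")]:
    buf.append(event, now=timestamp)
retained = buf.snapshot(now=450)
```

In this run `"a"` is dropped by the capacity limit when `"d"` arrives, and `"b"` is dropped by the age limit once it is more than 300 seconds old.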
`QueueCapacity` must be greater than zero. With `FailFast`, queue overflow
faults the affected worker or session instead of silently dropping MXAccess
events. With `DisconnectSubscriber`, public gRPC stream overflow terminates only
the affected stream while the MXAccess session remains active.
`QueueCapacity` must be greater than zero; it bounds each per-subscriber event
channel fed by the session's single event pump. A slow subscriber overflows only
its own channel. Instead of silently dropping MXAccess events, the gateway
always disconnects that subscriber with an `EventQueueOverflow` fault; the pump
and other subscribers are unaffected. With `FailFast` in the single-subscriber
case (the default mode), that overflow additionally faults the whole session.
With multiple subscribers, `FailFast` degrades to a per-subscriber disconnect,
matching `DisconnectSubscriber`, so one slow consumer cannot fault a session
shared by healthy subscribers. With `DisconnectSubscriber`, overflow terminates
only the affected stream while the MXAccess session remains active.
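
The overflow rules above can be sketched as a small fan-out model. This is an illustrative Python sketch under stated assumptions, not the `SessionEventDistributor` code: `Session`, `Subscriber`, and `publish` are hypothetical names, and treating "single-subscriber case" as "exactly one subscriber attached when the event is published" is an assumption about how that mode is detected.

```python
import queue
from enum import Enum


class BackpressurePolicy(Enum):
    FAIL_FAST = "FailFast"
    DISCONNECT_SUBSCRIBER = "DisconnectSubscriber"


class Subscriber:
    """Hypothetical subscriber that owns one bounded event channel."""

    def __init__(self, name, capacity):
        self.name = name
        self.channel = queue.Queue(maxsize=capacity)
        self.fault = None  # set to "EventQueueOverflow" when disconnected


class Session:
    """Hypothetical session whose single pump feeds every subscriber."""

    def __init__(self, policy, capacity):
        if capacity <= 0:
            raise ValueError("QueueCapacity must be greater than zero")
        self.policy = policy
        self.capacity = capacity
        self.subscribers = []
        self.faulted = False

    def subscribe(self, name):
        subscriber = Subscriber(name, self.capacity)
        self.subscribers.append(subscriber)
        return subscriber

    def publish(self, event):
        # Assumption: the single-subscriber case is decided per event.
        sole_subscriber = len(self.subscribers) == 1
        for subscriber in list(self.subscribers):
            try:
                subscriber.channel.put_nowait(event)
            except queue.Full:
                # Overflow always disconnects only the offending subscriber.
                subscriber.fault = "EventQueueOverflow"
                self.subscribers.remove(subscriber)
                # FailFast escalates to a session fault only when that
                # subscriber was the only one attached.
                if self.policy is BackpressurePolicy.FAIL_FAST and sole_subscriber:
                    self.faulted = True


session = Session(BackpressurePolicy.FAIL_FAST, capacity=2)
slow = session.subscribe("slow")
fast = session.subscribe("fast")
for event in range(3):
    session.publish(event)
    fast.channel.get_nowait()  # the fast subscriber drains; the slow one never does
```

Here the slow subscriber overflows on the third event and is disconnected, while the fast subscriber and the session keep running even under `FailFast`, because the session had two subscribers.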
`ReplayBufferCapacity` and `ReplayRetentionSeconds` must each be greater than or
equal to zero (either dimension can be disabled with `0`). A returning subscriber