docs(stillpending): reflect session-resilience merge (multi-subscriber resolved, reconnect core) + disable-login flag

Joseph Doherty
2026-06-16 15:53:11 -04:00
parent 121ab7e263
commit 82755a3623
+9 -7
@@ -3,6 +3,8 @@
**Generated:** 2026-06-15 · **Commit:** `c7f754c` (main) · **Method:** six parallel read-only audits (Server, Worker, Contracts/proto, all five clients, docs/design/plans, tests + review backlog). Every item cites a verified `file:line`.
> **Resolution update (2026-06-15, branch `feat/stillpending-completion`):** The actionable items were implemented and verified per `docs/plans/2026-06-15-stillpending-completion.md`. **§1.1** (all 11 worker command kinds), **§1.2** (audit CorrelationId), and the **§4** client CLI/helper parity gaps are **Resolved** — see per-item annotations below. Worker COM commands are live-verified on the dev rig (`efd9971`, `f7ada90`). Remaining open items are the documented residuals (**§1.3**, **§1.4**, the **§3** vendor/capture-gated questions incl. the new **§3.2** multi-sample buffered residual) and the deliberate v1 scope of **§2**. Zero `.proto` changes were needed (all reply messages already existed).
>
> **Resolution update (2026-06-16, `main`):** The **session-resilience epic** (`docs/plans/2026-06-15-session-resilience.md`) landed its first **12 of 28 tasks** on `main` (through merge `c446bef`), which moves several **§2** items off "deferred": **multi-subscriber fan-out is now Resolved**, and **reconnectable sessions** has its server-side core (detach-grace window + replay-on-reconnect + a new `ReplayGap` signal on `MxEvent`). Still pending in that epic: reconnect Tasks 13–15 (owner re-validation, client `ReplayGap` handling, integration test), **Phase 4** per-session dashboard ACL (Tasks 16–19 — the §2/§7.6/§8 EventsHub-ACL item), and **Phase 5** orphan-worker reattach (Tasks 20–28). Separately, a `MxGateway:Dashboard:DisableLogin` dev flag shipped (`ca443b1`) — auto-authenticates the dashboard as a multi-role admin; default off, **enabled on the 10.100.0.48 deployment**. Per-item §2 status annotated below; remaining epic tasks tracked in `oldtasks.md`.
## How to read this
@@ -49,9 +51,9 @@ Items are graded by what they actually are, because most "pending" surface in th
These are documented, deliberate, and mostly enforced. Listed so the deferred surface is in one place — **none are bugs.** Canonical register: `docs/DesignDecisions.md:466-474` ("Later Revisit Items") + `gateway.md` "Post-v1 revisit items".
- 🔵 **Reconnectable sessions** — not in v1. `docs/DesignDecisions.md:63-73`, `gateway.md:1087,1101`.
- 🔵 **Multi-event-subscriber fan-out** — *plumbed but blocked.* The option flows all the way to `Sessions/GatewaySession.cs:387-408 AttachEventSubscriber(allowMultipleSubscribers)`, but `Configuration/GatewayOptionsValidator.cs:181-185` hard-rejects the only enabling value: *"AllowMultipleEventSubscribers is not supported until event fan-out is implemented."* So the fan-out code path never runs. `docs/DesignDecisions.md:75-80`.
- 🔵 **Gateway restart does not reattach orphan workers** — terminates them on startup. `docs/DesignDecisions.md:65-69`, `CLAUDE.md`.
- 🟡 **Reconnectable sessions — server-side core landed, not yet complete (2026-06-16, `main`).** Epic Phase 3 added the detach-grace retention window, replay-on-reconnect from the bounded replay ring, and the `ReplayGap` signal on `MxEvent` (Tasks 10–12, merged). Still pending: owner re-validation on reconnect (Task 13), client `ReplayGap` handling across all five clients (Task 14), and the fake-worker reconnect integration test (Task 15) — so reconnect is **not** safe to rely on end-to-end yet. Overturns `docs/DesignDecisions.md:63-73`. See `docs/plans/2026-06-15-session-resilience.md` Phase 3.
- **Multi-event-subscriber fan-out — RESOLVED (2026-06-16, `main`).** Epic Phase 2 (Tasks 7–9) removed the `GatewayOptionsValidator` rejection, added a `MaxEventSubscribersPerSession` cap (default 8), and built fan-out on the new `SessionEventDistributor` (single pump → N bounded per-subscriber channels with per-subscriber backpressure), with FakeWorkerHarness end-to-end coverage. Was: validator-blocked. `docs/plans/2026-06-15-session-resilience.md` Phase 2.
- 🟡 **Gateway restart does not reattach orphan workers — still true on `main`, planned (epic Phase 5, not started).** Workers are still terminated on startup. Epic Phase 5 (Tasks 20–28) is designed to reverse this — stable pipe naming, a SQLite adoption manifest, a worker adopt/reconnect proto frame, and nonce-validated re-adoption behind an `EnableOrphanReattach` flag (default off). Reverses the hard rule in `docs/DesignDecisions.md:65-69` and `CLAUDE.md`. `docs/plans/2026-06-15-session-resilience.md` Phase 5.
- 🔵 **Workers run as the gateway service identity** — restricted service account is a reserved extension point. `docs/DesignDecisions.md:179-184`.
- 🔵 **Fail-fast event backpressure, no coalescing** — opt-in coalescing is post-v1. `docs/DesignDecisions.md:187-203`.
- 🔵 **No public command batching** — `docs/DesignDecisions.md:206-212`.
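
The replay-on-reconnect behaviour noted for reconnectable sessions can be pictured with a small sketch. This is an illustrative model only, not the gateway's implementation: the class name `ReplayRingSketch`, the per-event sequence numbers, and the boolean gap flag are all assumptions made for the example. The source states only that a bounded replay ring and a `ReplayGap` signal exist.

```python
from collections import deque


class ReplayRingSketch:
    """Hypothetical model of a bounded replay ring; names are illustrative."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts the oldest entry once full.
        self.ring = deque(maxlen=capacity)
        self.next_seq = 1  # assumed: events carry a monotonically increasing sequence

    def record(self, event):
        self.ring.append((self.next_seq, event))
        self.next_seq += 1

    def replay_after(self, last_seen_seq):
        """Return (gap, events) for a client that last saw last_seen_seq."""
        missed = [(seq, ev) for seq, ev in self.ring if seq > last_seen_seq]
        newest = self.next_seq - 1
        expected = newest - last_seen_seq
        # A gap means some events after last_seen_seq were already evicted,
        # which is the situation a ReplayGap-style signal would report.
        gap = len(missed) < expected
        return gap, [ev for _, ev in missed]


ring = ReplayRingSketch(capacity=3)
for name in ["a", "b", "c", "d", "e"]:
    ring.record(name)

complete = ring.replay_after(2)  # everything after seq 2 is still retained
lossy = ring.replay_after(1)     # seq 2 was evicted, so the replay is incomplete
```

In this model a reconnecting client that receives a gap flag knows the replay is incomplete and has to resynchronise by some other means; how the real clients react is the subject of the pending Task 14.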
@@ -60,7 +62,7 @@ These are documented, deliberate, and mostly enforced. Listed so the deferred su
- 🔵 **Lazy browse is wire-only** — no lazy SQL / cache loading. `docs/DesignDecisions.md:365-376`, `docs/plans/2026-05-28-lazy-browse-design.md:30`.
- 🔵 **No server-side / streaming browse search** — `docs/plans/2026-05-28-lazy-browse-design.md:208`.
- 🔵 **Alarm command surface is ack + query only** — no Clear/Disable/Enable/Silence/Shelve/Inhibit; matches the MXAccess alarm-client set. `Worker/MxAccess/AlarmCommandHandler.cs`, shelve/suppress out of scope per `docs/AlarmClientDiscovery.md:60-66`.
- 🔵 **Dashboard EventsHub has no per-session ACL** — any authenticated dashboard user may subscribe to any session group. `Dashboard/Hubs/EventsHub.cs:36-50` (`TODO(per-session-acl)`); only relevant once a per-session role model exists.
- 🟡 **Dashboard EventsHub has no per-session ACL — still true on `main`, planned (epic Phase 4, not started).** Any authenticated dashboard user may still subscribe to any session group (`Dashboard/Hubs/EventsHub.cs` `TODO(per-session-acl)`). The enabling foundation (session `OwnerKeyId`) already merged in epic Phase 1; epic Phase 4 (Tasks 16–19) adds the gRPC session-owner gate, a session tag + group-to-tag config, and EventsHub per-session ACL with a hub-token tag claim. `docs/plans/2026-06-15-session-resilience.md` Phase 4. (See also §8.)
---
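
As a rough illustration of the "single pump → N bounded per-subscriber channels" shape described for the fan-out item in §2, the sketch below models the idea in Python. It is not the `SessionEventDistributor` code: the class `FanOutSketch`, the queue capacity, and the policy of detaching a subscriber whose queue is full are assumptions chosen for the example. Only the subscriber cap default of 8 comes from the text above.

```python
from collections import deque


class FanOutSketch:
    """Hypothetical stand-in for the fan-out idea; not the real distributor."""

    def __init__(self, max_subscribers=8, capacity=4):
        self.max_subscribers = max_subscribers  # mirrors the documented default cap of 8
        self.capacity = capacity                # per-subscriber bound (assumed value)
        self.queues = {}                        # subscriber id -> pending events
        self.dropped = []                       # subscribers detached for falling behind

    def attach(self, subscriber_id):
        if len(self.queues) >= self.max_subscribers:
            raise RuntimeError("subscriber cap reached")
        self.queues[subscriber_id] = deque()

    def publish(self, event):
        # Single pump: one pass over every subscriber for each event.
        for subscriber_id in list(self.queues):
            queue = self.queues[subscriber_id]
            if len(queue) >= self.capacity:
                # Assumed policy: only the slow subscriber is detached,
                # so it cannot stall the others.
                del self.queues[subscriber_id]
                self.dropped.append(subscriber_id)
            else:
                queue.append(event)

    def drain(self, subscriber_id):
        queue = self.queues[subscriber_id]
        items = list(queue)
        queue.clear()
        return items


dist = FanOutSketch(max_subscribers=2, capacity=2)
dist.attach("fast")
dist.attach("slow")

seen_by_fast = []
for event in range(3):
    dist.publish(event)
    seen_by_fast.extend(dist.drain("fast"))  # "slow" never drains its queue
```

The point of the per-subscriber bound is isolation: in this model the subscriber that never drains is the only one affected, while the one that keeps up sees every event.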
@@ -121,7 +123,7 @@ No placeholder/empty/`Assert.True(true)` tests were found anywhere.
## 6. Config-gated functional gaps (work only after configuration)
- 🟠 **6.1 Alarm ack in subtag mode requires `AckComment` subtag configured** — empty by default; ack fails in subtag mode until set. Names must be validated against live MXAccess, not guessed. `docs/DesignDecisions.md:454-458`. (`AckCommentSubtag` is write-only; `Worker/MxAccess/SubtagAlarmStateMachine.cs:21`.)
- 🔵 **6.2 Multi-subscriber** — see §2 (option exists, validator-blocked).
- **6.2 Multi-subscriber — RESOLVED** (validator block removed; fan-out implemented — see §2).
---
@@ -142,7 +144,7 @@ No placeholder/empty/`Assert.True(true)` tests were found anywhere.
## 8. Deferred test-coverage follow-ups (noted in resolutions, never filed as findings)
- **Java CLI bulk-subcommand coverage** — 6 of 13 non-trivial subcommands untested: `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk` (plus `stream-events`, the four `galaxy-*`, `close-session`). `code-reviews/Client.Java/findings.md:495` (Client.Java-026).
- **Per-session-ACL TODO** at `Server/Dashboard/Hubs/EventsHub.cs` (`code-reviews/Server/findings.md:765`).
- **Per-session-ACL TODO** at `Server/Dashboard/Hubs/EventsHub.cs` (`code-reviews/Server/findings.md:765`) — now scheduled as session-resilience epic Phase 4 (Tasks 16–19); not yet started.
- **Worker-Ready retry race** noted at `code-reviews/Server/findings.md:611`.
- **Duplicated `FakeWorkerProcess` harness** flagged as a latent regression vector — `code-reviews/Tests/findings.md:463`.
@@ -157,6 +159,6 @@ No placeholder/empty/`Assert.True(true)` tests were found anywhere.
- **§1.4 / §3.4 / §3.5** — the AVEVA 8-arg `AlarmAckByName` is a vendor stub (55) and `AlarmAckByGUID` is `E_NOTIMPL`; the `domain`/`full_name` fields stay forward-compat-only until AVEVA implements them.
- **§3.2** — buffered commands work and the empty bootstrap converts cleanly live, but a multi-sample buffered batch is undrivable on the rig (unit-tested only).
- **§3.1 / §3.3 / §3.6 / §3.7** — await live MXAccess captures.
- **§2** — deliberate v1 scope. **§5** — opt-in verification gates. **§7.6** — accepted `Won't Fix` review findings.
- **§2** — mostly deliberate v1 scope, but the session-resilience epic (12/28 tasks merged to `main`) has since **resolved multi-subscriber fan-out** and landed the **reconnect server-side core**; reconnect Tasks 13–15, per-session ACL (Phase 4), and orphan-worker reattach (Phase 5) remain (see `docs/plans/2026-06-15-session-resilience.md`, `oldtasks.md`). **§5** — opt-in verification gates. **§7.6** — accepted `Won't Fix` review findings.
MXAccess **event/data/value/write** mapping, the **Galaxy** RPC surface, and now the **full command surface** are complete; no `NotImplementedException`s, stubbed RPC bodies, or empty tests remain in the production paths.