merge: T87 phase 3.5 docs sweep — prune shipped backlog, capture phase 3.6 residuals

2026-04-26 22:46:26 -04:00
parent 097073ede5 3be8ed8915
commit 74bb42397d
1 changed files with 34 additions and 44 deletions
@@ -204,15 +204,7 @@ Phase 2.5 cleanup shipped end-to-end across 8 tasks (T68–T75). Two CLAUDE.md b

 ### Phase 2.6 / 3 backlog

-New follow-ups discovered during Phase 2.5 execution. None are blocking; pick up at any time.
-
- **Frontend handler for `turn_html_replace` SSE event (from T73.1 review)**: regenerate's backend broadcast lands, but no live tab swaps the regenerated turn until a JS handler is wired. The existing `turn_html` event uses HTMX `sse-swap` to append; `turn_html_replace` ships JSON with `supersedes_id` for replacement semantics. Phase 2.6 should wire the JS to swap the prior turn's DOM node in place.
- **Cancel/stop hook for in-flight regenerate streams (from T73 review)**: `post_turn` registers stream tasks in `_in_flight_tasks` so the user can stop them. Regenerate doesn't. A user clicking "Stop" mid-regenerate has no cancel hook today.
- **DRY: regenerate vs post_turn (from T73 review)**: recent-dialogue assembly and prior-edges block are duplicated between `chat/services/regenerate.py` and `chat/web/turns.py`. Extract to shared helpers analogous to `_gather_state_update_inputs`.
- **Sibling-discovery query optimization (from T73 review)**: `regenerate.py`'s sibling-assistant-turn lookup scans all non-superseded `assistant_turn` rows globally. Adding a `chat_id` predicate via JSON extraction (or a denormalized column) bounds the cost to per-chat scale.
- **`_witness_role_for` defensive coding (from T71 review)**: helper returns `"guest"` when `host_bot_id is None`, which is wrong for Phase-1 chats. Defensive: `return "host" if host_bot_id is None or speaker_bot_id == host_bot_id else "guest"`. Not exercised by current tests; harden as a precaution.
- **Confidence type tightening (from T74 review)**: `chat/services/addressee.py::AddresseeDecision.confidence` could be typed as `Literal["high","medium","low"]` for stricter validation. Currently `str` with a comment.
- **Scene-close-on-cancel UX revisit**: T74.3 pinned the existing behavior (close fires even on cancel). If real play-testing surfaces a regression, revisit.
+All items shipped — see Phase 3.5 status below.

 ## Phase 3 status

@@ -249,51 +241,49 @@ Phase 3 shipped end-to-end across 19 tasks (T49–T67). Events with full lifecyc

 ### Phase 3.5 / 4 backlog

-New follow-ups discovered during Phase 3 reviews and execution. None are blocking; pick up at any time.
+All items shipped — see Phase 3.5 status below.

-#### From T53 review
+## Phase 3.5 status

- **`narrate_skip` `timeout_s` not piped through to `client.generate`**: parameter accepted but ignored. Fix: pass `timeout_s=timeout_s` to `client.generate(**...)`, or drop the parameter entirely if Featherless's client doesn't honor it.
+Phase 3.5 cleanup shipped end-to-end across 12 tasks (T76–T87). Two CLAUDE.md backlogs (Phase 2.6/3, Phase 3.5/4) are now empty; deferred follow-ups discovered during execution are tracked in a new "Phase 3.6 / 4 backlog" section below. Test count grew from 315 (Phase 3) to 343 (+28 new tests).

-#### From T57 review
+- **Wave 1 — trivial polish (parallel)**:
+  - **T76** `narrate_skip` `timeout_s` plumbed through to `client.generate`.
+  - **T77** `AddresseeDecision.confidence` typed as `Literal["high","medium","low"]`.
+  - **T78** `search_memories` docstring notes SQL-side significance bias (`SIGNIFICANCE_RANK_BIAS`).
+  - **T79** `_witness_role_for` defensive `host_bot_id is None` handling (returns `"host"` for Phase-1 chats).
+- **Wave 2 — scene_summarize polish (single)**:
+  - **T80** five T58 follow-ups: re-close suffix bloat guard, transcript scoping by scene, swallowed-exception logging in `detect_threads`, chat-clock `closed_at`, and three new tests covering T58 gaps (200-char truncation, `thread_updated`/`thread_closed` candidate paths, try/except fallback).
+- **Wave 3 — typed exception (single)**:
+  - **T81** `ChatNotFoundError` replaces string-prefix sniff in skip routes; mapped to 404 (vs 400 for other `ValueError` cases).
+- **Wave 4 — turn-flow wiring (single)**:
+  - **T82** `consume_pending_meanwhile_digests` wired into `post_turn` (closes T66 gap; meanwhile digests no longer pile up); natural-language skip dispatch now runs scene close detection first.
+- **Wave 5 — regenerate polish (single)**:
+  - **T83** five sub-fixes — cancel/stop hook (regenerate registers stream task in `_in_flight_tasks`); DRY extraction of `read_recent_dialogue` and `gather_prior_edges` into `chat/services/turn_common.py`; chat-scoped sibling-assistant-turn lookup; lifecycle-rollback warning log on regenerate; ordering-symmetry comment between post_turn and regenerate event-detection paths.
+- **Wave 6 — final polish (parallel)**:
+  - **T84** unified `record_turn_memory` API with `you_present` kwarg; `record_meanwhile_memory` becomes a thin wrapper.
+  - **T85** JSON-build audit (no findings) + meanwhile cancel route-level test.
+  - **T86** frontend `turn_html_replace` SSE handler + turn_id stamping on rendered HTML so the in-place swap actually works.

- **`search_memories` docstring should mention SQL-side significance bias**: the function docstring still describes only the Python composite re-rank; add a one-line note about `SIGNIFICANCE_RANK_BIAS`.
+### Phase 3.6 / 4 backlog

-#### From T58 review
+New follow-ups discovered during Phase 3.5 reviews and execution. None are blocking; pick up at any time.

- **Scene close re-close suffix bloat risk**: `_build_key_quotes_suffix` reads from `memories.pov_summary`. If a scene close runs twice, the second pass would read the rewritten text plus the previous "Key quotes:" suffix and append a second one. Either guard for double-suffix or source quotes from `event_log` `assistant_turn`/`user_turn` text instead.
- **Thread detection transcript scoping**: `_read_recent_dialogue` returns chat-wide history with no `scene_id` filter (Phase 1 turns lack one). Feeding chat-wide history to `detect_threads` will misattribute threads to the closing scene when the scene boundary falls inside the last 50 turns. Scope by `scene_id` once turns carry it, or by `started_at` against scene-open timestamp.
- **Swallowed exceptions in `detect_threads` try/except**: bare `Exception` swallows programmer errors silently. Log at debug level so silent regressions are recoverable.
- **Scene close `closed_at` clock divergence**: T58 uses `datetime.now(timezone.utc).isoformat()` instead of chat-clock time. Diverges from chat-clock semantics elsewhere; revisit if event reconstructions need chat-clock ordering.
- **Test coverage gaps in T58**: no test for 200-char quote truncation; no test for `thread_updated`/`thread_closed` candidate paths; no test for the `try/except` fallback.
+#### From T80 review

-#### From T61 review
+- **`read_recent_dialogue` chat-id pushdown**: helper filters `chat_id` post-fetch in Python. Could push the `json_extract(payload_json, '$.chat_id') = ?` predicate into SQL (matching T83.3's pattern) for tighter LIMIT semantics. Currently a chat-with-many-other-chats can have its 50-row LIMIT consumed by foreign rows.
+- **Lifecycle warning wording in regenerate**: T83.4's warning log lists ALL lifecycle event ids that exist after the original `assistant_turn` id, not just ones produced by the superseded turn. For the typical "regenerate the most recent" flow these are identical, but if a user regenerates an OLDER turn, the warning will list intervening-turn lifecycle events that legitimately stand. Tighten warning wording to "lifecycle transitions at-or-after turn X" (operator-friendly); a code-level fix would require a schema change to add explicit back-reference from lifecycle events to their producing turn.

- **Regenerate doesn't roll back lifecycle transitions from superseded turn**: `event_started`/`event_completed` rows from a superseded turn remain. Phase 3.5 should add a lifecycle-undo step. Caveat: regenerate-after-completion may double-emit promotion artifacts if the new text re-completes the same event.
- **Asymmetry in event-detection ordering**: post_turn runs lifecycle BETWEEN interjection and scene-close; regenerate runs lifecycle at the END. Benign because regenerate has no scene-close path, but worth tidying.
+#### From T84 review

-#### From T62 review
+- **`record_turn_memory` legacy single-bot function** still exists alongside the unified `record_turn_memory_for_present`. Could be consolidated in a follow-up.

- **Error-message prefix sniff for 404 vs 400 routing**: drawer skip routes use `str(exc).startswith("chat not found")` to distinguish 404 from 400. Fragile if error wording changes. Use a typed exception subclass.
- **Skip command bypasses scene close detection**: a user typing "fade out, skip an hour" would skip without closing the scene. Acceptable for Phase 3 but worth noting.
+#### From T86 fix-up

-#### From T63 review
+- **Test fixtures + `tests/test_phase3_integration.py`** that seed turns directly via `append_event`+`project` may need updating once any new test asserts the rendered HTML carries the new turn ids end-to-end. Existing tests pass because they don't read the stamped attribute, but they're brittle if the contract evolves.

- **`participants_json` JSON injection** (FIXED in T63 but worth noting in backlog as a "double-check other JSON-string-build sites" task): T63 originally used f-string interpolation; fixed to use `json.dumps`. Audit other state modules for similar patterns.
+#### Deferred items (carry-over)

-#### From T64 review
-
- **`record_meanwhile_memory` and `record_turn_memory_for_present` share private `_write_one_memory` helper**: minor DRY note; both helpers are similar enough that a unified API with a `you_present: bool` kwarg might be cleaner long-term.
- **Stop button cancellation for meanwhile turns**: T64 fix-up registered tasks in `_in_flight_tasks`; verify the `/turns/cancel` endpoint actually cancels meanwhile streams (the test pins registration but not the cancel-from-route path).
-
-#### From cross-feature interactions discovered in Wave 6b merge
-
- **Cross-feature canned-queue brittleness**: meanwhile-scene close test required a canned response for T65's digest call after T64+T65 merge. Future close-path additions will keep extending the queue; consider a structured fixture builder rather than positional canned arrays.
-
-#### From T66 integration tests
-
- **`consume_pending_meanwhile_digests` is defined but NOT wired into `post_turn`**: the helper lives in `chat/services/prompt.py` (T65) but `chat/web/turns.py` never calls it. Meanwhile digests stay pending forever in production. Phase 3.5 should call the helper after the first you-turn following a meanwhile close — probably right after the assistant_turn lands but before the next prompt assembly. Pinned by `tests/test_phase3_integration.py::test_meanwhile_close_digest_surfaces_then_consumed` which currently calls the helper directly.
-
-#### Discovered during Phase 3 execution
-
- **`_witness_role_for` defensive `host_bot_id is None`** (carry-over from Phase 2.5 T71 backlog) — still pending.
+- **Scene-close-on-cancel UX revisit** (Phase 2.5 carry-over): T74.3 pinned the existing behavior; revisit if real play-testing surfaces a regression.
+- **Cross-feature canned-queue brittleness**: meanwhile-scene close test required a canned response for T65's digest call after T64+T65 merge. Future close-path additions will keep extending the queue. Consider a structured fixture builder rather than positional canned arrays. NOT addressed in Phase 3.5.
+- **Lifecycle-transition rollback in regenerate**: T83.4 added a warning log; actual rollback (with proper schema linkage from lifecycle event back to producing turn) is Phase 4 work.