Phase 3.5 cleanup: 17-item backlog burndown #5

2026-04-26T22:47:09-04:00

dohertj2 commented

2026-04-26 22:47:09 -04:00

Summary

Phase 3.5 burns down the combined Phase 2.6/3 + Phase 3.5/4 backlog tracked in CLAUDE.md (17 items total). 12 tasks across 7 waves; both old backlog sections are now empty. New residuals discovered during execution are tracked in a fresh Phase 3.6 / 4 backlog section.

What shipped (12 tasks, 7 waves)

Wave 1 (parallel 4-way) — trivial polish:
- T76 narrate_skip timeout_s plumbed through to client.generate
- T77 AddresseeDecision.confidence as Literal type
- T78 search_memories docstring SQL-bias note
- T79 _witness_role_for defensive None handling
Wave 2 (single) — T80 scene_summarize.py polish bundle (5 sub-fixes for T58 follow-ups: re-close suffix bloat guard, transcript scoping by scene, swallowed-exception logging, chat-clock closed_at, test coverage gaps)
Wave 3 (single) — T81 ChatNotFoundError typed exception (replaces fragile string-prefix sniff in skip routes)
Wave 4 (single) — T82 turns.py wiring:
- T82.1 fixes a real bug: consume_pending_meanwhile_digests is now wired into post_turn (T66 surfaced this — meanwhile digests previously stayed pending forever)
- T82.2: natural-language skip dispatch now runs scene close detection first
Wave 5 (single) — T83 regenerate.py polish bundle: cancel/stop hook + DRY extraction (`chat/services/turn_common.py`) + chat-scoped sibling lookup + lifecycle-rollback warning log + ordering symmetry
Wave 6 (parallel 3-way):
- T84 unified record_turn_memory API with you_present kwarg
- T85 JSON-build audit (no findings — all writes already use json.dumps) + meanwhile cancel route-level test
- T86 closes Phase 1.5/2.5 carryover: frontend turn_html_replace SSE handler + turn_id stamping on rendered HTML so in-place swap actually works (regenerate live-tab swap was half-done since Phase 2.5)
Wave 7 (single) — T87 docs sweep

Architecture notes

No schema migrations. Schema baseline stays at version 11.
All changes file-disjoint within waves; merges were conflict-free.
Backward compatible: 315 of 343 tests are unchanged from Phase 3 baseline; 28 new tests pin the cleanup behaviors.

Test plan

Full pytest suite: 343 passing (315 Phase 3 + 28 Phase 3.5). 0 failures.
Each task verified in its own worktree before merge to phase-3.5.
Each task reviewed for spec compliance + code quality.
Manual smoke (recommend before merging this PR):
- Multi-tab regenerate: open two tabs on same chat; click Regenerate; verify the OTHER tab swaps the regenerated turn IN PLACE (not appended)
- Type "fade out, skip an hour" — verify both scene_closed + time_skip_elision events fire in correct order
- Trigger a meanwhile scene; close it; resume the main chat; verify the digest renders ONCE on the next you-turn AND is consumed automatically
- Trigger a regenerate after an event_completed turn — observe the warning log mentioning un-rolled-back lifecycle transitions

Phase 3.6 / 4 backlog (discovered during T76-T86)

Captured in CLAUDE.md:

read_recent_dialogue chat-id pushdown into SQL (T80 review nit)
Lifecycle warning wording in regenerate (T83.4 — operator-friendly tightening)
Legacy single-bot record_turn_memory consolidation (T84 review nit)
Test fixture event_id assertions (T86 fix-up)

Deferred:

Scene-close-on-cancel UX revisit (Phase 2.5 carry-over — pinned, no action)
Cross-feature canned-queue brittleness (structured fixture builder)
Full lifecycle rollback in regenerate (Phase 4 work; warning log shipped in T83.4)

Plan

`docs/plans/2026-04-26-v3.5-phase3.5-cleanup.md` (committed in fb7e972).

## Summary Phase 3.5 burns down the combined Phase 2.6/3 + Phase 3.5/4 backlog tracked in CLAUDE.md (17 items total). 12 tasks across 7 waves; both old backlog sections are now empty. New residuals discovered during execution are tracked in a fresh Phase 3.6 / 4 backlog section. ## What shipped (12 tasks, 7 waves) - **Wave 1 (parallel 4-way)** — trivial polish: - T76 narrate_skip timeout_s plumbed through to client.generate - T77 AddresseeDecision.confidence as Literal type - T78 search_memories docstring SQL-bias note - T79 _witness_role_for defensive None handling - **Wave 2 (single)** — T80 scene_summarize.py polish bundle (5 sub-fixes for T58 follow-ups: re-close suffix bloat guard, transcript scoping by scene, swallowed-exception logging, chat-clock closed_at, test coverage gaps) - **Wave 3 (single)** — T81 ChatNotFoundError typed exception (replaces fragile string-prefix sniff in skip routes) - **Wave 4 (single)** — T82 turns.py wiring: - **T82.1 fixes a real bug**: `consume_pending_meanwhile_digests` is now wired into post_turn (T66 surfaced this — meanwhile digests previously stayed pending forever) - T82.2: natural-language skip dispatch now runs scene close detection first - **Wave 5 (single)** — T83 regenerate.py polish bundle: cancel/stop hook + DRY extraction (\`chat/services/turn_common.py\`) + chat-scoped sibling lookup + lifecycle-rollback warning log + ordering symmetry - **Wave 6 (parallel 3-way)**: - T84 unified record_turn_memory API with you_present kwarg - T85 JSON-build audit (no findings — all writes already use json.dumps) + meanwhile cancel route-level test - **T86 closes Phase 1.5/2.5 carryover**: frontend turn_html_replace SSE handler + turn_id stamping on rendered HTML so in-place swap actually works (regenerate live-tab swap was half-done since Phase 2.5) - **Wave 7 (single)** — T87 docs sweep ## Architecture notes - No schema migrations. Schema baseline stays at version 11. - All changes file-disjoint within waves; merges were conflict-free. - Backward compatible: 315 of 343 tests are unchanged from Phase 3 baseline; 28 new tests pin the cleanup behaviors. ## Test plan - [x] Full pytest suite: 343 passing (315 Phase 3 + 28 Phase 3.5). 0 failures. - [x] Each task verified in its own worktree before merge to phase-3.5. - [x] Each task reviewed for spec compliance + code quality. - [ ] Manual smoke (recommend before merging this PR): - [ ] Multi-tab regenerate: open two tabs on same chat; click Regenerate; verify the OTHER tab swaps the regenerated turn IN PLACE (not appended) - [ ] Type "fade out, skip an hour" — verify both scene_closed + time_skip_elision events fire in correct order - [ ] Trigger a meanwhile scene; close it; resume the main chat; verify the digest renders ONCE on the next you-turn AND is consumed automatically - [ ] Trigger a regenerate after an event_completed turn — observe the warning log mentioning un-rolled-back lifecycle transitions ## Phase 3.6 / 4 backlog (discovered during T76-T86) Captured in CLAUDE.md: - read_recent_dialogue chat-id pushdown into SQL (T80 review nit) - Lifecycle warning wording in regenerate (T83.4 — operator-friendly tightening) - Legacy single-bot record_turn_memory consolidation (T84 review nit) - Test fixture event_id assertions (T86 fix-up) Deferred: - Scene-close-on-cancel UX revisit (Phase 2.5 carry-over — pinned, no action) - Cross-feature canned-queue brittleness (structured fixture builder) - Full lifecycle rollback in regenerate (Phase 4 work; warning log shipped in T83.4) ## Plan \`docs/plans/2026-04-26-v3.5-phase3.5-cleanup.md\` (committed in fb7e972).

dohertj2 added 33 commits 2026-04-26 22:47:10 -04:00

fix: plumb narrate_skip timeout_s through to client.generate (T76) d759b90aa1

docs: search_memories docstring mentions SQL-side significance bias (T78) e3dfe18811

fix: AddresseeDecision.confidence as Literal[high|medium|low] (T77) 4199038b8b

fix: _witness_role_for defensive None handling (T79) 9c9d71eb31

merge: T76 narrate_skip timeout_s plumbed through bdc93b4b67

merge: T77 AddresseeDecision.confidence as Literal ce4f56adfa

merge: T78 search_memories docstring SQL-bias note cb570a5adc

merge: T79 _witness_role_for defensive None handling 29e6f346ef

fix: guard scene close key-quote suffix against re-close bloat (T80.1) d123684f9a

Re-running apply_scene_close_summary on the same scene previously caused
recursive bloat: _build_key_quotes_suffix sourced quote text from
memories.pov_summary, which after the first close already carried a
"Key quotes:" suffix. The next close would then quote the quotes,
nesting deeper each time.

Strip any existing suffix from candidate text before truncating to
200 chars in the suffix builder, and from the fresh classifier output
before composing the new value in _summarize_and_apply_for_witness so
the rewrite replaces rather than stacks.

Adds test_scene_close_re_run_does_not_double_suffix.

fix: scope thread detection transcript to closing scene (T80.2) dae481eb92

apply_scene_close_summary fed detect_threads the chat-wide last-50
turns. When a chat has accumulated multiple scenes' worth of dialogue,
that bleeds prior-scene turns into the second close's classifier prompt
and risks mis-attributing threads (closing one that opened earlier,
re-opening one that already closed).

Add an optional ``since_event_id`` kwarg to ``_read_recent_dialogue``
that lower-bounds by event_log id, plus a ``_scene_opened_event_id``
helper that resolves the scene-open event for a given scene_id. Wire
both into the thread-detection call site so its scene_transcript
holds only the closing scene's turns. The per-POV summarizer keeps the
chat-wide approximation it had before — that's intentional.

Adds test_thread_detection_uses_scene_scoped_transcript.

fix: log swallowed exceptions in detect_threads try/except (T80.3) 9d06eaf57a

The broad ``except Exception`` around detect_threads silently dropped
programmer errors (wrong kwargs, import-time failures, etc), making
diagnostics painful. Log at DEBUG with full exc_info so the failure
surfaces in local logs without breaking the close pipeline's
failure-tolerant contract.

Adds test_detect_threads_failure_is_logged using caplog.

fix: thread_closed uses chat-clock time, not wall clock (T80.4) b91a5e9293

T58 stamped emitted ``thread_closed`` events with
``datetime.now(timezone.utc).isoformat()``. The rest of the close
pipeline (memories.chat_clock_at, scene_closed.ended_at, edge writes)
uses the chat's in-world clock. Threads must agree so timeline
reconstruction stays consistent under time skips and replay.

Read ``chat["time"]`` (already loaded for the per-POV path) and pass
it through as ``closed_at``. Falls back to UTC now only when chat_state
has no clock yet — defensive; chat_created always seeds it.

Adds test_thread_closed_uses_chat_clock_time.

test: T58 coverage gaps (truncation, update/close paths) (T80.5) 0d3bbf4272

Three gaps left by T58's initial test coverage:

* test_key_quote_truncation_at_200_chars — exercises the 200-char hard
  slice in _build_key_quotes_suffix so any future change to the
  truncation strategy (ellipsis, word boundary, etc) trips the test.
* test_thread_detection_update_candidate_emits_thread_updated —
  pins the ``update`` action emission shape (thread_id, summary,
  last_referenced_scene_id).
* test_thread_detection_close_candidate_emits_thread_closed — pins
  the ``close`` action emission shape (thread_id, closed_at).

No production change; pure coverage add.

merge: T80 scene_summarize.py polish bundle (T58 follow-ups) 6f0716201f

fix: typed ChatNotFoundError replaces string-prefix sniff in skip routes (T81) f816d44438

merge: T81 ChatNotFoundError typed exception for skip routes 6f50ce5b7a

fix: post_turn consumes pending meanwhile digests (T82.1) be92691f9a

Wire chat.services.prompt.consume_pending_meanwhile_digests into
chat.web.turns.post_turn at the END of the handler, after scene-close
detection and before the response broadcast. Without this call digests
created by a meanwhile close stay pending forever — they surface in the
next you-turn's prompt (via T65) but are never marked consumed, so they
re-render on every subsequent turn.

Idempotent: re-calling the helper produces zero events when nothing's
pending. The T66 cross-feature note is updated to reflect the new
wiring; the existing direct-helper test in test_phase3_integration.py
is preserved as defensive coverage of the helper contract in isolation.

fix: natural-language skip runs scene close detection (T82.2) 71245fb85a

The natural-language skip dispatch in chat.web.turns.post_turn
(intent="skip_elision") previously bypassed scene close detection
entirely. User prose like "fade out, skip an hour" carries both a
close signal and a skip directive — the close summary must capture
the closing scene's final beat (and promote per-POV memories) before
the time advances.

Insert detect_scene_close + apply_scene_close_summary BEFORE the skip
controller invocation in the skip_elision branch. Order: scene close
-> skip narration -> time advance. When there's no active scene or
the prose carries no close signal, detect_scene_close returns the
safe should_close=False default and the flow drops straight to the
skip controller — same behavior as today.

merge: T82 turns.py wiring (consume meanwhile digests + skip runs scene close) 9e7c16de40

feat: regenerate registers stream task in _in_flight_tasks (T83.1) f2fd30c5a9

Both the primary and the interjection sub-stream in
``regenerate_assistant_turn`` are now wrapped in ``asyncio.create_task``
and registered in the chat-keyed ``_in_flight_tasks`` registry that the
``/turns/cancel`` route reads. Without this, hitting Stop during a
mid-regenerate stream was a no-op.

Mirrors the meanwhile registration pattern in chat/web/meanwhile.py
(snapshot-tested by tests/test_meanwhile_turn_flow.py).

Test added: test_regenerate_registers_task_in_in_flight_tasks captures
``"chat_bot_a" in _in_flight_tasks`` at the first stream yield via a
custom MockLLMClient subclass and asserts post-flight cleanup.

refactor: extract turn_common helpers from regenerate + turns (T83.2) d833bbc3e7

The recent-dialogue read and the directed-pair edge gather were
duplicated between ``chat.services.regenerate`` and ``chat.web.turns``.
Extracted into ``chat.services.turn_common`` with two helpers:

- ``read_recent_dialogue(conn, chat_id, *, limit, exclude_event_id)``:
  oldest-first ``[{speaker, text}]`` over user_turn / user_turn_edit /
  assistant_turn rows, with the standard ``superseded_by IS NULL AND
  hidden = 0`` filter. ``exclude_event_id`` covers regenerate's need to
  drop the original assistant_turn before its supersede UPDATE lands.
- ``gather_prior_edges(conn, present_ids)``: ``{(src, tgt): edge}`` over
  every directed pair across ``present_ids``, with the schema default
  50/50 baseline for missing rows.

``chat.web.turns._read_recent_dialogue`` becomes a thin delegate so the
chat-detail template and other in-module callers keep their import
shape; ``_gather_state_update_inputs`` now calls into the shared edge
gather. ``regenerate_assistant_turn`` calls both helpers in three call
sites (primary + post-interjection edges, primary + interjection
recent reads), still post-processing speaker ids to display names for
its prompts.

Decision: ``chat.services.scene_summarize._read_recent_dialogue`` is
left in place — it has a ``since_event_id`` clamp (T80.2) and excludes
``user_turn_edit`` deliberately. Folding it into the shared helper
would either silently change its read shape or require a second flag,
both more invasive than the duplication. Documented in the new module
docstring.

Tests: tests/test_turn_common.py covers chronological ordering,
supersede / other-chat / exclude_event_id filtering, and prior-edge
default-fallback. Existing 6 regenerate + 18 turn_flow tests pass
unchanged.

perf: scope regenerate sibling-lookup to chat_id (T83.3) a1e2d9a24d

The sibling assistant_turn lookup in ``regenerate_assistant_turn``
previously scanned every non-superseded ``assistant_turn`` row across
the whole database and filtered in Python. With many chats in the log
this is O(total_assistant_turns) per regenerate.

Push the chat_id filter into SQL via ``json_extract(payload_json,
'$.chat_id') = ?`` and add ``ORDER BY id DESC LIMIT 50`` so worst-case
work is bounded even within a single chat. Mirrors the SQL pattern in
``chat.web.meanwhile._last_meanwhile_speaker``.

Test added: test_regenerate_sibling_lookup_scoped_to_chat seeds two
chats — the second has an interjection whose ``interjection_of`` value
collides with the first chat's primary speaker. Regenerating chat A
must leave chat B's rows untouched and the regenerated chat A
interjection's ``regenerated_from`` must point at chat A's original
(not chat B's). Pre-T83.3 a global query could in principle latch
onto cross-chat rows.

chore: document regenerate lifecycle-rollback limitation with warning log (T83.4) b667a21c99

When a regenerate replaces an assistant_turn that already produced
lifecycle transitions (``event_started`` / ``event_completed`` /
``event_cancelled``), those transitions are NOT rolled back before
``detect_event_transitions`` re-runs against the new text. A
regenerate-after-completion can therefore double-emit promotion
artifacts.

Phase 3.5 first cut (per the task plan): documentation + WARNING log
naming the affected event_log ids. The actual undo pass is invasive
(re-projection / inverse-handler dispatch) and is deferred to Phase 4.

Implementation:
- TODO docstring block at the top of ``regenerate_assistant_turn``.
- Module-level ``_log = logging.getLogger(__name__)``.
- Scan immediately after the original assistant_turn row is located:
  joins event_log lifecycle rows to the events table on event_id so we
  can scope by chat (lifecycle payloads carry only ``event_id``, not
  ``chat_id``). Filter ``id > original_assistant_event_id`` as the
  positional linkage to "transitions emitted as part of (or after)
  this turn's processing."

Decision (asked in the brief): the scan uses the ``id > original``
heuristic rather than scanning for explicit references. Lifecycle
event payloads do not carry a back-pointer to the assistant_turn that
triggered them — the linkage is positional in the event log. A tighter
linkage would require either adding a payload field on lifecycle
events (cross-cutting schema change) or threading the just-appended
assistant_turn id into ``detect_event_transitions``'s emit calls
(narrow but still cross-cutting). The positional heuristic is loose
but conservative: a turn that emits no lifecycle events produces no
warning, and the warning's purpose is operator-visible breadcrumbs
not an exact rollback set.

Test: test_regenerate_with_prior_lifecycle_logs_warning seeds a turn
that produced ``event_started`` + ``event_completed`` rows and asserts
the WARNING fires with the expected ids.

refactor: regenerate event-detection ordering mirrors post_turn (T83.5) 0de4d1252c

Cosmetic-only renumbering of the event-lifecycle detection block in
``regenerate_assistant_turn`` from ``# 10.`` to ``# 9a.`` — mirrors the
``# 8a.`` shape in ``chat.web.turns.post_turn``. The block was already
in the correct structural position (immediately after the interjection
branch); only the numbering and comment reflected an earlier draft
where it read as a final step rather than the post-interjection /
pre-(absent)-scene-close slot.

No behavioural change. All 9 regenerate tests + 18 turn_flow tests
pass without modification.

merge: T83 regenerate.py polish bundle (cancel + DRY + scoped query + warning + ordering) 82701d3c18

refactor: unified record_turn_memory API with you_present kwarg (T84) da7aa88b8e

Extends record_turn_memory_for_present with a you_present: bool = True
kwarg so a single entry-point covers both you-scenes (witness_you=1)
and meanwhile scenes (witness_you=0). Validates that meanwhile callers
provide a guest_bot_id.

record_meanwhile_memory becomes a thin backward-compat wrapper that
delegates with you_present=False, preserving the call site in
chat/web/meanwhile.py without churn.

test: meanwhile cancel route + JSON-build audit (T85) 9493d24a53

T85.1 — JSON-build audit (chat/state, chat/services, chat/eventlog):
no findings. Every JSON column write in those modules already uses
``json.dumps`` (chat/state/events.py, world.py, edges.py, group_node.py,
meanwhile.py, manual_edit.py, entities.py, chat/services/snapshot.py,
chat/eventlog/log.py); chat/state/meanwhile.py:48-49 even carries an
explicit comment about the ``json.dumps`` choice for safety against
quote/backslash injection. No production changes.

T85.2 — meanwhile cancel route-level coverage:

* ``test_meanwhile_turn_cancellation_via_route`` — pins the
  end-to-end shape produced when /turns/cancel fires mid-meanwhile-
  beat: assistant_turn lands with truncated=True (and the right
  meanwhile_scene_id + speaker_id), no memory_written events fire, no
  post-turn edge_update events fire, and _in_flight_tasks is empty
  post-flight. Drives the cancel by hijacking client.stream to raise
  CancelledError on first iteration — same pattern proven by
  test_cancelled_turn_still_closes_scene_when_user_prose_signals_close
  in tests/test_turn_flow.py. The synchronous TestClient can't issue
  a second POST mid-stream from the same thread, and driving via
  task.cancel() trips GeneratorExit-on-dependency that prevents the
  conn from committing the partial; the inline-raise mirrors what
  cancel_turn produces (CancelledError delivered on next await) and
  is the standard idiom in this codebase. Combined with the existing
  test_meanwhile_turn_registered_in_in_flight_tasks (registration
  pin), the full Stop-button lifecycle for meanwhile beats is now
  covered.
* ``test_meanwhile_cancel_route_no_op_after_turn_completes`` — runs
  a meanwhile turn to completion, then POSTs /turns/cancel; asserts
  204 no-op, no error, registry stays empty. Pins the cancel
  endpoint's robustness against the racy "Stop just after stream
  finished" sequence.

Suite: 334 -> 336 passing.

feat: frontend turn_html_replace SSE handler for regenerate live-swap (T86) aea20a2c83

merge: T84 unified record_turn_memory API with you_present kwarg 73625c0ac4

merge: T85 JSON-build audit + meanwhile cancel route-level test 4a2617565b

merge: T86 frontend turn_html_replace SSE handler + event_id stamping 097073ede5

docs: phase 3.5 status, prune shipped backlog items, capture phase 3.6 follow-ups (T87) 3be8ed8915

merge: T87 phase 3.5 docs sweep — prune shipped backlog, capture phase 3.6 residuals 74bb42397d

dohertj2 merged commit 1b66a2821c into main

2026-04-27 01:56:29 -04:00

dohertj2 referenced this issue from a commit

2026-04-27 01:56:30 -04:00

Merge pull request 'Phase 3.5 cleanup: 17-item backlog burndown' (#5) from phase-3.5 into main

Sign in to join this conversation.