Phase 2.5 cleanup: 15-item backlog burndown #3

Merged
dohertj2 merged 25 commits from phase-2.5 into main 2026-04-26 20:00:40 -04:00
Owner

Summary

Phase 2.5 burns down the 15-item backlog (5 from Phase 1.5 + 10 from Phase 2.5/3) tracked in CLAUDE.md. 8 tasks across 5 waves; both old backlog sections are now empty. New follow-ups discovered during execution are tracked in a fresh "Phase 2.6 / 3 backlog" section.

Note: this PR currently includes Phase 2 (PR #2) since #2 hasn't merged. Once #2 merges, this PR auto-narrows to just Phase 2.5's 27 commits / ~3.4k lines.

What shipped (8 tasks, 5 waves)

  • Wave 1 (parallel):
    • T68 — open_db refactor with check_same_thread parameter (eliminates duplicated PRAGMA setup in chat/web/bots.py)
    • T69 — bot_reset purges orphaned "you" activity rows (also fixes a latent FK constraint crash)
    • T70 — LLM-merged group meta-summary (replaces Phase 2's naive concat; falls back to concat on classifier failure)
  • Wave 2 (single): T71 prompt.py polish (3 sub-commits)
    • T71.1 witness role parametric (host vs guest derived from chat membership; fixes guest-as-speaker memory retrieval)
    • T71.2 single ACTIVITIES: block with bullet-level trim
    • T71.3 NICE trim order kept with documented rationale
  • Wave 3 (single): T72 drawer polish (3 sub-commits)
    • T72.1 deferred v1 edits: edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove (NEW edge_knowledge_fact projector branch)
    • T72.2 first-meeting gate on Add-guest form
    • T72.3 witness flag inline-edit (NEW memory_witness projector branch)
  • Wave 4 (parallel): T73 + T74 (each 3-4 sub-commits)
    • T73 — regenerate.py polish: turn_html_replace SSE event, interjection regenerate, stale-guest cleanup
    • T74 — turn-flow polish + new chat/services/addressee.py classifier service: classifier-based addressee detection, significance for interjection memories, scene-close-on-cancel pinned, stale-guest cleanup
  • Wave 5 (single): T75 docs sweep — empties old backlogs, adds Phase 2.5 status, captures Phase 2.6 follow-ups

Architecture notes

  • No schema migrations (baseline stays at version 8).
  • All changes are file-disjoint within waves; merges were conflict-free.
  • Backward compatible: 244 of 247 tests are unchanged from Phase 2 baseline; 35+ new tests added throughout.
  • Featherless 2-connection cap respected throughout.

Test plan

  • Full pytest suite: 247 passing (212 Phase 2 + 35 Phase 2.5). 0 failures, 0 skips.
  • Each task verified in its own worktree before merge to phase-2.5.
  • Each task reviewed for spec compliance + code quality + security (Jinja2 autoescape verified, no |safe on user content, all SQL parameterized).
  • Manual smoke (recommend before merging this PR):
    • Drawer: edit edge_trust on a chat; verify the new value sticks after refresh
    • Drawer: edit edge_summary; refresh; verify
    • Drawer: toggle a memory's witness flag; refresh; verify
    • Drawer: open Add-guest form for a (host, guest) pair that already shares an edge; verify the gate disables the prose textarea
    • Reset a bot; verify "you" activity rows for that bot's chats are gone
    • Trigger an interjection turn; verify a SignificanceJob was enqueued for the interjection memory
    • Use a bot named "Sam"; ask "did you see what Sam wrote?" — verify host gets the floor (classifier addressee, not substring)
    • Multi-tab regenerate (note: requires Phase 2.6 frontend handler for turn_html_replace to actually swap live)

Phase 2.6 / 3 backlog (discovered during execution, tracked in CLAUDE.md)

  • Frontend handler for turn_html_replace SSE event (closes Phase 1.5 backlog #2 end-to-end)
  • Cancel/stop hook for in-flight regenerate streams
  • DRY: regenerate vs post_turn (recent-dialogue + prior-edges duplication)
  • Sibling-discovery query optimization in regenerate.py
  • _witness_role_for defensive coding for host_bot_id is None
  • Literal["high","medium","low"] type tightening on AddresseeDecision.confidence
  • Scene-close-on-cancel UX revisit if play-testing surfaces a regression

Plan

docs/plans/2026-04-26-v2.5-phase2.5-cleanup.md (committed in e05f28e).

## Summary Phase 2.5 burns down the 15-item backlog (5 from Phase 1.5 + 10 from Phase 2.5/3) tracked in CLAUDE.md. 8 tasks across 5 waves; both old backlog sections are now empty. New follow-ups discovered during execution are tracked in a fresh "Phase 2.6 / 3 backlog" section. **Note:** this PR currently includes Phase 2 (PR #2) since #2 hasn't merged. Once #2 merges, this PR auto-narrows to just Phase 2.5's 27 commits / ~3.4k lines. ## What shipped (8 tasks, 5 waves) - **Wave 1 (parallel):** - T68 — `open_db` refactor with `check_same_thread` parameter (eliminates duplicated PRAGMA setup in `chat/web/bots.py`) - T69 — `bot_reset` purges orphaned "you" activity rows (also fixes a latent FK constraint crash) - T70 — LLM-merged group meta-summary (replaces Phase 2's naive concat; falls back to concat on classifier failure) - **Wave 2 (single):** T71 `prompt.py` polish (3 sub-commits) - T71.1 witness role parametric (`host` vs `guest` derived from chat membership; fixes guest-as-speaker memory retrieval) - T71.2 single `ACTIVITIES:` block with bullet-level trim - T71.3 NICE trim order kept with documented rationale - **Wave 3 (single):** T72 drawer polish (3 sub-commits) - T72.1 deferred v1 edits: edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove (NEW `edge_knowledge_fact` projector branch) - T72.2 first-meeting gate on Add-guest form - T72.3 witness flag inline-edit (NEW `memory_witness` projector branch) - **Wave 4 (parallel):** T73 + T74 (each 3-4 sub-commits) - T73 — regenerate.py polish: turn_html_replace SSE event, interjection regenerate, stale-guest cleanup - T74 — turn-flow polish + new `chat/services/addressee.py` classifier service: classifier-based addressee detection, significance for interjection memories, scene-close-on-cancel pinned, stale-guest cleanup - **Wave 5 (single):** T75 docs sweep — empties old backlogs, adds Phase 2.5 status, captures Phase 2.6 follow-ups ## Architecture notes - No schema migrations (baseline stays at version 8). - All changes are file-disjoint within waves; merges were conflict-free. - Backward compatible: 244 of 247 tests are unchanged from Phase 2 baseline; 35+ new tests added throughout. - Featherless 2-connection cap respected throughout. ## Test plan - [x] Full pytest suite: 247 passing (212 Phase 2 + 35 Phase 2.5). 0 failures, 0 skips. - [x] Each task verified in its own worktree before merge to phase-2.5. - [x] Each task reviewed for spec compliance + code quality + security (Jinja2 autoescape verified, no `|safe` on user content, all SQL parameterized). - [ ] Manual smoke (recommend before merging this PR): - [ ] Drawer: edit edge_trust on a chat; verify the new value sticks after refresh - [ ] Drawer: edit edge_summary; refresh; verify - [ ] Drawer: toggle a memory's witness flag; refresh; verify - [ ] Drawer: open Add-guest form for a (host, guest) pair that already shares an edge; verify the gate disables the prose textarea - [ ] Reset a bot; verify "you" activity rows for that bot's chats are gone - [ ] Trigger an interjection turn; verify a `SignificanceJob` was enqueued for the interjection memory - [ ] Use a bot named "Sam"; ask "did you see what Sam wrote?" — verify host gets the floor (classifier addressee, not substring) - [ ] Multi-tab regenerate (note: requires Phase 2.6 frontend handler for `turn_html_replace` to actually swap live) ## Phase 2.6 / 3 backlog (discovered during execution, tracked in CLAUDE.md) - Frontend handler for `turn_html_replace` SSE event (closes Phase 1.5 backlog #2 end-to-end) - Cancel/stop hook for in-flight regenerate streams - DRY: regenerate vs post_turn (recent-dialogue + prior-edges duplication) - Sibling-discovery query optimization in regenerate.py - `_witness_role_for` defensive coding for `host_bot_id is None` - `Literal["high","medium","low"]` type tightening on `AddresseeDecision.confidence` - Scene-close-on-cancel UX revisit if play-testing surfaces a regression ## Plan `docs/plans/2026-04-26-v2.5-phase2.5-cleanup.md` (committed in `e05f28e`).
dohertj2 added 99 commits 2026-04-26 17:47:51 -04:00
- .gitignore: add *.egg-info/ so editable installs don't show in git status.
- pyproject.toml: add [build-system] and [tool.setuptools.packages.find]
  scoped to chat*, fixing pip install -e . which was failing on data/
  auto-discovery.
- CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status,
  capturing the small follow-ups surfaced in implementer reviews
  (open_db refactor, regenerate SSE broadcast, you-activity purge,
  drawer edits for deferred fields, NICE trim order).
Idempotent seeder for three sample bots (Maya — coworker slow-burn,
Eli — live-in partner, Sam — bartender / new connection). Each is a
distinct relational archetype to exercise the system from different
angles. Run from repo root:

    .venv/bin/python scripts/seed_sample_bots.py

Re-running skips ids that already exist. After seeding, walk each bot
through kickoff parse-and-confirm at /bots/<id>/kickoff.
The kickoff parse-and-confirm route was 500-ing intermittently because
Hermes-3 + Featherless's response_format={"type":"json_object"} only
guarantees JSON output, NOT a particular schema. The model was inventing
its own field names (sceneTime, entities, settingDetails) instead of
the KickoffParse fields, causing Pydantic validation to fail on both
classify() retries.

Three changes:

1. Include the Pydantic JSON schema in the system prompt so the model
   knows exactly which keys to produce. Affects every classify() call
   (kickoff parse, turn parse, scene-close detect, significance,
   state-update, scene summarize). Strip ```json fences if the model
   wraps its output. Bump retries 2 → 3 (model is stochastic; one extra
   attempt closes most of the remaining gap).

2. parse_kickoff() now passes a default empty KickoffParse so the
   route degrades to a fillable form instead of 500 when the classifier
   ultimately fails. The confirm form is the human-in-the-loop; an
   empty form is strictly better UX than a stack trace.

3. Tests updated: bumped canned-failure arrays from 2 → 3 entries to
   match the new attempt count; renamed kickoff test from
   "raises_when_classifier_fails_twice" to
   "falls_back_to_empty_when_classifier_fails" reflecting the new
   degraded-but-usable behavior.

Verified live with all 3 sample bots (maya/eli/sam) — kickoff route
returns 200 across multiple attempts. Full suite: 168 passed.
Two related issues blocking real-world use of the kickoff parse:

1. Classifier calls take ~12s end-to-end on Featherless for the
   complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
   structured JSON). The 10s timeout was firing on most attempts,
   causing all 3 retries to time out and the empty-fallback to render
   with blank form values. Bumping the default
   classifier_timeout_s 10 → 30s gives generous headroom; measured
   p99 is ~13s, so 30s is comfortable.

2. Featherless caps concurrent connections per account (2 on free /
   lower paid tiers). Each turn flow can fire 4–5 calls (parse,
   scene-close detect, narrative stream, two state-update passes)
   plus the background significance worker. Without a gate, we'd
   exceed the cap and fail.

   Added a class-level ``asyncio.Semaphore`` to FeatherlessClient,
   shared across all instances, configured once in lifespan from
   ``Settings.featherless_max_concurrent`` (default 2). Both
   ``generate`` and ``stream`` acquire the semaphore for the duration
   of the call; the stream holds it until the async generator
   completes, so token streaming is correctly accounted for.

Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.
Empty submission was producing a blank user_turn event in the log and
firing the LLM stream anyway — the bot would invent a response from the
kickoff context alone, producing a monologue with no user input. Two-
layer fix:

- Browser: add `required` to the prose textarea in chat.html so the
  form refuses to submit empty.
- Server: 400 in post_turn when prose.strip() is empty. Defense in
  depth — if a client bypasses the textarea attribute (custom UI,
  curl, etc.), the server still rejects.

Verified live: POST with empty body returns 400; POST with whitespace-
only returns 400; chat shell renders the textarea with required.
Full suite: 168 passed.
The form-submit handler in chat.html was setting
``textarea.disabled = true`` synchronously before the browser actually
serialized the form. Disabled form fields are excluded from
submission, so the request body contained ``prose=""`` even when the
user had typed text — which the server (correctly) rejected with the
new empty-prose 400. Net effect: typing "hello" + Send gave a "prose
cannot be empty" error.

Switched to ``readOnly``: same UX (user can't edit while streaming)
but the field IS submitted. The unlock path now also clears the
textarea and refocuses for the next turn.
Bot replies were running long (4 paragraphs of action+dialogue beats
per turn) because we never set max_tokens on the narrative call. Three
tunable knobs now in Settings (set in data/config.toml to override):

- narrative_max_tokens: int = 400
  Hard cap on each generated response. ~400 tokens ≈ 1–2 short
  paragraphs. Drop to 200 for terse banter, bump to 800+ for longer
  scenes.

- narrative_temperature: float = 0.85
  Sampling temperature. 0.7 = grounded/consistent (slightly stiff),
  0.85 = creative-but-in-character (default), 1.0 = wide variety,
  >1.0 = often off-the-rails.

- prompt closing instruction now nudges: "Keep your response to a
  single beat — one or two short paragraphs at most. Don't monologue;
  leave room for the other person to react."

Both turns.py (post_turn) and regenerate.py forward the params to
client.stream(). FeatherlessClient already passes **params through to
the OpenAI-compat endpoint.

Note: temperature doesn't control length — that was a common
misconception. max_tokens is the actual length cap. Lower temperature
makes word choice more predictable (slightly stiffer voice), not
shorter. Both knobs are useful for different goals.
13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.

Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.

NOT EXECUTING. Plan only.
Rewrites post_turn for the multi-entity world:

- Addressee detection via case-insensitive whole-word match against the
  guest name; defaults to host on no-match or both-match.
- Multi-entity prompt assembly: forwards guest_id so the prompt sees
  the third party's activity / edges / group-node.
- Multi-witness memory write: record_turn_memory_for_present writes one
  memory per present bot witness when a guest is in the room.
- Multi-pair state-update: compute_state_updates_for_present emits one
  edge_update per directed pair (6 with a guest, 2 without).
- Interjection branch (T39): when a guest is present and the primary
  beat completes, the silent witness may follow on. detect_interjection
  decides; on True we stream a second narrative as the witness, append a
  second assistant_turn linked to the same user_turn_id, and re-run the
  multi-pair state update + memory write for the follow-on beat. Cancel
  collapses both halves; a cancelled interjection skips its downstream
  passes so we don't classifier-spam against a half-formed beat.
- Scene-close runs after both beats so apply_scene_close_summary sees
  the full closing scene; T45's guest-aware summarizer handles per-POV
  rewrites for each present witness.

regenerate.py mirrors the prompt / memory / state-update changes for
1:1 and multi-entity scenes. Per the Phase 2 spec, interjection
regeneration is deferred to Phase 2.5 — regenerate only re-streams the
addressee turn for v2.

Tests: adds 5 cases to tests/test_turn_flow.py covering the no-guest
regression, multi-bot without interjection, multi-bot with interjection,
scene-close per-POV rewrites, and addressee routing on a named-bot
prose. Each test pins its own canned MockLLMClient queue with the call
shape documented in the docstring.
19 tasks across 8 waves covering events with lifecycles, time skips
(elision + jump), active threads, significance/retrieval refinements,
and meanwhile scenes (host+guest with no 'you'). Mirrors the Phase 2
plan structure: pre-flight, parallel-execution strategy with worktree
isolation, file-disjointness analysis per wave, and per-task TDD spec
with commit messages.

Phase 3 schema: adds 0009_events.sql, 0010_threads.sql,
0011_meanwhile_scenes.sql (final version 11). Builds on Phase 2's
3-entity scene support and event-sourced architecture.
8 tasks across 5 waves consolidating the 15-item backlog tracked in
CLAUDE.md (5 from Phase 1.5 cleanup + 10 from Phase 2.5/3). Items are
grouped by file ownership so each wave stays file-disjoint:

- Wave 1 (parallel): open_db refactor, bot_reset orphan cleanup,
  LLM-merged group meta-summary
- Wave 2 (single): prompt.py polish — witness role parametric, single
  ACTIVITIES block, NICE trim documented
- Wave 3 (single): drawer polish — deferred v1 edits, first-meeting
  gate, witness flag editing
- Wave 4 (parallel): regenerate.py polish (SSE + interjection
  regenerate + stale-guest cleanup); turn-flow polish + new addressee
  service (classifier addressee + significance for interjection +
  scene-close-on-cancel pinned + stale-guest cleanup)
- Wave 5 (single): docs sweep

No schema migrations. Bundled tasks split into per-item sub-commits
for clean review bisection. Uses task ids T68-T75 to avoid collision
with Phase 3 plan (T49-T67) regardless of merge order.
Phase 2 T46 pinned the witness mask contract on search_memories with a
witness_role parameter (host/guest/you). The prompt-assembly call site
in assemble_narrative_prompt was hardcoded to "host", which silently
returned the wrong rows when the speaker was the guest bot.

Derive the witness role from chat membership via a new private helper
_witness_role_for(speaker_bot_id, host_bot_id), and apply it at the
search_memories call. Behaviour is identical when the speaker is the
host (or when no guest is present); the fix is load-bearing only when
the guest bot is the speaker — exactly the scenario Phase 2 T43 added
support for.

Tests: pin both directions (host-as-speaker and guest-as-speaker) by
patching the imported search_memories reference and asserting the
witness_role argument the call site emits.
Phase 2 T43 added a SECOND ACTIVITIES: block to render guest activity
separately from you+speaker. Two consecutive ACTIVITIES: headers can
read as a duplicate-section bug to the LLM and bloat the prompt.

Consolidate to a single ACTIVITIES: block whose body is composed from
up to three bullets (you, speaker, guest). The block itself is
MUST-tier (always renders); bullet-level trim drops bullets in the
order guest -> group node -> you -> other edges, with the speaker
bullet as the MUST-tier floor (the speaker's own current activity is
the load-bearing slice).

Implementation chose Option B from the polish plan: pre-truncate the
bullets list at trim time before _build_activity_block runs, rather
than introduce a granular tier mode in the trim machinery. Rationale
documented in code; the existing block-level trim ladder gains a
single new toggle (include_you_activity) and the SHOULD-tier
guest_activity_block is gone.

Tests:
- test_single_activities_block_with_three_bullets_when_3_entities:
  exactly one ACTIVITIES: header with all three entity bullets.
- test_tight_budget_drops_guest_activity_bullet_first: speaker bullet
  survives, guest bullet absent under tight budget.
- Existing test_assemble_with_tight_budget_drops_guest_activity_first
  still passes (asserts on bullet absence, not block-header absence).
T18 review (Phase 1) noted the NICE-tier trim drops previous-scene
FIRST while §6.3 spec lists previous-scene LAST in the NICE tier
group. Decision: keep the existing greedy order (previous-scene
first), and document why.

Rationale (now in code at the trim ladder):
  1. Cheapest-impact-first — a per-POV previous-scene summary loses
     less narrative continuity than the older dialogue turns or
     memory hits it competes with.
  2. Greedy lookahead is more expensive than the marginal narrative
     loss. Dropping previous-scene typically clears the soft-budget
     slack in one step.

Test added: test_nice_trim_order_documented pins the observed order
(previous-scene -> memories -> dialogue) so a future refactor can't
silently invert it. Sized so that all-NICE config overflows soft but
dropping just previous-scene fits — proves memories and older
dialogue turns survive while previous-scene is the FIRST drop.
Adds the four POST routes whose state-layer support was already
dispatched by the manual_edit projector (edge_trust, edge_summary,
memory_pov_summary) plus a new edge_knowledge_fact dispatch branch for
add/remove fact list manipulation. Drawer template gains editable
textareas, sliders, and add/remove fact controls. Remove semantics on
knowledge_fact match by string (not index) so concurrent edge_update
events appending facts between drawer renders don't desync the form.
When a host->candidate edge already exists from a prior chat, the
Add-guest form renders the prose textarea disabled with an "already
know each other" note. Submission without the explicit "re-seed
anyway" toggle skips seed_inter_bot_edges so existing edge content
(affinity, trust, knowledge, summaries) survives — guest_added and
group_node_initialized still fire. A small inline script enables /
disables the textarea per-option based on a pre-computed
existing_guest_edges dict surfaced by the GET handler.
Memories grow per-flag witness checkboxes (you / host / guest) that
auto-submit on change via HTMX. The new POST route emits a manual_edit
event with target_kind=memory_witness and a {flag, value} payload;
prior_value mirrors the same shape so an inverse edit restores the
flag. The drawer's recent-memories query now selects the three
witness columns alongside the existing fields so the template can
render checkbox state without a second query per row.
After the new assistant_turn lands, publish a `turn_html_replace` SSE
event carrying the rendered HTML, the new turn_id, and the original
assistant_turn id as `supersedes_id` so connected tabs can swap the
prior DOM node in-place. Phase 1 T29 deferred this — page had to refresh
to see the regenerated turn.

Uses a new event name (not the existing `turn_html`) because the HTMX
`sse-swap="turn_html"` consumer expects raw HTML and an *append*
semantic; regenerate is a *replace*. The new event ships as JSON
(supersedes_id forces sse.py's JSON branch) so the front-end JS can
read the swap target from the payload.

Test: `test_regenerate_broadcasts_turn_html_over_sse` patches the
`publish` reference inside the regenerate module and asserts the
event shape.
Replace the substring _detect_addressee_id helper with a classifier
call for the multi-entity case. The substring helper is kept as a
fast-path for the no-guest case (no LLM round-trip needed when only
one bot is present, preserves throughput).

- New service chat/services/addressee.py wrapping the existing
  classifier wrapper. AddresseeDecision carries addressee_id +
  confidence + reason; classifier failure falls back to the host with
  reason="fallback" (graceful-degradation, matches the relationship_seed
  / interjection pattern).
- chat/web/turns.py post_turn now calls detect_addressee in the
  multi-entity branch; 1:1 keeps the substring path.
- tests/test_addressee.py: 3 new tests (guest pick, host pick,
  classifier-failure fallback).
- tests/test_turn_flow.py: existing multi-entity tests now feed a
  canned addressee response in the queue. The addressee-routing test
  is updated to assert classifier-driven routing rather than substring.
T44's interjection branch wrote interjection memories via
record_turn_memory_for_present but never enqueued a SignificanceJob,
so the interjection beat could land in memory but never be scored —
which meant it could never auto-pin even when it carried a pivotal
moment.

- Capture the host-POV memory id from the interjection's memory write
  result and enqueue a SignificanceJob mirroring the primary turn's
  pattern. One enqueue per beat (host id; guest POV piggybacks on the
  same score since the prose is identical for v2 — per-POV rewrite
  happens at scene close in T45).
- New test test_interjection_enqueues_significance_job pins the
  contract by intercepting worker.enqueue and asserting two distinct
  jobs land per 3-entity turn that fires an interjection.
Phase 2 T44 deferred interjection regenerate — when the original turn
group included a follow-on interjection beat we left it untouched. Now
regenerate redoes BOTH halves:

- Detect a sibling interjection by looking up assistant_turn events
  pinned to the same user_turn_id with `interjection_of` set.
- After streaming the new primary, run `detect_interjection` against
  the new primary text.
- If True: stream a new interjection from the silent witness, append
  with `interjection_of=<new primary speaker_id>`, supersede the
  original interjection, and re-run memory + state-update for the new
  beat.
- If False: supersede the original interjection without a replacement
  (back-pointer goes to the new primary so the row stays consistently
  hidden).

Also broadcast a `turn_html_replace` event for the new interjection so
the front-end can swap the prior interjection node in place (mirrors
T73.1's primary swap).

Tests:
- `test_regenerate_with_interjection_redoes_both_turns`: classifier
  returns True; assert two new assistant_turns land for the same
  user_turn, second carries `interjection_of`, originals superseded.
- `test_regenerate_drops_interjection_when_classifier_returns_false`:
  classifier returns False; assert one new assistant_turn (primary
  only) and the original interjection is superseded with no
  replacement.

`interjection_of` carries the primary's *speaker_id* (matching the
existing convention in chat/web/turns.py) rather than the event_id.
Phase 2 T44 added a defensive degrade-to-1:1 here when
`chat.guest_bot_id` pointed at a deleted bot. T47 fixed the root cause:
`bot_reset` cascade-clears the column when the referenced bot is purged
(verified by tests/test_reset.py), so the guard was dead code.

No corresponding stale-guest test existed in tests/test_regenerate.py
to remove. The bot_reset cascade test in tests/test_reset.py already
covers the root-cause behavior.
Phase 2 T44 review noted that scene close still runs when a primary
turn is cancelled mid-stream and asked the implementer to review.

Review finding: the existing behavior is correct, not a bug. The
close-detection branch in post_turn consumes ONLY the user's prose
(fully appended to the event_log BEFORE streaming starts) and the
current container name. It does NOT consume the bot's output. A user
who types "we're done here, fade out" and then hits Stop mid-stream
still meant to close — the cancelled bot beat doesn't invalidate
that intent.

- Document the rationale with an inline comment near the
  close-detection branch in chat/web/turns.py.
- Add regression test
  test_cancelled_turn_still_closes_scene_when_user_prose_signals_close
  that drives a stream raising CancelledError on first iteration and
  asserts the scene_closed event still lands.
T44 carried a defensive degrade-to-1:1 block in post_turn for the
case where chat.guest_bot_id pointed at a deleted bot. T47 then
fixed the root cause by adding a bot_reset cascade that clears
guest_bot_id from any chat that referenced the deleted bot, so the
post_turn defensive block was rendered dead.

Remove the orphan-clear branch and replace it with a comment
documenting that get_bot now returns a real row when guest_bot_id
is non-None. The cascade behavior is pinned by
test_reset_clears_guest_reference_in_other_chats in tests/test_reset.py.
dohertj2 merged commit e4fd888b53 into main 2026-04-26 20:00:40 -04:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: dohertj2/chat#3