Phase 2.5 cleanup: 15-item backlog burndown #3

2026-04-26T17:47:50-04:00

dohertj2 commented

2026-04-26 17:47:50 -04:00

Summary

Phase 2.5 burns down the 15-item backlog (5 from Phase 1.5 + 10 from Phase 2.5/3) tracked in CLAUDE.md. 8 tasks across 5 waves; both old backlog sections are now empty. New follow-ups discovered during execution are tracked in a fresh "Phase 2.6 / 3 backlog" section.

Note: this PR currently includes Phase 2 (PR #2) since #2 hasn't merged. Once #2 merges, this PR auto-narrows to just Phase 2.5's 27 commits / ~3.4k lines.

What shipped (8 tasks, 5 waves)

Wave 1 (parallel):
- T68 — open_db refactor with check_same_thread parameter (eliminates duplicated PRAGMA setup in chat/web/bots.py)
- T69 — bot_reset purges orphaned "you" activity rows (also fixes a latent FK constraint crash)
- T70 — LLM-merged group meta-summary (replaces Phase 2's naive concat; falls back to concat on classifier failure)
Wave 2 (single): T71 prompt.py polish (3 sub-commits)
- T71.1 witness role parametric (host vs guest derived from chat membership; fixes guest-as-speaker memory retrieval)
- T71.2 single ACTIVITIES: block with bullet-level trim
- T71.3 NICE trim order kept with documented rationale
Wave 3 (single): T72 drawer polish (3 sub-commits)
- T72.1 deferred v1 edits: edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove (NEW edge_knowledge_fact projector branch)
- T72.2 first-meeting gate on Add-guest form
- T72.3 witness flag inline-edit (NEW memory_witness projector branch)
Wave 4 (parallel): T73 + T74 (each 3-4 sub-commits)
- T73 — regenerate.py polish: turn_html_replace SSE event, interjection regenerate, stale-guest cleanup
- T74 — turn-flow polish + new chat/services/addressee.py classifier service: classifier-based addressee detection, significance for interjection memories, scene-close-on-cancel pinned, stale-guest cleanup
Wave 5 (single): T75 docs sweep — empties old backlogs, adds Phase 2.5 status, captures Phase 2.6 follow-ups

Architecture notes

No schema migrations (baseline stays at version 8).
All changes are file-disjoint within waves; merges were conflict-free.
Backward compatible: 244 of 247 tests are unchanged from Phase 2 baseline; 35+ new tests added throughout.
Featherless 2-connection cap respected throughout.

Test plan

Full pytest suite: 247 passing (212 Phase 2 + 35 Phase 2.5). 0 failures, 0 skips.
Each task verified in its own worktree before merge to phase-2.5.
Each task reviewed for spec compliance + code quality + security (Jinja2 autoescape verified, no |safe on user content, all SQL parameterized).
Manual smoke (recommend before merging this PR):
- Drawer: edit edge_trust on a chat; verify the new value sticks after refresh
- Drawer: edit edge_summary; refresh; verify
- Drawer: toggle a memory's witness flag; refresh; verify
- Drawer: open Add-guest form for a (host, guest) pair that already shares an edge; verify the gate disables the prose textarea
- Reset a bot; verify "you" activity rows for that bot's chats are gone
- Trigger an interjection turn; verify a SignificanceJob was enqueued for the interjection memory
- Use a bot named "Sam"; ask "did you see what Sam wrote?" — verify host gets the floor (classifier addressee, not substring)
- Multi-tab regenerate (note: requires Phase 2.6 frontend handler for turn_html_replace to actually swap live)

Phase 2.6 / 3 backlog (discovered during execution, tracked in CLAUDE.md)

Frontend handler for turn_html_replace SSE event (closes Phase 1.5 backlog #2 end-to-end)
Cancel/stop hook for in-flight regenerate streams
DRY: regenerate vs post_turn (recent-dialogue + prior-edges duplication)
Sibling-discovery query optimization in regenerate.py
_witness_role_for defensive coding for host_bot_id is None
Literal["high","medium","low"] type tightening on AddresseeDecision.confidence
Scene-close-on-cancel UX revisit if play-testing surfaces a regression

Plan

docs/plans/2026-04-26-v2.5-phase2.5-cleanup.md (committed in e05f28e).

## Summary Phase 2.5 burns down the 15-item backlog (5 from Phase 1.5 + 10 from Phase 2.5/3) tracked in CLAUDE.md. 8 tasks across 5 waves; both old backlog sections are now empty. New follow-ups discovered during execution are tracked in a fresh "Phase 2.6 / 3 backlog" section. **Note:** this PR currently includes Phase 2 (PR #2) since #2 hasn't merged. Once #2 merges, this PR auto-narrows to just Phase 2.5's 27 commits / ~3.4k lines. ## What shipped (8 tasks, 5 waves) - **Wave 1 (parallel):** - T68 — `open_db` refactor with `check_same_thread` parameter (eliminates duplicated PRAGMA setup in `chat/web/bots.py`) - T69 — `bot_reset` purges orphaned "you" activity rows (also fixes a latent FK constraint crash) - T70 — LLM-merged group meta-summary (replaces Phase 2's naive concat; falls back to concat on classifier failure) - **Wave 2 (single):** T71 `prompt.py` polish (3 sub-commits) - T71.1 witness role parametric (`host` vs `guest` derived from chat membership; fixes guest-as-speaker memory retrieval) - T71.2 single `ACTIVITIES:` block with bullet-level trim - T71.3 NICE trim order kept with documented rationale - **Wave 3 (single):** T72 drawer polish (3 sub-commits) - T72.1 deferred v1 edits: edge_trust slider, edge_summary textarea, memory pov_summary textarea, knowledge_facts add/remove (NEW `edge_knowledge_fact` projector branch) - T72.2 first-meeting gate on Add-guest form - T72.3 witness flag inline-edit (NEW `memory_witness` projector branch) - **Wave 4 (parallel):** T73 + T74 (each 3-4 sub-commits) - T73 — regenerate.py polish: turn_html_replace SSE event, interjection regenerate, stale-guest cleanup - T74 — turn-flow polish + new `chat/services/addressee.py` classifier service: classifier-based addressee detection, significance for interjection memories, scene-close-on-cancel pinned, stale-guest cleanup - **Wave 5 (single):** T75 docs sweep — empties old backlogs, adds Phase 2.5 status, captures Phase 2.6 follow-ups ## Architecture notes - No schema migrations (baseline stays at version 8). - All changes are file-disjoint within waves; merges were conflict-free. - Backward compatible: 244 of 247 tests are unchanged from Phase 2 baseline; 35+ new tests added throughout. - Featherless 2-connection cap respected throughout. ## Test plan - [x] Full pytest suite: 247 passing (212 Phase 2 + 35 Phase 2.5). 0 failures, 0 skips. - [x] Each task verified in its own worktree before merge to phase-2.5. - [x] Each task reviewed for spec compliance + code quality + security (Jinja2 autoescape verified, no `|safe` on user content, all SQL parameterized). - [ ] Manual smoke (recommend before merging this PR): - [ ] Drawer: edit edge_trust on a chat; verify the new value sticks after refresh - [ ] Drawer: edit edge_summary; refresh; verify - [ ] Drawer: toggle a memory's witness flag; refresh; verify - [ ] Drawer: open Add-guest form for a (host, guest) pair that already shares an edge; verify the gate disables the prose textarea - [ ] Reset a bot; verify "you" activity rows for that bot's chats are gone - [ ] Trigger an interjection turn; verify a `SignificanceJob` was enqueued for the interjection memory - [ ] Use a bot named "Sam"; ask "did you see what Sam wrote?" — verify host gets the floor (classifier addressee, not substring) - [ ] Multi-tab regenerate (note: requires Phase 2.6 frontend handler for `turn_html_replace` to actually swap live) ## Phase 2.6 / 3 backlog (discovered during execution, tracked in CLAUDE.md) - Frontend handler for `turn_html_replace` SSE event (closes Phase 1.5 backlog #2 end-to-end) - Cancel/stop hook for in-flight regenerate streams - DRY: regenerate vs post_turn (recent-dialogue + prior-edges duplication) - Sibling-discovery query optimization in regenerate.py - `_witness_role_for` defensive coding for `host_bot_id is None` - `Literal["high","medium","low"]` type tightening on `AddresseeDecision.confidence` - Scene-close-on-cancel UX revisit if play-testing surfaces a regression ## Plan `docs/plans/2026-04-26-v2.5-phase2.5-cleanup.md` (committed in `e05f28e`).

dohertj2 added 99 commits 2026-04-26 17:47:51 -04:00

feat: project skeleton with health endpoint 4a60171035

feat: config loader with toml + env override 01e6975d20

feat: sqlite migration runner with meta version table 67517926aa

feat: LLMClient protocol with Featherless and mock implementations e627356168

feat: classifier wrapper with retry, timeout, schema-default fallback c2aceffda1

feat: append-only event log with projector skeleton 517fe49aef

feat: bot and you entity schemas with projector handlers 5e6bbb586c

docs: fix Task 6 plan snippet: PRAGMA table_info name index is c[1] not c[0] 7e6c2985dd

feat: directed edges with per-turn delta projector bc97d425ef

feat: memory schema with witness flags and FTS5 index 30e6648122

feat: chats, chat_state, containers, scenes, activity tables ec344064f1

feat: kickoff prose parser via classifier a5339fc1d2

feat: bot authoring form with bot_authored event 44ea627a8a

feat: settings page with you-entity authoring e44e2bf93f

feat: kickoff parse-and-confirm flow with chat creation fbb16c86b3

feat: top-level nav and chat list view 0c08745194

feat: chat shell page rendering e79f4d8d22

feat: per-chat SSE channel and pub/sub 656c2558cb

feat: turn input parser via classifier a0f5e818ec

feat: prompt assembly with must/should/nice trim tiers 73d8b0c092

feat: narrative streaming via SSE with assistant_turn event 9b45710cb1

feat: post-turn state-update pass per present entity e8d24a0875

feat: per-turn memory writes with witness flags a45dabb6ae

feat: async significance pass with auto-pin on score 3 eb4cdf9cbb

feat: FTS5 memory retrieval with witness filter and ranking boosts 3995a8671b

feat: read-only drawer with scene, activity, edges, memories 5fc5b8ac23

feat: drawer edits with manual_edit event capture db3005fc17

feat: scene close on hard signals with manual override 0997562e75

feat: per-POV summary and edge summary update on scene close b5175aefaa

feat: rewind with impact preview, pre-rewind snapshot, undo toast aa0563b4fa

feat: regenerate with edit-then-regenerate inline UX 46062973c2

feat: bot reset with hard confirm and event-driven purge 82be8b3f51

feat: periodic snapshots with retention and cold-load fast-path b9644fad31

feat: nightly DB backups with 14-day retention 8390703b73

feat: transcript display formatting with markdown and OOC styling 330077afcf

feat: streaming UX with Stop, disconnect handling, send-lock 0353d592cd

feat: error banners and first-run navigation flow a302ed427a

chore: post-Phase-1 cleanup — gitignore, packaging, backlog 365dacc0d0

- .gitignore: add *.egg-info/ so editable installs don't show in git status.
- pyproject.toml: add [build-system] and [tool.setuptools.packages.find]
  scoped to chat*, fixing pip install -e . which was failing on data/
  auto-discovery.
- CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status,
  capturing the small follow-ups surfaced in implementer reviews
  (open_db refactor, regenerate SSE broadcast, you-activity purge,
  drawer edits for deferred fields, NICE trim order).

chore: add scripts/seed_sample_bots.py 12502d6ec7

Idempotent seeder for three sample bots (Maya — coworker slow-burn,
Eli — live-in partner, Sam — bartender / new connection). Each is a
distinct relational archetype to exercise the system from different
angles. Run from repo root:

    .venv/bin/python scripts/seed_sample_bots.py

Re-running skips ids that already exist. After seeding, walk each bot
through kickoff parse-and-confirm at /bots/<id>/kickoff.

fix: classifier robustness — schema in prompt, retries, kickoff fallback 5aab98e4d7

The kickoff parse-and-confirm route was 500-ing intermittently because
Hermes-3 + Featherless's response_format={"type":"json_object"} only
guarantees JSON output, NOT a particular schema. The model was inventing
its own field names (sceneTime, entities, settingDetails) instead of
the KickoffParse fields, causing Pydantic validation to fail on both
classify() retries.

Three changes:

1. Include the Pydantic JSON schema in the system prompt so the model
   knows exactly which keys to produce. Affects every classify() call
   (kickoff parse, turn parse, scene-close detect, significance,
   state-update, scene summarize). Strip ```json fences if the model
   wraps its output. Bump retries 2 → 3 (model is stochastic; one extra
   attempt closes most of the remaining gap).

2. parse_kickoff() now passes a default empty KickoffParse so the
   route degrades to a fillable form instead of 500 when the classifier
   ultimately fails. The confirm form is the human-in-the-loop; an
   empty form is strictly better UX than a stack trace.

3. Tests updated: bumped canned-failure arrays from 2 → 3 entries to
   match the new attempt count; renamed kickoff test from
   "raises_when_classifier_fails_twice" to
   "falls_back_to_empty_when_classifier_fails" reflecting the new
   degraded-but-usable behavior.

Verified live with all 3 sample bots (maya/eli/sam) — kickoff route
returns 200 across multiple attempts. Full suite: 168 passed.

fix: classifier timeout + Featherless concurrency cap 5c039c8e56

Two related issues blocking real-world use of the kickoff parse:

1. Classifier calls take ~12s end-to-end on Featherless for the
   complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
   structured JSON). The 10s timeout was firing on most attempts,
   causing all 3 retries to time out and the empty-fallback to render
   with blank form values. Bumping the default
   classifier_timeout_s 10 → 30s gives generous headroom; measured
   p99 is ~13s, so 30s is comfortable.

2. Featherless caps concurrent connections per account (2 on free /
   lower paid tiers). Each turn flow can fire 4–5 calls (parse,
   scene-close detect, narrative stream, two state-update passes)
   plus the background significance worker. Without a gate, we'd
   exceed the cap and fail.

   Added a class-level ``asyncio.Semaphore`` to FeatherlessClient,
   shared across all instances, configured once in lifespan from
   ``Settings.featherless_max_concurrent`` (default 2). Both
   ``generate`` and ``stream`` acquire the semaphore for the duration
   of the call; the stream holds it until the async generator
   completes, so token streaming is correctly accounted for.

Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.

fix: reject empty prose on turn submit 52555e0455

Empty submission was producing a blank user_turn event in the log and
firing the LLM stream anyway — the bot would invent a response from the
kickoff context alone, producing a monologue with no user input. Two-
layer fix:

- Browser: add `required` to the prose textarea in chat.html so the
  form refuses to submit empty.
- Server: 400 in post_turn when prose.strip() is empty. Defense in
  depth — if a client bypasses the textarea attribute (custom UI,
  curl, etc.), the server still rejects.

Verified live: POST with empty body returns 400; POST with whitespace-
only returns 400; chat shell renders the textarea with required.
Full suite: 168 passed.

fix: use readOnly (not disabled) to lock textarea during stream f0742dd4f9

The form-submit handler in chat.html was setting
``textarea.disabled = true`` synchronously before the browser actually
serialized the form. Disabled form fields are excluded from
submission, so the request body contained ``prose=""`` even when the
user had typed text — which the server (correctly) rejected with the
new empty-prose 400. Net effect: typing "hello" + Send gave a "prose
cannot be empty" error.

Switched to ``readOnly``: same UX (user can't edit while streaming)
but the field IS submitted. The unlock path now also clears the
textarea and refocuses for the next turn.

feat: cap narrative response length + tune sampling d161e7b8e9

Bot replies were running long (4 paragraphs of action+dialogue beats
per turn) because we never set max_tokens on the narrative call. Three
tunable knobs now in Settings (set in data/config.toml to override):

- narrative_max_tokens: int = 400
  Hard cap on each generated response. ~400 tokens ≈ 1–2 short
  paragraphs. Drop to 200 for terse banter, bump to 800+ for longer
  scenes.

- narrative_temperature: float = 0.85
  Sampling temperature. 0.7 = grounded/consistent (slightly stiff),
  0.85 = creative-but-in-character (default), 1.0 = wide variety,
  >1.0 = often off-the-rails.

- prompt closing instruction now nudges: "Keep your response to a
  single beat — one or two short paragraphs at most. Don't monologue;
  leave room for the other person to react."

Both turns.py (post_turn) and regenerate.py forward the params to
client.stream(). FeatherlessClient already passes **params through to
the OpenAI-compat endpoint.

Note: temperature doesn't control length — that was a common
misconception. max_tokens is the actual length cap. Lower temperature
makes word choice more predictable (slightly stiffer voice), not
shorter. Both knobs are useful for different goals.

docs: add Phase 2 implementation plan with parallel-safe waves b8335895e1

13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.

Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.

NOT EXECUTING. Plan only.

feat: guest_added / guest_removed event handlers a1b4e251c5

feat: group_node schema + projector handlers a0d7debce5

feat: relationship-seed service for first-co-appearance prompt c6b3531c64

merge: T36 group_node schema + projector handlers b69a74e69b

merge: T37 guest_added/guest_removed event handlers e58cdbd527

merge: T38 relationship-seed service 6b726b2a4a

test: bump schema_version assertion to 8 after 0008_group_node migration 22db9f3554

feat: interjection classifier service 6a92253ae7

feat: multi-entity state-update coordinator 4ec56dd475

feat: multi-witness memory write helper e7793f2441

merge: T39 interjection classifier service 77b42f1ea5

merge: T40 multi-entity state-update coordinator 9d80b9ae2b

merge: T41 multi-witness memory write helper f24ffb8e4f

feat: drawer guest add/remove + render bb83d97088

merge: T42 drawer guest add/remove + render a90647dddb

feat: per-POV summaries on close for each present witness 4e240347b4

feat: multi-entity prompt assembly with guest activity, edges, group node fcb111310a

merge: T43 multi-entity prompt assembly 9b601650fb

merge: T45 per-POV summaries on close for each present witness 44c8735b27

feat: T44 multi-entity turn flow with interjection support c86b0df411

Rewrites post_turn for the multi-entity world:

- Addressee detection via case-insensitive whole-word match against the
  guest name; defaults to host on no-match or both-match.
- Multi-entity prompt assembly: forwards guest_id so the prompt sees
  the third party's activity / edges / group-node.
- Multi-witness memory write: record_turn_memory_for_present writes one
  memory per present bot witness when a guest is in the room.
- Multi-pair state-update: compute_state_updates_for_present emits one
  edge_update per directed pair (6 with a guest, 2 without).
- Interjection branch (T39): when a guest is present and the primary
  beat completes, the silent witness may follow on. detect_interjection
  decides; on True we stream a second narrative as the witness, append a
  second assistant_turn linked to the same user_turn_id, and re-run the
  multi-pair state update + memory write for the follow-on beat. Cancel
  collapses both halves; a cancelled interjection skips its downstream
  passes so we don't classifier-spam against a half-formed beat.
- Scene-close runs after both beats so apply_scene_close_summary sees
  the full closing scene; T45's guest-aware summarizer handles per-POV
  rewrites for each present witness.

regenerate.py mirrors the prompt / memory / state-update changes for
1:1 and multi-entity scenes. Per the Phase 2 spec, interjection
regeneration is deferred to Phase 2.5 — regenerate only re-streams the
addressee turn for v2.

Tests: adds 5 cases to tests/test_turn_flow.py covering the no-guest
regression, multi-bot without interjection, multi-bot with interjection,
scene-close per-POV rewrites, and addressee routing on a named-bot
prose. Each test pins its own canned MockLLMClient queue with the call
shape documented in the docstring.

merge: T44 multi-entity turn flow with interjection support 60ac33a787

test: witness filter coverage for multi-entity scenarios d40313063c

fix: bot_reset cascades to guest references in other chats fb17ba0657

docs: phase 2 status, behavioral defaults, deferred items 321810fa54

merge: T46 witness filter coverage for multi-entity scenarios 9f35669936

merge: T47 bot_reset cascades to guest references f6b75b25eb

merge: T48 Phase 2 documentation update bb87fcbd4a

docs: add Phase 3 implementation plan with parallel-safe waves 379054755a

19 tasks across 8 waves covering events with lifecycles, time skips
(elision + jump), active threads, significance/retrieval refinements,
and meanwhile scenes (host+guest with no 'you'). Mirrors the Phase 2
plan structure: pre-flight, parallel-execution strategy with worktree
isolation, file-disjointness analysis per wave, and per-task TDD spec
with commit messages.

Phase 3 schema: adds 0009_events.sql, 0010_threads.sql,
0011_meanwhile_scenes.sql (final version 11). Builds on Phase 2's
3-entity scene support and event-sourced architecture.

docs: add Phase 2.5 cleanup plan (Phase 1.5 + 2.5/3 backlog) e05f28e9d5

8 tasks across 5 waves consolidating the 15-item backlog tracked in
CLAUDE.md (5 from Phase 1.5 cleanup + 10 from Phase 2.5/3). Items are
grouped by file ownership so each wave stays file-disjoint:

- Wave 1 (parallel): open_db refactor, bot_reset orphan cleanup,
  LLM-merged group meta-summary
- Wave 2 (single): prompt.py polish — witness role parametric, single
  ACTIVITIES block, NICE trim documented
- Wave 3 (single): drawer polish — deferred v1 edits, first-meeting
  gate, witness flag editing
- Wave 4 (parallel): regenerate.py polish (SSE + interjection
  regenerate + stale-guest cleanup); turn-flow polish + new addressee
  service (classifier addressee + significance for interjection +
  scene-close-on-cancel pinned + stale-guest cleanup)
- Wave 5 (single): docs sweep

No schema migrations. Bundled tasks split into per-item sub-commits
for clean review bisection. Uses task ids T68-T75 to avoid collision
with Phase 3 plan (T49-T67) regardless of merge order.

refactor: open_db with check_same_thread parameter (T68) 994728b5ed

fix: bot_reset purges orphaned 'you' activity rows (T69) c1e419e012

feat: LLM-merged group meta-summary (T70) 13c23fd898

merge: T68 open_db with check_same_thread parameter 1b9144442a

merge: T69 bot_reset purges orphaned 'you' activity rows f701f9d7dd

merge: T70 LLM-merged group meta-summary b13f3b4e47

fix: witness role parametric in prompt assembly (T71.1) 428438b223

Phase 2 T46 pinned the witness mask contract on search_memories with a
witness_role parameter (host/guest/you). The prompt-assembly call site
in assemble_narrative_prompt was hardcoded to "host", which silently
returned the wrong rows when the speaker was the guest bot.

Derive the witness role from chat membership via a new private helper
_witness_role_for(speaker_bot_id, host_bot_id), and apply it at the
search_memories call. Behaviour is identical when the speaker is the
host (or when no guest is present); the fix is load-bearing only when
the guest bot is the speaker — exactly the scenario Phase 2 T43 added
support for.

Tests: pin both directions (host-as-speaker and guest-as-speaker) by
patching the imported search_memories reference and asserting the
witness_role argument the call site emits.

refactor: single ACTIVITIES: block with bullet-level trim (T71.2) afd1a50958

Phase 2 T43 added a SECOND ACTIVITIES: block to render guest activity
separately from you+speaker. Two consecutive ACTIVITIES: headers can
read as a duplicate-section bug to the LLM and bloat the prompt.

Consolidate to a single ACTIVITIES: block whose body is composed from
up to three bullets (you, speaker, guest). The block itself is
MUST-tier (always renders); bullet-level trim drops bullets in the
order guest -> group node -> you -> other edges, with the speaker
bullet as the MUST-tier floor (the speaker's own current activity is
the load-bearing slice).

Implementation chose Option B from the polish plan: pre-truncate the
bullets list at trim time before _build_activity_block runs, rather
than introduce a granular tier mode in the trim machinery. Rationale
documented in code; the existing block-level trim ladder gains a
single new toggle (include_you_activity) and the SHOULD-tier
guest_activity_block is gone.

Tests:
- test_single_activities_block_with_three_bullets_when_3_entities:
  exactly one ACTIVITIES: header with all three entity bullets.
- test_tight_budget_drops_guest_activity_bullet_first: speaker bullet
  survives, guest bullet absent under tight budget.
- Existing test_assemble_with_tight_budget_drops_guest_activity_first
  still passes (asserts on bullet absence, not block-header absence).

chore: document NICE trim order rationale (T71.3) 73bb8c1f17

T18 review (Phase 1) noted the NICE-tier trim drops previous-scene
FIRST while §6.3 spec lists previous-scene LAST in the NICE tier
group. Decision: keep the existing greedy order (previous-scene
first), and document why.

Rationale (now in code at the trim ladder):
  1. Cheapest-impact-first — a per-POV previous-scene summary loses
     less narrative continuity than the older dialogue turns or
     memory hits it competes with.
  2. Greedy lookahead is more expensive than the marginal narrative
     loss. Dropping previous-scene typically clears the soft-budget
     slack in one step.

Test added: test_nice_trim_order_documented pins the observed order
(previous-scene -> memories -> dialogue) so a future refactor can't
silently invert it. Sized so that all-NICE config overflows soft but
dropping just previous-scene fits — proves memories and older
dialogue turns survive while previous-scene is the FIRST drop.

merge: T71 prompt.py polish (witness role + ACTIVITIES + NICE trim docs) 789b9bd042

feat: drawer edits for edge_trust / edge_summary / memory_pov_summary / knowledge_facts (T72.1) 21404a373b

Adds the four POST routes whose state-layer support was already
dispatched by the manual_edit projector (edge_trust, edge_summary,
memory_pov_summary) plus a new edge_knowledge_fact dispatch branch for
add/remove fact list manipulation. Drawer template gains editable
textareas, sliders, and add/remove fact controls. Remove semantics on
knowledge_fact match by string (not index) so concurrent edge_update
events appending facts between drawer renders don't desync the form.

feat: first-meeting gate on drawer Add-guest form (T72.2) c265e4ce0f

When a host->candidate edge already exists from a prior chat, the
Add-guest form renders the prose textarea disabled with an "already
know each other" note. Submission without the explicit "re-seed
anyway" toggle skips seed_inter_bot_edges so existing edge content
(affinity, trust, knowledge, summaries) survives — guest_added and
group_node_initialized still fire. A small inline script enables /
disables the textarea per-option based on a pre-computed
existing_guest_edges dict surfaced by the GET handler.

feat: drawer witness flag inline-edit (T72.3) 607d0971c4

Memories grow per-flag witness checkboxes (you / host / guest) that
auto-submit on change via HTMX. The new POST route emits a manual_edit
event with target_kind=memory_witness and a {flag, value} payload;
prior_value mirrors the same shape so an inverse edit restores the
flag. The drawer's recent-memories query now selects the three
witness columns alongside the existing fields so the template can
render checkbox state without a second query per row.

merge: T72 drawer polish (deferred edits + first-meeting gate + witness flag editing) e632a6247d

feat: regenerate broadcasts turn_html over SSE (T73.1) 6f22e86f54

After the new assistant_turn lands, publish a `turn_html_replace` SSE
event carrying the rendered HTML, the new turn_id, and the original
assistant_turn id as `supersedes_id` so connected tabs can swap the
prior DOM node in-place. Phase 1 T29 deferred this — page had to refresh
to see the regenerated turn.

Uses a new event name (not the existing `turn_html`) because the HTMX
`sse-swap="turn_html"` consumer expects raw HTML and an *append*
semantic; regenerate is a *replace*. The new event ships as JSON
(supersedes_id forces sse.py's JSON branch) so the front-end JS can
read the swap target from the payload.

Test: `test_regenerate_broadcasts_turn_html_over_sse` patches the
`publish` reference inside the regenerate module and asserts the
event shape.

feat: classifier-based addressee detection (T74.1) c874883a84

Replace the substring _detect_addressee_id helper with a classifier
call for the multi-entity case. The substring helper is kept as a
fast-path for the no-guest case (no LLM round-trip needed when only
one bot is present, preserves throughput).

- New service chat/services/addressee.py wrapping the existing
  classifier wrapper. AddresseeDecision carries addressee_id +
  confidence + reason; classifier failure falls back to the host with
  reason="fallback" (graceful-degradation, matches the relationship_seed
  / interjection pattern).
- chat/web/turns.py post_turn now calls detect_addressee in the
  multi-entity branch; 1:1 keeps the substring path.
- tests/test_addressee.py: 3 new tests (guest pick, host pick,
  classifier-failure fallback).
- tests/test_turn_flow.py: existing multi-entity tests now feed a
  canned addressee response in the queue. The addressee-routing test
  is updated to assert classifier-driven routing rather than substring.

fix: enqueue significance for interjection memories (T74.2) 88fae33152

T44's interjection branch wrote interjection memories via
record_turn_memory_for_present but never enqueued a SignificanceJob,
so the interjection beat could land in memory but never be scored —
which meant it could never auto-pin even when it carried a pivotal
moment.

- Capture the host-POV memory id from the interjection's memory write
  result and enqueue a SignificanceJob mirroring the primary turn's
  pattern. One enqueue per beat (host id; guest POV piggybacks on the
  same score since the prose is identical for v2 — per-POV rewrite
  happens at scene close in T45).
- New test test_interjection_enqueues_significance_job pins the
  contract by intercepting worker.enqueue and asserting two distinct
  jobs land per 3-entity turn that fires an interjection.

feat: regenerate covers interjection turns (T73.2) f2a57005e5

Phase 2 T44 deferred interjection regenerate — when the original turn
group included a follow-on interjection beat we left it untouched. Now
regenerate redoes BOTH halves:

- Detect a sibling interjection by looking up assistant_turn events
  pinned to the same user_turn_id with `interjection_of` set.
- After streaming the new primary, run `detect_interjection` against
  the new primary text.
- If True: stream a new interjection from the silent witness, append
  with `interjection_of=<new primary speaker_id>`, supersede the
  original interjection, and re-run memory + state-update for the new
  beat.
- If False: supersede the original interjection without a replacement
  (back-pointer goes to the new primary so the row stays consistently
  hidden).

Also broadcast a `turn_html_replace` event for the new interjection so
the front-end can swap the prior interjection node in place (mirrors
T73.1's primary swap).

Tests:
- `test_regenerate_with_interjection_redoes_both_turns`: classifier
  returns True; assert two new assistant_turns land for the same
  user_turn, second carries `interjection_of`, originals superseded.
- `test_regenerate_drops_interjection_when_classifier_returns_false`:
  classifier returns False; assert one new assistant_turn (primary
  only) and the original interjection is superseded with no
  replacement.

`interjection_of` carries the primary's *speaker_id* (matching the
existing convention in chat/web/turns.py) rather than the event_id.

chore: remove defensive stale-guest degrade in regenerate.py (T73.3) bd13b64959

Phase 2 T44 added a defensive degrade-to-1:1 here when
`chat.guest_bot_id` pointed at a deleted bot. T47 fixed the root cause:
`bot_reset` cascade-clears the column when the referenced bot is purged
(verified by tests/test_reset.py), so the guard was dead code.

No corresponding stale-guest test existed in tests/test_regenerate.py
to remove. The bot_reset cascade test in tests/test_reset.py already
covers the root-cause behavior.

chore: pin scene-close-on-cancel behavior + comment rationale (T74.3) bfb2ffb6f6

Phase 2 T44 review noted that scene close still runs when a primary
turn is cancelled mid-stream and asked the implementer to review.

Review finding: the existing behavior is correct, not a bug. The
close-detection branch in post_turn consumes ONLY the user's prose
(fully appended to the event_log BEFORE streaming starts) and the
current container name. It does NOT consume the bot's output. A user
who types "we're done here, fade out" and then hits Stop mid-stream
still meant to close — the cancelled bot beat doesn't invalidate
that intent.

- Document the rationale with an inline comment near the
  close-detection branch in chat/web/turns.py.
- Add regression test
  test_cancelled_turn_still_closes_scene_when_user_prose_signals_close
  that drives a stream raising CancelledError on first iteration and
  asserts the scene_closed event still lands.

chore: remove defensive stale-guest degrade in turns.py (T74.4) 6d98728a2e

T44 carried a defensive degrade-to-1:1 block in post_turn for the
case where chat.guest_bot_id pointed at a deleted bot. T47 then
fixed the root cause by adding a bot_reset cascade that clears
guest_bot_id from any chat that referenced the deleted bot, so the
post_turn defensive block was rendered dead.

Remove the orphan-clear branch and replace it with a comment
documenting that get_bot now returns a real row when guest_bot_id
is non-None. The cascade behavior is pinned by
test_reset_clears_guest_reference_in_other_chats in tests/test_reset.py.

merge: T73 regenerate.py polish (turn_html SSE + interjection regenerate + stale-guest cleanup) dbc9690358

merge: T74 turn-flow polish + addressee service 67d6f3fe68

docs: phase 2.5 status, prune shipped backlog items, capture phase 2.6 follow-ups (T75) c6e0130e59

merge: T75 phase 2.5 docs sweep + phase 2.6 backlog e61bd9cb08

dohertj2 merged commit e4fd888b53 into main

2026-04-26 20:00:40 -04:00

dohertj2 referenced this issue from a commit

2026-04-26 20:00:40 -04:00

Merge pull request 'Phase 2.5 cleanup: 15-item backlog burndown' (#3) from phase-2.5 into main

Sign in to join this conversation.