Joseph Doherty 3b83786b8b feat: cap narrative output at 2-3 beats via trim_to_max_beats post-processor
Verbose roleplay-tuned narrators (Cydonia, Magnum, etc.) reliably
ignore prompt-level beat-count instructions and ramble for 6-12
asterisk-action beats per turn — even with HARD CAP language and
worked examples in the closing instruction. The fix is a deterministic
post-stream trimmer:

- New trim_to_max_beats(text, max_beats) in chat/services/prompt.py.
  Counts * characters in the streamed output (each beat = 2
  asterisks: open + close), trims at the start of the (max_beats+1)th
  asterisk action, strips trailing whitespace. Idempotent and safe
  on under-cap input.

- Wired into post_turn for both the primary stream (3-beat cap) and
  the optional interjection stream (2-beat cap — interjections are
  by definition shorter chime-ins).

- Tightened the closing instruction: explicit "HARD CAP: 2-3 beats"
  with "After the third beat, STOP". Helps the well-behaved models
  self-cap; the post-processor catches the rest.

- max_tokens: 250 -> 160 (lets the 3rd beat finish naturally before
  hitting the physical cap; trim_to_max_beats handles 4+ beat
  overflow). temperature: 0.85 -> 0.7 (Cydonia is more compliant
  with format instructions at slightly cooler sampling).

- Test budgets bumped (closing grew ~15 tokens with the new wording).
  6 new tests for trim_to_max_beats covering passthrough, exact-cap,
  4-beat trim, 6-beat runaway, lower caps, zero cap.

Verified live: in a 4-turn bench against chat_maya, every response
came in at 2-3 beats. Suite: 470 passed in 11.7s.
2026-04-27 14:19:21 -04:00