Verbose roleplay-tuned narrators (Cydonia, Magnum, etc.) reliably
ignore prompt-level beat-count instructions and ramble for 6-12
asterisk-action beats per turn — even with HARD CAP language and
worked examples in the closing instruction. The fix is a deterministic
post-stream trimmer:
- New trim_to_max_beats(text, max_beats) in chat/services/prompt.py.
Counts * characters in the streamed output (each beat = 2
asterisks: open + close), trims at the start of the (max_beats+1)th
asterisk action, strips trailing whitespace. Idempotent and safe
on under-cap input.
- Wired into post_turn for both the primary stream (3-beat cap) and
the optional interjection stream (2-beat cap — interjections are
by definition shorter chime-ins).
- Tightened the closing instruction: explicit "HARD CAP: 2-3 beats"
with "After the third beat, STOP". Helps the well-behaved models
self-cap; the post-processor catches the rest.
- max_tokens: 250 -> 160 (lets the 3rd beat finish naturally before
hitting the physical cap; trim_to_max_beats handles 4+ beat
overflow). temperature: 0.85 -> 0.7 (Cydonia is more compliant
with format instructions at slightly cooler sampling).
- Test budgets bumped (closing grew ~15 tokens with the new wording).
6 new tests for trim_to_max_beats covering passthrough, exact-cap,
4-beat trim, 6-beat runaway, lower caps, zero cap.
Verified live: 4-turn bench against chat_maya, every response is
2-3 beats consistently. Suite: 470 passed in 11.7s.
Rewrites the closing instruction in assemble_narrative_prompt to enforce
the asterisk-action / interleaved-beat format: actions wrapped in
*asterisks* in third person, dialogue as plain text between beats (no
quote marks), 2-4 short concrete beats per response, no inner monologue
or stage-direction adverbs. Includes a one-line worked example so the
model has a concrete target.
Was producing first-person prose blocks like 'I stare at you... "Well,
that's direct," I murmur'. Target style is short interleaved beats:
'*She turns with soapy hands to cup your face* That's how I know it's
real... *She kisses you softly* You love me when I'm messy...'
Drops narrative_max_tokens 400 -> 250 so the model can't drift into
multi-paragraph monologue. Bumps three test budgets to fit the larger
closing (closing grew ~80 -> ~200 tokens; tests still exercise the
same trim-order behavior, just with proportionally larger budgets).
T18 review (Phase 1) noted the NICE-tier trim drops previous-scene
FIRST while §6.3 spec lists previous-scene LAST in the NICE tier
group. Decision: keep the existing greedy order (previous-scene
first), and document why.
Rationale (now in code at the trim ladder):
1. Cheapest-impact-first — a per-POV previous-scene summary loses
less narrative continuity than the older dialogue turns or
memory hits it competes with.
2. Greedy lookahead is more expensive than the marginal narrative
loss. Dropping previous-scene typically clears the soft-budget
slack in one step.
Test added: test_nice_trim_order_documented pins the observed order
(previous-scene -> memories -> dialogue) so a future refactor can't
silently invert it. Sized so that all-NICE config overflows soft but
dropping just previous-scene fits — proves memories and older
dialogue turns survive while previous-scene is the FIRST drop.
Phase 2 T43 added a SECOND ACTIVITIES: block to render guest activity
separately from you+speaker. Two consecutive ACTIVITIES: headers can
read as a duplicate-section bug to the LLM and bloat the prompt.
Consolidate to a single ACTIVITIES: block whose body is composed from
up to three bullets (you, speaker, guest). The block itself is
MUST-tier (always renders); bullet-level trim drops bullets in the
order guest -> group node -> you -> other edges, with the speaker
bullet as the MUST-tier floor (the speaker's own current activity is
the load-bearing slice).
Implementation chose Option B from the polish plan: pre-truncate the
bullets list at trim time before _build_activity_block runs, rather
than introduce a granular tier mode in the trim machinery. Rationale
documented in code; the existing block-level trim ladder gains a
single new toggle (include_you_activity) and the SHOULD-tier
guest_activity_block is gone.
Tests:
- test_single_activities_block_with_three_bullets_when_3_entities:
exactly one ACTIVITIES: header with all three entity bullets.
- test_tight_budget_drops_guest_activity_bullet_first: speaker bullet
survives, guest bullet absent under tight budget.
- Existing test_assemble_with_tight_budget_drops_guest_activity_first
still passes (asserts on bullet absence, not block-header absence).
Phase 2 T46 pinned the witness mask contract on search_memories with a
witness_role parameter (host/guest/you). The prompt-assembly call site
in assemble_narrative_prompt was hardcoded to "host", which silently
returned the wrong rows when the speaker was the guest bot.
Derive the witness role from chat membership via a new private helper
_witness_role_for(speaker_bot_id, host_bot_id), and apply it at the
search_memories call. Behaviour is identical when the speaker is the
host (or when no guest is present); the fix is load-bearing only when
the guest bot is the speaker — exactly the scenario Phase 2 T43 added
support for.
Tests: pin both directions (host-as-speaker and guest-as-speaker) by
patching the imported search_memories reference and asserting the
witness_role argument the call site emits.