Two related issues blocking real-world use of the kickoff parse:
1. Classifier calls take ~12s end-to-end on Featherless for the
complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
structured JSON). The 10s timeout was firing on most attempts,
causing all 3 attempts to time out and the empty fallback to render
with blank form values. Bumped the default classifier_timeout_s
10s → 30s; measured p99 is ~13s, so 30s leaves comfortable
headroom.
2. Featherless caps concurrent connections per account (2 on free /
lower paid tiers). Each turn flow can fire 4–5 calls (parse,
scene-close detect, narrative stream, two state-update passes)
plus the background significance worker. Without a gate, a single
turn exceeds the cap and the excess calls fail.
Added a class-level ``asyncio.Semaphore`` to FeatherlessClient,
shared across all instances, configured once in lifespan from
``Settings.featherless_max_concurrent`` (default 2). Both
``generate`` and ``stream`` acquire the semaphore for the duration
of the call; the stream holds it until the async generator
completes, so token streaming is correctly accounted for.
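A minimal sketch of the gating pattern described above, not the actual client code. The ``configure`` classmethod, the ``_slot`` helper, the ``in_flight``/``peak`` counters and the sleep-based stand-ins for HTTP calls are all assumptions added for illustration; only the class-level ``asyncio.Semaphore`` shared by ``generate`` and ``stream`` comes from this commit.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, ClassVar, Optional


class FeatherlessClient:
    # Class attribute: one semaphore shared by every instance.
    _semaphore: ClassVar[Optional[asyncio.Semaphore]] = None
    # Illustration-only counters used to observe the concurrency peak.
    in_flight: ClassVar[int] = 0
    peak: ClassVar[int] = 0

    @classmethod
    def configure(cls, max_concurrent: int = 2) -> None:
        """Hypothetical hook, called once at startup (e.g. from lifespan)."""
        cls._semaphore = asyncio.Semaphore(max_concurrent)

    @classmethod
    @asynccontextmanager
    async def _slot(cls) -> AsyncIterator[None]:
        if cls._semaphore is None:
            raise RuntimeError("call FeatherlessClient.configure() first")
        async with cls._semaphore:
            cls.in_flight += 1
            cls.peak = max(cls.peak, cls.in_flight)
            try:
                yield
            finally:
                cls.in_flight -= 1

    async def generate(self, prompt: str) -> str:
        async with self._slot():
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            return prompt.upper()

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # The slot is held until the generator finishes yielding tokens.
        async with self._slot():
            for token in prompt.split():
                await asyncio.sleep(0.01)  # stand-in for a streamed chunk
                yield token


async def demo() -> int:
    """Fire 5 overlapping calls and report the highest concurrency seen."""
    FeatherlessClient.peak = 0
    FeatherlessClient.configure(max_concurrent=2)
    clients = [FeatherlessClient() for _ in range(5)]

    async def drain(client: FeatherlessClient) -> list:
        return [tok async for tok in client.stream("a b c")]

    await asyncio.gather(
        *(c.generate("hi") for c in clients[:3]),
        *(drain(c) for c in clients[3:]),
    )
    return FeatherlessClient.peak
```

One caveat with this shape: a consumer that abandons the stream without closing the generator keeps its slot until the generator is finalized, so callers should drain or explicitly close streams.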
Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.
The kickoff parse-and-confirm route was 500-ing intermittently because
Hermes-3 + Featherless's response_format={"type":"json_object"} only
guarantees JSON output, NOT a particular schema. The model was inventing
its own field names (sceneTime, entities, settingDetails) instead of
the KickoffParse fields, causing Pydantic validation to fail on both
classify() attempts.
Three changes:
1. Include the Pydantic JSON schema in the system prompt so the model
knows exactly which keys to produce. Affects every classify() call
(kickoff parse, turn parse, scene-close detect, significance,
state-update, scene summarize). Strip ```json fences if the model
wraps its output. Bump classify() attempts 2 → 3 (the model is
stochastic; one extra attempt closes most of the remaining gap).
2. parse_kickoff() now passes a default empty KickoffParse so the
route degrades to a fillable form instead of returning a 500 when the
classifier ultimately fails. The confirm form is the human-in-the-loop
step; an empty form is strictly better UX than a stack trace.
3. Tests updated: bumped canned-failure arrays from 2 → 3 entries to
match the new attempt count; renamed kickoff test from
"raises_when_classifier_fails_twice" to
"falls_back_to_empty_when_classifier_fails" reflecting the new
degraded-but-usable behavior.
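The three changes can be sketched together as below. This is a stdlib-only illustration under stated assumptions, not the code in this commit: the ``KickoffParse`` fields (``setting``, ``participants``) are hypothetical, the hand-written ``SCHEMA`` dict stands in for what Pydantic's ``model_json_schema()`` would produce, and the exact-key check in ``validate`` is a stricter stand-in for Pydantic validation. ``call_model`` is a hypothetical callable that takes the system prompt and returns the raw model text.

```python
import json
import re
from dataclasses import dataclass, field, fields
from typing import Callable, List, Optional

TICKS = "`" * 3  # built at runtime so this example contains no literal fence


@dataclass
class KickoffParse:
    # Hypothetical fields; the real model's fields are not shown here.
    setting: str = ""
    participants: List[str] = field(default_factory=list)


# Hand-written stand-in for the schema the real model class would emit.
SCHEMA = {
    "type": "object",
    "properties": {
        "setting": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["setting", "participants"],
}

_FENCE = re.compile(
    r"^\s*" + TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS + r"\s*$", re.DOTALL
)


def strip_fences(text: str) -> str:
    """Return the payload inside a fenced block, or the text unchanged."""
    match = _FENCE.match(text)
    return match.group(1) if match else text.strip()


def build_system_prompt(schema: dict) -> str:
    """Embed the schema so the model sees the exact keys to produce."""
    return (
        "Respond with one JSON object matching this JSON schema. "
        "Use only the keys it defines.\n" + json.dumps(schema, indent=2)
    )


def validate(raw: str) -> KickoffParse:
    data = json.loads(strip_fences(raw))
    expected = {f.name for f in fields(KickoffParse)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}")
    return KickoffParse(**data)


def classify(
    call_model: Callable[[str], str],
    schema: dict,
    *,
    attempts: int = 3,
    default: Optional[KickoffParse] = None,
) -> KickoffParse:
    prompt = build_system_prompt(schema)
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return validate(call_model(prompt))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            last_error = exc
    if default is not None:
        return default  # degrade to an empty, fillable form
    raise RuntimeError(f"classifier failed after {attempts} attempts") from last_error
```

Passing ``default=KickoffParse()`` mirrors the parse_kickoff() fallback; omitting it keeps the raising behavior for callers that have no sensible empty value.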
Verified live with all 3 sample bots (maya/eli/sam) — kickoff route
returns 200 across multiple attempts. Full suite: 168 passed.
Idempotent seeder for three sample bots (Maya — coworker slow-burn,
Eli — live-in partner, Sam — bartender / new connection). Each is a
distinct relational archetype to exercise the system from different
angles. Run from repo root:
.venv/bin/python scripts/seed_sample_bots.py
Re-running skips ids that already exist. After seeding, walk each bot
through kickoff parse-and-confirm at /bots/<id>/kickoff.
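For reference, the skip-if-exists behavior can be sketched as follows. This is an illustration only, assuming a SQLite store; the ``bots`` table layout and the ``seed`` function are hypothetical and may not match scripts/seed_sample_bots.py.

```python
import sqlite3
from typing import List

# Sample bots from this commit; the column layout is an assumption.
SAMPLE_BOTS = [
    ("maya", "Maya", "coworker slow-burn"),
    ("eli", "Eli", "live-in partner"),
    ("sam", "Sam", "bartender / new connection"),
]


def seed(conn: sqlite3.Connection) -> List[str]:
    """Insert missing sample bots and return the ids that were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bots ("
        "id TEXT PRIMARY KEY, name TEXT NOT NULL, archetype TEXT NOT NULL)"
    )
    inserted = []
    for bot_id, name, archetype in SAMPLE_BOTS:
        exists = conn.execute(
            "SELECT 1 FROM bots WHERE id = ?", (bot_id,)
        ).fetchone()
        if exists:
            continue  # already seeded: leave the existing row untouched
        conn.execute(
            "INSERT INTO bots (id, name, archetype) VALUES (?, ?, ?)",
            (bot_id, name, archetype),
        )
        inserted.append(bot_id)
    conn.commit()
    return inserted
```

Skipping on id rather than overwriting means edits made to a seeded bot survive a re-run.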
- .gitignore: add *.egg-info/ so editable installs don't show in git status.
- pyproject.toml: add [build-system] and [tool.setuptools.packages.find]
scoped to chat*, fixing pip install -e . which was failing on data/
auto-discovery.
- CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status,
capturing the small follow-ups surfaced in implementer reviews
(open_db refactor, regenerate SSE broadcast, you-activity purge,
drawer edits for deferred fields, NICE trim order).
Resolves the open/deferred decisions from the v1 requirements brainstorm:
runtime stack, classifier model, token budgets, OOC marker, data layout.
- Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement,
not a polish item). 127.0.0.1 only, no auth in v1.
- Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented
fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-abliterated).
- Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier;
Must/Should/Nice trimming tiers spelled out in §3.2.
- OOC marker locked to ((double parens)), configurable.
- All runtime data lives under <repo>/data/ (DB, backups, snapshots,
exports, config). Tree is gitignored. CHAT_DB_PATH env var honored.
CLAUDE.md and the requirements doc updated to match. Decisions log in
the requirements doc appendix extended with the new locks (#17–21).
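A possible shape for the data-layout rule, as a sketch only: ``resolve_db_path`` and the ``chat.db`` filename are hypothetical; the CHAT_DB_PATH override and the <repo>/data/ default are the parts locked by this decision.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_db_path(
    repo_root: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Prefer CHAT_DB_PATH when set, else fall back to <repo>/data/."""
    env = os.environ if env is None else env
    override = env.get("CHAT_DB_PATH")
    if override:
        return Path(override).expanduser()
    return repo_root / "data" / "chat.db"  # filename is an assumption
```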
- docs/plans/2026-04-26-v1-requirements-design.md captures the v1
product requirements and behavioral spec from the initial brainstorm
(use case, scope, data model, authoring, play loop, memory, time,
rollback, phase cut, non-negotiable rules).
- README.md introduces the project for the gitea repo.
- CLAUDE.md updated to reference the requirements doc.
- .gitignore added for macOS metadata.
- rp-engine-design.md: full design for the simulation-based roleplay engine
(entities, containers, directed relationship graph, witnessed-by memory,
scoped events, scene compression, event-sourced state, time skips).
- CLAUDE.md: working summary and conventions for development.