Two related issues blocking real-world use of the kickoff parse:
1. Classifier calls take ~12s end-to-end on Featherless for the
complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
structured JSON). The 10s timeout was firing on most attempts,
causing all 3 retries to time out and the empty-fallback to render
with blank form values. Raised the default
``classifier_timeout_s`` from 10s to 30s; measured p99 is ~13s,
so 30s leaves comfortable headroom.
2. Featherless caps concurrent connections per account (2 on free /
lower paid tiers). Each turn flow can fire 4–5 calls (parse,
scene-close detect, narrative stream, two state-update passes)
plus the background significance worker. Without a gate, a single
turn can exceed the cap, and the calls over the limit fail.
Added a class-level ``asyncio.Semaphore`` to ``FeatherlessClient``,
shared across all instances, configured once in lifespan from
``Settings.featherless_max_concurrent`` (default 2). Both
``generate`` and ``stream`` acquire the semaphore for the duration
of the call; the stream holds it until the async generator
completes, so token streaming is correctly accounted for.
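The gating pattern can be sketched roughly as below. This is a minimal illustration, not the actual client code: ``configure``, ``_gate``, ``_request``, the ``in_flight``/``peak`` counters and ``demo`` are hypothetical names, and the HTTP call is replaced by a short sleep. Only the class-level semaphore, the ``generate``/``stream`` method names and the limit of 2 come from the description above.

```python
import asyncio
from typing import AsyncIterator, ClassVar, Optional


class FeatherlessClient:
    # One semaphore for the whole process, shared by every instance.
    _semaphore: ClassVar[Optional[asyncio.Semaphore]] = None
    # Demo-only bookkeeping: current and peak number of in-flight calls.
    in_flight: ClassVar[int] = 0
    peak: ClassVar[int] = 0

    @classmethod
    def configure(cls, max_concurrent: int) -> None:
        """Call once at startup (e.g. in lifespan), inside the running loop."""
        cls._semaphore = asyncio.Semaphore(max_concurrent)

    @classmethod
    def _gate(cls) -> asyncio.Semaphore:
        if cls._semaphore is None:
            raise RuntimeError("FeatherlessClient.configure() was not called")
        return cls._semaphore

    async def _request(self, prompt: str) -> str:
        # Hypothetical stand-in for the real HTTP call.
        cls = type(self)
        cls.in_flight += 1
        cls.peak = max(cls.peak, cls.in_flight)
        try:
            await asyncio.sleep(0.01)
            return prompt.upper()
        finally:
            cls.in_flight -= 1

    async def generate(self, prompt: str) -> str:
        # Hold a slot for the full duration of the call.
        async with self._gate():
            return await self._request(prompt)

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # The slot is held until the generator is exhausted or closed.
        async with self._gate():
            for token in prompt.split():
                await asyncio.sleep(0)
                yield token


async def demo(max_concurrent: int = 2, calls: int = 5) -> int:
    """Fire `calls` concurrent requests and return the peak in flight."""
    FeatherlessClient.configure(max_concurrent)
    FeatherlessClient.peak = 0
    clients = [FeatherlessClient() for _ in range(calls)]
    await asyncio.gather(*(c.generate(f"p{i}") for i, c in enumerate(clients)))
    return FeatherlessClient.peak
```

Keeping the semaphore on the class rather than the instance means separately constructed clients (for example, the background worker's) still count against the same account-wide cap.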
Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.