dohertj2/chat

Commit Graph

Author

SHA1

Message

Date

Joseph Doherty

5c039c8e56

fix: classifier timeout + Featherless concurrency cap

Two related issues blocking real-world use of the kickoff parse:

1. Classifier calls take ~12s end-to-end on Featherless for the
   complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
   structured JSON). The 10s timeout was firing on most attempts,
   causing all 3 retries to time out and the empty-fallback to render
   with blank form values. Bumping the default
   classifier_timeout_s 10 → 30s gives generous headroom; measured
   p99 is ~13s, so 30s is comfortable.

2. Featherless caps concurrent connections per account (2 on free /
   lower paid tiers). Each turn flow can fire 4–5 calls (parse,
   scene-close detect, narrative stream, two state-update passes)
   plus the background significance worker. Without a gate, we'd
   exceed the cap and fail.

   Added a class-level ``asyncio.Semaphore`` to FeatherlessClient,
   shared across all instances, configured once in lifespan from
   ``Settings.featherless_max_concurrent`` (default 2). Both
   ``generate`` and ``stream`` acquire the semaphore for the duration
   of the call; the stream holds it until the async generator
   completes, so token streaming is correctly accounted for.
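The gate described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: ``FeatherlessClientSketch``, ``configure_concurrency``, ``_gate`` and ``_simulated_request`` are hypothetical names, and the network call is replaced by a short sleep. Only the class-level ``asyncio.Semaphore``, the ``generate`` / ``stream`` method names, and the default of 2 come from the message above.

```python
import asyncio
from typing import AsyncIterator, ClassVar, Optional


class FeatherlessClientSketch:
    """Hypothetical stand-in for FeatherlessClient; requests are simulated."""

    # One semaphore shared by every instance (class-level, not per-instance).
    _semaphore: ClassVar[Optional[asyncio.Semaphore]] = None

    @classmethod
    def configure_concurrency(cls, max_concurrent: int = 2) -> None:
        # Assumed to be called once at startup, from inside the running loop.
        cls._semaphore = asyncio.Semaphore(max_concurrent)

    @classmethod
    def _gate(cls) -> asyncio.Semaphore:
        if cls._semaphore is None:
            raise RuntimeError("call configure_concurrency() first")
        return cls._semaphore

    async def _simulated_request(self) -> None:
        # Placeholder for one network round trip.
        await asyncio.sleep(0.01)

    async def generate(self, prompt: str) -> str:
        # The slot is held for the whole call.
        async with self._gate():
            await self._simulated_request()
            return f"reply:{prompt}"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # The slot is held until the generator finishes, not just until
        # the first token arrives.
        async with self._gate():
            for token in prompt.split():
                await self._simulated_request()
                yield token
```

With a cap of 2, a third concurrent ``generate`` or ``stream`` call simply waits for a slot instead of opening a third connection. Creating the semaphore from inside the running event loop (as a lifespan hook would) avoids loop-binding problems on older Python versions.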

Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.
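For item 1, the interaction between the per-attempt timeout, the 3 retries, and the empty fallback might look like the sketch below. It assumes ``asyncio.wait_for`` as the timeout mechanism and an empty dict as the fallback value; the function name ``classify_with_retries`` and its parameters other than ``classifier_timeout_s`` are hypothetical, not taken from the repository.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def classify_with_retries(
    call: Callable[[], Awaitable[Dict[str, Any]]],
    classifier_timeout_s: float = 30.0,  # new default from the commit message
    max_attempts: int = 3,               # "all 3 retries" in the message
) -> Dict[str, Any]:
    """Run `call` with a per-attempt timeout; fall back to {} if all fail."""
    for _ in range(max_attempts):
        try:
            return await asyncio.wait_for(call(), timeout=classifier_timeout_s)
        except asyncio.TimeoutError:
            # This attempt exceeded the timeout; try again.
            continue
    # Assumed fallback: an empty result, which renders as blank form values.
    return {}
```

Under this sketch, a call that reliably takes ~12s never completes against a 10s timeout, so every attempt is cancelled and the fallback is returned; with 30s the first attempt normally succeeds.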

2026-04-26 15:15:14 -04:00

Joseph Doherty

e627356168

feat: LLMClient protocol with Featherless and mock implementations

2026-04-26 11:35:57 -04:00

2 Commits