LLM Fallback Reason Triage
Grounded assistants depend on retrieval quality and on healthy LLM providers. When something goes wrong, the model should refuse and log why. Fallback reasons turn a vague “I don’t know” into actionable telemetry. Here is how to triage them systematically.
Core categories
| Fallback reason | Meaning | Next action |
|---|---|---|
| low_score | Retrieval confidence below threshold | Expand crawl scope, tune chunking, adjust thresholds cautiously. |
| no_context | Retrieval returned nothing because corpus filters excluded everything | Verify tenant allowlists, language filters, or embed tokens. |
| provider_error | Upstream LLM rejected the call (5xx, throttling) | Check LLM gateway retries, provider status, failover to alternate model. |
| timeout | Provider or network exceeded the SLA | Tune the timeout budget, add retry jitter, or reduce payload size. |
| context_overflow | Prompt + excerpts exceed context window | Trim preamble, reduce number of chunks, or upgrade model window. |
CrawlBot stores these reasons per message, along with timestamps, tenant IDs, embed IDs, and retrieval metadata.
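To make the triage steps below concrete, here is a minimal sketch of what such a per-message record might look like. The field names and shapes are illustrative assumptions based on the description above, not CrawlBot’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical shape of a stored fallback event; field names are
# illustrative, not CrawlBot's actual schema.
@dataclass
class FallbackEvent:
    message_id: str
    tenant_id: str
    embed_id: str
    fallback_reason: str          # one of the categories in the table above
    occurred_at: datetime
    retrieval: dict = field(default_factory=dict)  # scores, chunk IDs, citations

event = FallbackEvent(
    message_id="msg_123",
    tenant_id="acme",
    embed_id="embed_7",
    fallback_reason="low_score",
    occurred_at=datetime.now(timezone.utc),
    retrieval={"top_score": 0.41, "threshold": 0.55, "chunks": 3},
)
```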
Triage workflow
- Monitor dashboards: Plot fallback_reason counts per tenant and embed, and alert when any category exceeds 5 percent of chats (a rate check is sketched after this list).
- Drill into transcripts: For low_score/no_context, review retrieved chunks, scores, and citations to spot crawl gaps.
- Check LLM gateway logs: Correlate provider_error/timeout spikes with upstream provider incidents; ensure failover to the alternate OpenAI or Gemini model stays healthy.
- Validate prompts: context_overflow often signals prompts with too much boilerplate; keep instructions tight and reference citations succinctly.
- Feed results to ops: Tag fallback events with severity and route them to the AI ops Chat space. Include direct links to CrawlBot’s analytics view.
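The 5 percent alert rule from the first step reduces to a small aggregation. This is a minimal sketch: the input shapes (`events` as (tenant_id, fallback_reason) pairs, `total_chats` as tenant chat volume over the same window) are assumptions, not CrawlBot’s analytics API.

```python
from collections import Counter

# Alert when any fallback category exceeds 5 percent of a tenant's chats.
ALERT_THRESHOLD = 0.05

def fallback_alerts(events, total_chats):
    counts = Counter(events)  # keyed by (tenant_id, fallback_reason)
    alerts = []
    for (tenant, reason), n in counts.items():
        chats = total_chats.get(tenant, 0)
        if chats and n / chats > ALERT_THRESHOLD:
            alerts.append((tenant, reason, n / chats))
    return alerts

# Example: 8 low_score fallbacks out of 100 chats trips the alert.
events = [("acme", "low_score")] * 8 + [("acme", "provider_error")] * 2
print(fallback_alerts(events, {"acme": 100}))
# -> [('acme', 'low_score', 0.08)]
```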
Prevention tactics
- Keep crawl runs fresh (IndexNow + scheduled crawls) to avoid low_score from stale content.
- Use adaptive thresholds seeded from the P95 of historical retrieval scores, per the AGENTS spec (a seeding sketch follows this list).
- Trim citations to the minimum required per policy; long legal disclaimers eat context budget (a budget-trimming sketch also follows this list).
- Store the fallback reason along with user feedback. If thumbs-down ratings correlate with provider_error, escalate to the LLM vendor.
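Seeding a threshold from historical P95 scores can look like the sketch below. The 0.8 scaling factor and the floor are assumptions for illustration, not values from the AGENTS spec.

```python
import statistics

# Seed a retrieval threshold from the P95 of historical scores.
# scale and floor are illustrative assumptions, not spec values.
def adaptive_threshold(historical_scores, scale=0.8, floor=0.3):
    if not historical_scores:
        return floor
    quantiles = statistics.quantiles(historical_scores, n=20)
    p95 = quantiles[-1]  # last of 19 cut points = 95th percentile
    return max(floor, p95 * scale)

scores = [0.42, 0.55, 0.61, 0.48, 0.70, 0.66, 0.52, 0.59, 0.73, 0.44]
print(round(adaptive_threshold(scores), 3))
```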
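Guarding the context budget is the flip side of trimming citations. A minimal sketch: token counts are approximated by whitespace-split words here; a real system would use the model’s tokenizer.

```python
# Keep the highest-scoring chunks until the token budget is spent,
# so the prompt never triggers context_overflow.
def fit_to_budget(chunks, budget_tokens):
    """chunks: list of (score, text). Returns the texts that fit."""
    kept, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = len(text.split())  # crude proxy for token count
        if used + cost > budget_tokens:
            continue  # a smaller later chunk may still fit
        kept.append(text)
        used += cost
    return kept
```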
Reporting cadence
- Daily: Quick scan of fallback charts for enterprise tenants and demo environments.
- Weekly: Generate a summary with counts per reason, top affected tenants, and remediation status (see the rollup sketch after this list).
- Quarterly: Review threshold settings, prompt boilerplate, and crawler coverage using aggregated fallback trends.
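The weekly rollup is another small aggregation over the same event stream. This sketch reuses the assumed (tenant_id, fallback_reason) pair shape from the alerting example; it is not a CrawlBot reporting API.

```python
from collections import Counter

# Weekly rollup: counts per reason plus the most affected tenants.
def weekly_summary(events, top_n=5):
    by_reason = Counter(reason for _, reason in events)
    by_tenant = Counter(tenant for tenant, _ in events)
    return {
        "per_reason": dict(by_reason.most_common()),
        "top_tenants": by_tenant.most_common(top_n),
    }
```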
Fallback reason triage keeps AI quality measurable. Instead of shipping band-aid prompts, use these signals to fix the right layer: crawl scope, retrieval thresholds, or provider reliability.